SQL query to extract file names from HTML content


In this article I explain how to extract PDF file names from a column that contains HTML, using a SQL Server table-valued function.

Below is the function I used for this operation.



CREATE FUNCTION dbo.fn_getFilenames(@InputHTML NVARCHAR(MAX))
RETURNS @res TABLE (pdf NVARCHAR(MAX)) AS
BEGIN
-- Assumes there are no single quotes or double quotes in the PDF file name.
-- @i: position of the next ".pdf"
-- @j: distance from the end of @tmp back to the nearest delimiter
-- @k: scratch position used for each quote type
DECLARE @i INT, @j INT, @k INT, @tmp NVARCHAR(MAX);
SET @i = CHARINDEX(N'.pdf', @InputHTML);
WHILE @i > 0
BEGIN
  SELECT @tmp = LEFT(@InputHTML, @i+3); -- everything up to and including ".pdf"
  SELECT @j = CHARINDEX('/', REVERSE(@tmp)); -- directory delimiter
  SELECT @k = CHARINDEX('"', REVERSE(@tmp)); -- start of href (double quote)
  IF @j = 0 OR (@k > 0 AND @k < @j) SET @j = @k;
  SELECT @k = CHARINDEX('''', REVERSE(@tmp)); -- start of href (single quote)
  IF @j = 0 OR (@k > 0 AND @k < @j) SET @j = @k;
  -- when a delimiter was found, the file name is the last @j-1 characters of @tmp
  INSERT @res VALUES (SUBSTRING(@tmp, LEN(@tmp)-@j+2, LEN(@tmp)));
  SELECT @InputHTML = STUFF(@InputHTML, 1, @i+3, ''); -- remove everything through ".pdf"
  SET @i = CHARINDEX(N'.pdf', @InputHTML);
END
RETURN
END
GO
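
The core trick in the loop is to reverse the text so that CHARINDEX finds the last delimiter before ".pdf" rather than the first one. It can be tried on its own. This is a minimal sketch; the sample string and the names @sample and @pos are made up for illustration and are not part of the function above.

```sql
-- Hypothetical sample: the text up to and including the first ".pdf"
DECLARE @sample NVARCHAR(MAX) = N'<a href="/files/report.pdf';

-- Distance from the end of the string back to the last '/'
DECLARE @pos INT = CHARINDEX('/', REVERSE(@sample));   -- 11

-- Everything after that '/' is the file name
SELECT RIGHT(@sample, @pos - 1) AS pdf;                 -- report.pdf
```

The function repeats the same search for double and single quotes and keeps whichever delimiter is closest to the file name.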

The script below creates a test table, inserts sample HTML, and runs the function to check the output.

-- CREATE TABLE


create table mytable (Html varchar(max));

-- INSERT HTML Content

insert into mytable values('
<p>A deferred tuition payment plan,
or view the <a href="/uploadedFiles/uploadedFiles/uploadedFiles/uploadedFiles/Tuition-Reimbursement-Deferred.pdf"
target="_blank">list</a>.</p>');

insert into mytable values('
<p>A deferred tuition payment plan,
or view the <a href="Two files here-Reimbursement-Deferred.pdf"
target="_blank">list</a>.</p>And I use single quotes
   <a href=''/look/path/The second file.pdf''
target="_blank">list</a>');


--SELECT Statement

select t.*, p.pdf
from mytable t
cross apply dbo.fn_getFilenames(t.Html) p;

The output will be:

[Image: query results, one row per PDF file name found in each Html value]
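
The function can also be called directly on a literal string, which is a quick way to try it without a table. This is a small sketch; the sample HTML and its paths are invented for illustration.

```sql
-- Hypothetical input with one double-quoted and one single-quoted link
SELECT pdf
FROM dbo.fn_getFilenames(
    N'<a href="/docs/report.pdf">one</a> <a href=''notes.pdf''>two</a>');
-- returns two rows: report.pdf and notes.pdf
```

Note that the match on N'.pdf' follows the database collation, so under a case-sensitive collation a link ending in ".PDF" would not be found.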



