Tuesday, 24 March 2009

SQL Split Function

Ever wanted to perform a split in SQL? Here's a handy function to do just that.

CREATE FUNCTION fn_Split(@sText varchar(8000), @sDelim varchar(20) = ' ')
RETURNS @retArray TABLE (idx smallint Primary Key, value varchar(8000))
AS
BEGIN
    DECLARE @idx smallint,
        @value varchar(8000),
        @bcontinue bit,
        @iStrike smallint,
        @iDelimlength tinyint

    IF @sDelim = 'Space'
    BEGIN
        SET @sDelim = ' '
    END

    SET @idx = 0
    SET @sText = LTrim(RTrim(@sText))
    SET @iDelimlength = DATALENGTH(@sDelim)
    SET @bcontinue = 1

    IF NOT ((@iDelimlength = 0) or (@sDelim = 'Empty'))
    BEGIN
        WHILE @bcontinue = 1
        BEGIN
            --If you can find the delimiter in the text, retrieve the first element and
            --insert it with its index into the return table.
            IF CHARINDEX(@sDelim, @sText) > 0
            BEGIN
                SET @value = SUBSTRING(@sText, 1, CHARINDEX(@sDelim, @sText) - 1)
                INSERT @retArray (idx, value)
                VALUES (@idx, @value)

                --Trim the element and its delimiter from the front of the string.
                --Increment the index and loop.
                SET @iStrike = DATALENGTH(@value) + @iDelimlength
                SET @idx = @idx + 1
                SET @sText = LTrim(Right(@sText, DATALENGTH(@sText) - @iStrike))
            END
            ELSE
            BEGIN
                --If you can't find the delimiter in the text, @sText is the last value in
                --@retArray.
                SET @value = @sText
                INSERT @retArray (idx, value)
                VALUES (@idx, @value)
                --Exit the WHILE loop.
                SET @bcontinue = 0
            END
        END
    END
    ELSE
    BEGIN
        WHILE @bcontinue = 1
        BEGIN
            --If the delimiter is an empty string, check for remaining text
            --instead of a delimiter. Insert the first character into the
            --retArray table. Trim the character from the front of the string.
            --Increment the index and loop.
            IF DATALENGTH(@sText) > 1
            BEGIN
                SET @value = SUBSTRING(@sText, 1, 1)
                INSERT @retArray (idx, value)
                VALUES (@idx, @value)
                SET @idx = @idx + 1
                SET @sText = SUBSTRING(@sText, 2, DATALENGTH(@sText) - 1)
            END
            ELSE
            BEGIN
                --One character remains.
                --Insert the character, and exit the WHILE loop.
                INSERT @retArray (idx, value)
                VALUES (@idx, @sText)
                SET @bcontinue = 0
            END
        END
    END

    RETURN
END
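
For readers who want to check the loop logic outside SQL Server, here is a rough Python sketch of the same steps. It is only an approximation: the name fn_split and the list-of-tuples return value are assumptions made for illustration, and Python compares strings more strictly than T-SQL does (case and trailing spaces matter here).

```python
def fn_split(text, delim=" "):
    """Approximate the fn_Split loop: return a list of (idx, value) rows."""
    if delim == "Space":
        delim = " "
    text = text.strip(" ")          # LTrim(RTrim(@sText))
    rows = []
    idx = 0
    if delim and delim != "Empty":
        while True:
            pos = text.find(delim)  # like CHARINDEX, but 0-based and -1 if absent
            if pos < 0:
                rows.append((idx, text))   # no delimiter left: last value
                break
            rows.append((idx, text[:pos]))
            # Strike the element and its delimiter, then left-trim the rest.
            text = text[pos + len(delim):].lstrip(" ")
            idx += 1
    else:
        # Empty delimiter: emit one character per row.
        while len(text) > 1:
            rows.append((idx, text[0]))
            text = text[1:]
            idx += 1
        rows.append((idx, text))
    return rows


if __name__ == "__main__":
    for row in fn_split("red, green,blue", ","):
        print(row)
```

As in the SQL version, the index starts at 0 and spaces after a delimiter are trimmed from the next element.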

Thursday, 12 March 2009

Browser Hijacking for the greater good

I am also contracted to Searchlogic Plc as a senior developer, and as such I tend to work on all sorts of automation and server applications. Recently at Searchlogic we have been working on a data mining application that collects various technical details and analysis information about a given website for marketing purposes.

The part I have been working on is a crawler that gathers SERP (search engine results page) ranking data. There are various ways of doing this, but the approach used has been to automate Internet Explorer through a COM object.

The rationale for this is that most search engines do not like automated requests, which are actually in breach of their terms of use. Automating IE and randomly changing behaviour patterns can mimic a user's behaviour.

However, as the title says, this is basically a browser hijack. Most antivirus and firewall software detects it as suspicious, which poses a problem when developing such an application. It has also been designed as an asynchronous, threaded application that can be deployed on as many drones as you like to speed up its overall performance. Multiple deployments on several different platforms form a cluster across which workload can be shared and allocated via a simple HTTP request. This forms the data access layer for the client applications.

More information will follow regarding this project and further developments on it.
Here's some general information about COM and the Internet Explorer Application COM hooks.

-------------------------------------

Opening a new instance of IE in VB 6 can involve subtle problems. You can open an instance of IE in VB 6 with code that looks like this. (Remember to reference Microsoft Internet Controls in your project.)

Public ie As InternetExplorer

Private Sub Command1_Click()
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Navigate2 "http://visualbasic.about.com"
End Sub

Simple, no? But suppose you decide that you need to create a new IE window and also to make sure the second web page has completely loaded before continuing. You would need to do this if, for example, there is some critical data on the web page that you need to use later in the program.
One way that you might decide to do this is to use the Flags parameter that you can pass to IE to create the new window. You might, for example, try this code:

' Example Code Only - Do Not Use
Public ie1 As InternetExplorer
Public ie2 As InternetExplorer
Private Declare Sub Sleep Lib "kernel32" ( _
    ByVal dwMilliseconds As Long)

Private Sub Command1_Click()
    ' Create the IE window
    Set ie1 = CreateObject("InternetExplorer.Application")
    ie1.Visible = True
    ie1.Navigate2 "http://www.about.com"
    ' Create the second IE window
    Set ie2 = CreateObject("InternetExplorer.Application")
    ie2.Visible = True
    ie2.Navigate2 "http://visualbasic.about.com", 1
    ' Wait for the second one to complete
    Dim i As Integer
    i = 1
    While ie2.ReadyState < READYSTATE_COMPLETE
        Sleep 100
        Debug.Print "ie busy " & CStr(i * 0.1) & _
            " seconds with ReadyState " & ie2.ReadyState
        i = i + 1
    Wend
End Sub

Beware! The code loops endlessly for reasons that are detailed in the rest of this article. You can stop the program by simply closing the IE windows that are opened. This creates an error condition that at least puts you back in control.
But ... what is that extra window with nothing in it? What's going on here?
Here's what is happening. A "1" in the Flags argument passed to the Navigate2 method tells IE to open the page in a new window. What the Microsoft documentation doesn't explain very well (if at all) is that the object in your program does not follow the page into that new window: it keeps referring to the window it was created for. Here, ie2 was created as an empty IE window, which is the extra window with nothing in it. Since ie2 still refers to that empty window rather than the new one, and that window never has a URL address, its ReadyState is always 0.
The correct way to code "most" of the requirements above is as follows:

Public ie1 As InternetExplorer
Private Declare Sub Sleep Lib "kernel32" ( _
    ByVal dwMilliseconds As Long)

Private Sub Command1_Click()
    ' Create the IE window
    Set ie1 = CreateObject("InternetExplorer.Application")
    ie1.Visible = True
    ie1.Navigate2 "http://www.about.com"
    ' Create the second IE window
    ie1.Navigate2 "http://visualbasic.about.com", 1
    ' Wait for the second one to complete
    Dim i As Integer
    i = 1
    While ie1.ReadyState < READYSTATE_COMPLETE
        Sleep 100
        Debug.Print "ie busy " & CStr(i * 0.1) & _
            " seconds with ReadyState " & ie1.ReadyState
        i = i + 1
    Wend
End Sub

Why only "most" of the requirements? Why not "all"? Because the ReadyState of the first window is being tested, not the second. In fact, you might want to play around with this code a bit and see if you can access the second object at all. My bet is that you can't ... unless you use the technique described next!
There are a number of key things that need to be changed to make this work.
The ie object must be declared WithEvents to allow your code to handle the NewWindow2 event.
A new ie object must be created for the new window.
An event-handling subroutine for the NewWindow2 event must be added to your project. This event fires before the new window is created and gives you a chance to assign a pointer to the new window object before IE creates a blank one.
The ppDisp argument of the event handler must be set to point to this new window.
When you do all of these things, you will have a project that looks like this one:

Public WithEvents ie1 As InternetExplorer
Public ie2 As InternetExplorer
Private Declare Sub Sleep Lib "kernel32" ( _
    ByVal dwMilliseconds As Long)

Private Sub Command1_Click()
    Set ie1 = CreateObject("InternetExplorer.Application")
    ie1.Visible = True
    ie1.Navigate2 "http://www.yahoo.com"
    ' Wait for the first window to finish loading
    While ie1.ReadyState < READYSTATE_COMPLETE
        Sleep 100
        Debug.Print "ie1 busy"
    Wend
    Debug.Print ie1.LocationURL & "AA"
    ' Open the second page in a new window; this fires ie1_NewWindow2
    ie1.Navigate2 "http://www.google.com", 1
    ' Wait for the second window, now referenced by ie2
    While ie2.ReadyState < READYSTATE_COMPLETE
        Sleep 100
        Debug.Print "ie2 busy"
    Wend
    Debug.Print ie2.LocationURL & "BB"
End Sub

Private Sub ie1_NewWindow2(ppDisp As Object, Cancel As Boolean)
    ' Hand IE our own object to use for the new window
    Set ie2 = CreateObject("InternetExplorer.Application")
    Set ppDisp = ie2.Application
    Debug.Print "NewWindow2"
End Sub
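
The wait loops above have no upper bound, so a window that never finishes loading leaves the program spinning. As a language-neutral illustration, here is a minimal Python sketch of the same polling loop with a timeout. The function name wait_until_complete and the get_ready_state callable are hypothetical stand-ins for reading ie.ReadyState; the value 4 corresponds to READYSTATE_COMPLETE.

```python
import time

READYSTATE_COMPLETE = 4  # same numeric value as the IE constant


def wait_until_complete(get_ready_state, timeout=30.0, interval=0.1):
    """Poll get_ready_state() until it reports complete or the timeout expires.

    get_ready_state is a hypothetical callable standing in for ie.ReadyState.
    Returns True if the page completed, False if we gave up.
    """
    deadline = time.monotonic() + timeout
    while get_ready_state() < READYSTATE_COMPLETE:
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


if __name__ == "__main__":
    # Simulated browser that becomes ready on the fourth poll.
    states = iter([0, 1, 3, 4])
    print(wait_until_complete(lambda: next(states), timeout=1.0, interval=0.0))
```

Returning False instead of looping forever lets the caller decide whether to retry, skip the page, or report the failure.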

Sunday, 8 March 2009

Pirate Bay awaits court verdict

The trial of the creators of the file-sharing site The Pirate Bay has ended.

Lawyers for both the prosecution and defence have delivered their closing arguments in the high-profile copyright trial in Sweden.

Frederik Neij, Carl Lundstrom, Peter Sunde and Gottfrid Svartholm Warg are accused of promoting copyright infringement via the hugely popular The Pirate Bay website.

The judge in the case is not expected to deliver a verdict for several weeks.

In their final statement prosecutors called for a one-year prison sentence to be imposed on the four administrators of the site.

The Pirate Bay hosts thousands of links to so-called torrent files, which allow for movies, TV programmes and applications to be shared online.

No copyright material is stored directly on The Pirate Bay servers.

"I believe that the correct punishment should be one year in prison and that is what I am requesting that the district court hand down in this case," prosecutor Haakan Roswall told the court.

Lawyers acting to defend the four men spent their last day in court showing how BitTorrent works and calling for their clients to be acquitted.

They also challenged prosecution estimates of how much the site's administrators have made from it.

The four men have been charged with earning at least 1.2m kronor (£92,000) by facilitating copyright infringement.

The film, music and video games industries are seeking about 117m kronor (£9m) in damages and interest for losses incurred from tens of millions of illegal downloads facilitated by the site.

After the trial closed, Ludvig Werner, chairman of the International Federation of the Phonographic Industry (IFPI) in Sweden, said: "It is important to see organisations representing rights holders from all over the world show their support in the trial of The Pirate Bay.

"It's particularly encouraging to see support coming from thousands of small, new and independent creative companies," he added.

"That's a powerful response to The Pirate Bay, whose propaganda very often misrepresents this as a battle between young and old, and between new and old techniques," he said.

URL rewrites for older ASP websites

One of the biggest dilemmas on Microsoft-based operating systems using Internet Information Services (IIS) is URL rewriting. The .NET framework offers some tools for this, but for older technologies such as Active Server Pages or even plain HTML files there isn't much on offer. Third-party applications add ISAPI filters that perform rewrites much like Apache, but you can also perform rewrites by using IIS itself and its custom error pages.

So how do you perform rewrites using the error pages and Active Server Pages? First you will need access to the custom error page settings in IIS, or you will need to ask your host to map the following error pages to the script described here. Change the following error pages:

403;14 : Forbidden
404 : Page not found
405 : Method not allowed
500;15 : Internal Server Error

This means that whenever a page cannot be found, an error occurs (because of a global.asa entry), or there is a permissions issue, IIS fires your custom ASP page. Now that the server knows where to send these errors, let's give it the tools to handle them.

Use the following to construct a script to handle these events:

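<%@ Language=VBScript %>
<%
script = Request.ServerVariables.Item("SCRIPT_NAME")
' IIS passes the failed request to the error page in the query string,
' for example: 404;http://www.olddomain.com:80/mysite.asp
err_input = Request.ServerVariables.Item("QUERY_STRING")

target = "newdomain.com/"
oldurl = "olddomain.com"

' Swap the domain, then strip the error code and the port.
' Note: only the 404; and 403; prefixes are handled here.
fixSTR = Replace(err_input, oldurl, target)
fixSTR = Replace(fixSTR, "404;", "")
fixSTR = Replace(fixSTR, "/:80", "")

' Forbidden requests are sent to the home page of the new domain.
If InStr(fixSTR, "403;") > 0 Then
    fixSTR = "http://www." & target
End If

Response.Status = "301 Moved Permanently"
Response.AddHeader "Location", fixSTR
%>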

For example, a request for http://www.olddomain.com/mysite.asp results in a 301 redirect to http://www.newdomain.com/mysite.asp.

How this works: when an error occurs, IIS loads the error page and passes the requested resource to it as the query string. By grabbing this information, removing the error code and port, and substituting the correct domain, you can build the new URL on the fly. Then, by writing a new Status along with a Location header, you perform a 301 redirect. This is a cheap alternative to third-party applications and a lot of server reconfiguration. It can also be used on any IIS server without the .NET framework.
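
To make the string handling easier to follow, here is a small Python sketch of the same three replacements. It assumes the query string arrives in the form "404;http://www.olddomain.com:80/mysite.asp"; the function name build_redirect is hypothetical and is not part of IIS or ASP.

```python
def build_redirect(error_query, old_domain="olddomain.com", target="newdomain.com/"):
    """Turn an IIS-style error query string into the URL to redirect to."""
    # Swapping the domain first leaves "newdomain.com/:80/...", so removing
    # "/:80" afterwards drops the port and the extra slash together.
    fixed = error_query.replace(old_domain, target)
    fixed = fixed.replace("404;", "")
    fixed = fixed.replace("/:80", "")
    # Forbidden requests go to the home page of the new domain.
    if "403;" in fixed:
        fixed = "http://www." + target
    return fixed


if __name__ == "__main__":
    print(build_redirect("404;http://www.olddomain.com:80/mysite.asp"))
    print(build_redirect("403;http://www.olddomain.com:80/private/"))
```

The order of the replacements matters: the target ends in a slash precisely so that the "/:80" clean-up step has something to match.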

Of course, you could also build a Select Case routine to hard-code specific mappings.
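
A hard-coded mapping of this kind might look like the following Python sketch, where a lookup table plays the role of the Select Case. The paths and destination URLs in REDIRECTS are made-up examples, and lookup_redirect is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# Hypothetical old-path to new-URL mappings, for illustration only.
REDIRECTS = {
    "/old-page.asp": "http://www.newdomain.com/new-page.asp",
    "/contact.asp": "http://www.newdomain.com/contact-us.asp",
}


def lookup_redirect(error_query, default="http://www.newdomain.com/"):
    """Map the path from an IIS-style error query string to a fixed URL."""
    # Drop the "404;" style prefix, then keep only the path of the old URL.
    _, _, requested = error_query.partition(";")
    path = urlsplit(requested).path.lower()
    return REDIRECTS.get(path, default)


if __name__ == "__main__":
    print(lookup_redirect("404;http://www.olddomain.com:80/old-page.asp"))
```

Unmapped paths fall back to the default URL, much as a Case Else branch would.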