I am also contracted to Searchlogic Plc as a senior developer, and as such I tend to work on all sorts of automation and server applications. Recently at Searchlogic we have been working on a data mining application that collects various technical details and analysis information about a given website for marketing purposes.
The part I have been working on is the development of a crawler that gathers SERP (search engine results page) ranking data. There are various ways of doing this, but the approach used has been to utilise a COM object to automate Internet Explorer. The rationale for this is that most search engines do not like automated requests, which are actually in breach of their terms of use. Automating IE and randomly changing behaviour patterns can mimic a user's behaviour.
However, since this is basically a browser hijack, most antivirus and firewall software detects it as suspicious, which poses a problem when developing such an application. It has also been designed as an asynchronous, threaded application that can be deployed on as many drones as you like to speed up its overall performance. Deploying it on several different platforms allows a cluster to be formed, across which the workload can be shared and allocated via a simple HTTP request. This forms the data access layer for the client applications.
More information about this project and its further development will follow.
Here's some general information about COM and the Internet Explorer Application COM hooks.
-------------------------------------
Opening a new instance of IE in VB 6 can involve subtle problems. You can open an instance of IE in VB 6 with code that looks like this. (Remember to reference Microsoft Internet Controls in your project.)

    Public ie As InternetExplorer

    Private Sub Command1_Click()
        Set ie = CreateObject("InternetExplorer.Application")
        ie.Navigate2 "http://visualbasic.about.com"
    End Sub

Simple, no? But suppose you decide that you need to create a new IE window and also to make sure the second web page has completely loaded before continuing. You would need to do this if, for example, there is some critical data on the web page that you need to use later in the program.
One way that you might decide to do this is to use the Flags parameter that you can pass to IE to create the new window. You might, for example, try this code:

    Private Sub Command1_Click()
        ' Example Code Only - Do Not Use
        ' Create the IE window
        Set ie1 = CreateObject("InternetExplorer.Application")
        ie1.Visible = True
        ie1.Navigate2 "http://www.about.com"
        ' Create the second IE window
        Set ie2 = CreateObject("InternetExplorer.Application")
        ie2.Visible = True
        ie2.Navigate2 "http://visualbasic.about.com", 1
        ' Wait for the second one to complete
        Dim i As Integer
        i = 1
        While ie2.ReadyState < READYSTATE_COMPLETE
            Sleep 100
            Debug.Print "ie busy " & CStr(i * 0.1) & _
                " seconds with ReadyState " & ie2.ReadyState
            i = i + 1
        Wend
    End Sub

Beware! The code loops endlessly for reasons that are detailed in the rest of this article. You can stop the program by simply closing the IE windows that are opened. This creates an error condition that lets you at least get back in control.
But ... what is that extra window with nothing in it? What's going on here?
Here's what is happening. A "1" in the Flags argument passed to the Navigate2 method (the navOpenInNewWindow constant) tells IE to open the URL in a new window. What the Microsoft documentation doesn't explain very well (if at all) is that the ie object in your program does not refer to that NEW window afterwards; it still refers to the old one (which is still open). Furthermore, the window created by CreateObject starts out with no content, which is the extra empty window you see. And since the object refers to this "old" IE window rather than the new one, and that one never has a URL address, the ReadyState is always 0.
The correct way to code "most" of the requirements above is as follows:

    Private Sub Command1_Click()
        ' Create the IE window
        Set ie1 = CreateObject("InternetExplorer.Application")
        ie1.Visible = True
        ie1.Navigate2 "http://www.about.com"
        ' Create the second IE window
        ie1.Navigate2 "http://visualbasic.about.com", 1
        ' Wait for the second one to complete
        Dim i As Integer
        i = 1
        While ie1.ReadyState < READYSTATE_COMPLETE
            Sleep 100
            Debug.Print "ie busy " & CStr(i * 0.1) & _
                " seconds with ReadyState " & ie1.ReadyState
            i = i + 1
        Wend
    End Sub

Why only "most" of the requirements? Why not "all"? Because the ReadyState of the first window is being tested, not the second. In fact, you might want to play around with this code a bit and see if you can access the second object at all. My bet is that you can't ... unless you use the technique described next!
There are a number of key things that need to be changed to make this work.
The ie object must be declared WithEvents to allow your code to handle the NewWindow2 event.
A new ie object must be created for the new window.
A NewWindow2 event-handling subroutine must be added to your project. This event fires before the new window is created and gives you a chance to assign a pointer to the new window object before IE creates a blank one.
The ppDisp argument of that event handler must be set to point to this new window.
When you do all of these things, you will have a project that looks like this one:

    ' ie1 raises events; ie2 will hold the window opened from ie1
    Public WithEvents ie1 As InternetExplorer
    Public ie2 As InternetExplorer

    Private Declare Sub Sleep Lib "kernel32" ( _
        ByVal dwMilliseconds As Long)

    Private Sub Command1_Click()
        Set ie1 = CreateObject("InternetExplorer.Application")
        ie1.Visible = True
        ie1.Navigate2 "http://www.yahoo.com"
        ' Wait for the first window to finish loading
        While ie1.ReadyState < READYSTATE_COMPLETE
            Sleep 100
            Debug.Print "ie1 busy"
        Wend
        Debug.Print ie1.LocationURL & "AA"
        ' Flag 1 opens the URL in a new window, which fires ie1_NewWindow2
        ie1.Navigate2 "http://www.google.com", 1
        ' Wait for the second window, now referenced by ie2
        While ie2.ReadyState < READYSTATE_COMPLETE
            Sleep 100
            Debug.Print "ie2 busy"
        Wend
        Debug.Print ie2.LocationURL & "BB"
    End Sub

    Private Sub ie1_NewWindow2(ppDisp As Object, Cancel As Boolean)
        ' Supply our own IE instance so that ie2 refers to the new window
        Set ie2 = CreateObject("InternetExplorer.Application")
        Set ppDisp = ie2.Application
        Debug.Print "NewWindow2"
    End Sub
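
The wait loops in the examples above poll ReadyState with no upper bound, which is why the first attempt could only be stopped by closing the IE windows. As a rough sketch (not part of the original project), the polling could be wrapped in a helper that gives up after a timeout. The names WaitForReady, browser and timeoutMs are made up for illustration. The helper assumes the Sleep declaration shown above is already present in the module, and the elapsed time it tracks is only approximate.

```vb
' Sketch only: WaitForReady, browser and timeoutMs are hypothetical names.
' Assumes the Sleep API declaration already exists in this module and that
' Microsoft Internet Controls is referenced.
Private Function WaitForReady(ByVal browser As InternetExplorer, _
                              ByVal timeoutMs As Long) As Boolean
    Dim waitedMs As Long

    ' No window object to test (for example NewWindow2 never fired):
    ' the function returns its default value, False.
    If browser Is Nothing Then Exit Function

    Do While browser.ReadyState < READYSTATE_COMPLETE
        If waitedMs >= timeoutMs Then
            ' Gave up: the page did not finish loading in time
            WaitForReady = False
            Exit Function
        End If
        DoEvents            ' let pending events run while waiting
        Sleep 100
        waitedMs = waitedMs + 100
    Loop

    WaitForReady = True
End Function
```

With this in place, a wait such as the second While loop could be written as "If Not WaitForReady(ie2, 30000) Then Debug.Print "ie2 timed out"", so that a window that never loads no longer hangs the program.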