-
03-04-05 #1
Honourable Member
- Join Date
- Nov 2004
- Posts
- 37
- Thanks
- 0
- Thanked 0 Times in 0 Posts
Screen Scraping DGM
I'm writing a little web page in ASP.NET (VB) that will log into all of my affiliate accounts and give me a total.
It's going well, but I can't figure out how to log in to DGM - the code that I am trying is returning a Server Error 500.
Does anyone have any code - in any language, though preferably VB-based - that logs into DGM automatically?
Cheers
P
-
05-04-05 #2
Registered User
- Join Date
- Sep 2003
- Posts
- 64
- Thanks
- 0
- Thanked 0 Times in 0 Posts
can you post the code that you are using?
Prezzybox.com home of the gift wizard
-
05-04-05 #3
Honourable Member
Originally posted by monkeyboy
can you post the code that you are using?
Sure
Function readHtmlPage() As String
Dim cookieJar As CookieContainer = New CookieContainer
Dim webReq As HttpWebRequest
Dim webResp As HttpWebResponse
Dim sr As StreamReader
Dim sw As StreamWriter
Dim payLoad As String
Dim txt As String
webReq = CType(WebRequest.Create(New Uri("http://dgm2.com/")),HttpWebRequest)
webReq.CookieContainer = cookieJar
webReq.Credentials = CredentialCache.DefaultCredentials
webReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
webReq.KeepAlive = True
webReq.Headers.Set("Pragma", "no-cache")
webReq.Timeout = 5000
webReq.Method = "GET"
' get login page
webResp = webReq.GetResponse
sr = New StreamReader(webResp.GetResponseStream)
txt = sr.ReadToEnd.Trim
sr.Close()
webResp.Close()
webReq = CType(WebRequest.Create(New Uri("http://dgm2.com/authenticate.cfm")),HttpWebRequest)
webReq.CookieContainer = cookieJar
webReq.Credentials = CredentialCache.DefaultCredentials
webReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
webReq.KeepAlive = True
webReq.Headers.Set("Pragma", "no-cache")
webReq.Timeout = 5000
webReq.Method = "POST"
webReq.ContentType = "application/x-www-form-urlencoded"
payLoad = "Login=XXXX&Password=XXXX"
webReq.ContentLength = payLoad.Length
sw = New StreamWriter(webReq.GetRequestStream)
sw.Write(payLoad)
sw.Close()
' post login parms
webResp = webReq.GetResponse
sr = New StreamReader(webResp.GetResponseStream)
txt = sr.ReadToEnd.Trim
sr.Close()
webResp.Close()
readHtmlPage = txt
End Function
It first gets the DGM homepage, picks up any cookies, and then tries to submit the form to authenticate.cfm. For me, it fails with a
The remote server returned an error: (500) Internal Server Error.
at the final "webResp = webReq.GetResponse"
I've replaced my DGM login and password with XXXX for now!
Cheers
P.
-
05-04-05 #4
Registered User
I notice that you're only submitting two of the visible form elements and none of the hidden elements on the form at http://dgm2.com/
It could be that when the form is submitted the code checks, or tries to use, one of these missing elements and fails.
Try adding all of them to payLoad, including the submit button, as the target page outputs this:
Error resolving parameter BUTTON.
-
05-04-05 #5
Honourable Member
Excellent!
Originally posted by monkeyboy
I notice that you're only submitting two of the visible form elements and none of the hidden elements on the form at http://dgm2.com/
Thanks
My Payload string is now
payLoad = "emailconfirm=0&Account={ts '2005-04-05 17:49:51'}&LastLogin={ts '2005-04-05 17:49:51'}&Login=XXXX&Password=XXXX&button= L O G I N "
and I am in!
Now to get forward from there, but thanks so much for your help.
Regards
P.
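The payload above is sent as typed, so the spaces, braces and quotes in values such as {ts '2005-04-05 17:49:51'} go over the wire unescaped. Below is a minimal sketch of building the same body with each name and value percent-encoded, assuming a .NET version that has generics and Uri.EscapeDataString (2.0 or later). BuildPayload is a hypothetical helper, and the field names and values are the ones from the payload string above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module PayloadSketch
    ' Hypothetical helper: joins name/value pairs into an
    ' application/x-www-form-urlencoded body, percent-encoding each part.
    Function BuildPayload(ByVal fields As List(Of KeyValuePair(Of String, String))) As String
        Dim sb As New StringBuilder()
        For Each field As KeyValuePair(Of String, String) In fields
            If sb.Length > 0 Then sb.Append("&")
            sb.Append(Uri.EscapeDataString(field.Key))
            sb.Append("=")
            sb.Append(Uri.EscapeDataString(field.Value))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Field names and values copied from the payload string in this thread.
        Dim fields As New List(Of KeyValuePair(Of String, String))
        fields.Add(New KeyValuePair(Of String, String)("emailconfirm", "0"))
        fields.Add(New KeyValuePair(Of String, String)("Account", "{ts '2005-04-05 17:49:51'}"))
        fields.Add(New KeyValuePair(Of String, String)("LastLogin", "{ts '2005-04-05 17:49:51'}"))
        fields.Add(New KeyValuePair(Of String, String)("Login", "XXXX"))
        fields.Add(New KeyValuePair(Of String, String)("Password", "XXXX"))
        fields.Add(New KeyValuePair(Of String, String)("button", " L O G I N "))
        Console.WriteLine(BuildPayload(fields))
    End Sub
End Module
```

Encoding each value separately matters: running the whole finished string through a URL-encode call would also escape the & and = separators, and the server would then see one long field name instead of six fields.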
-
05-04-05 #6
Honourable Member
Now I'm having trouble getting past the second form submission on authenticate.cfm.
Here's the code that is, again, producing a 500 Server Error:
Function readHtmlPage() As String
Dim cookieJar As CookieContainer = New CookieContainer
Dim webReq As HttpWebRequest
Dim webResp As HttpWebResponse
Dim sr As StreamReader
Dim sw As StreamWriter
Dim payLoad As String
Dim txt As String
webReq = CType(WebRequest.Create(New Uri("http://dgm2.com/")),HttpWebRequest)
webReq.CookieContainer = cookieJar
webReq.Credentials = CredentialCache.DefaultCredentials
webReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
webReq.KeepAlive = True
webReq.Headers.Set("Pragma", "no-cache")
webReq.Timeout = 5000
webReq.Method = "GET"
' get login page
webResp = webReq.GetResponse
sr = New StreamReader(webResp.GetResponseStream)
txt = sr.ReadToEnd.Trim
sr.Close()
webResp.Close()
'Post authenticate form
'-----------------------
webReq = CType(WebRequest.Create(New Uri("http://dgm2.com/authenticate.cfm")),HttpWebRequest)
webReq.CookieContainer = cookieJar
webReq.Credentials = CredentialCache.DefaultCredentials
webReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
webReq.KeepAlive = True
webReq.Headers.Set("Pragma", "no-cache")
webReq.Timeout = 5000
webReq.Method = "POST"
webReq.ContentType = "application/x-www-form-urlencoded"
payLoad = "emailconfirm=0&Account={ts '2005-04-05 17:49:51'}&LastLogin={ts '2005-04-05 17:49:51'}&Login=XXXX&Password=XXXX&button= L O G I N "
'payLoad = server.UrlEncode(payLoad)
webReq.ContentLength = payLoad.Length
sw = New StreamWriter(webReq.GetRequestStream)
sw.Write(payLoad)
sw.Close()
' post login parms
webResp = webReq.GetResponse
sr = New StreamReader(webResp.GetResponseStream)
txt = sr.ReadToEnd.Trim
sr.Close()
webResp.Close()
readHtmlPage = txt
'response.Write(txt)
Dim stra, strb as string
Dim intMarker as integer
intMarker=instr(txt,"Hits")
strb=mid(txt,intMarker+13,len(txt))
intMarker=instr(strb,chr(34))
strb=mid(strb,1,intMarker-1)
intMarker=instr(txt,"LastLogin")
stra=mid(txt,intMarker+18,len(txt))
intMarker=instr(stra,chr(34))
stra=mid(stra,1,intMarker-1)
'Post post-authenticate form
'-----------------------
webReq = CType(WebRequest.Create(New Uri("http://dgm2.com/authenticate.cfm")),HttpWebRequest)
webReq.CookieContainer = cookieJar
webReq.Credentials = CredentialCache.DefaultCredentials
webReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
webReq.KeepAlive = True
webReq.Headers.Set("Pragma", "no-cache")
webReq.Timeout = 5000
webReq.Method = "POST"
webReq.ContentType = "application/x-www-form-urlencoded"
payLoad = "Login=XXXX&Direct=a/affadmin/index_view.cfm&Hits=" & strb & "&LastLogin=" & stra & "&emailconfirm=0&go= E N T E R P R O T E C T E D A R E A "
'payLoad = server.UrlEncode(payLoad)
webReq.ContentLength = payLoad.Length
sw = New StreamWriter(webReq.GetRequestStream)
sw.Write(payLoad)
sw.Close()
' post login parms
webResp = webReq.GetResponse
sr = New StreamReader(webResp.GetResponseStream)
txt = sr.ReadToEnd.Trim
sr.Close()
webResp.Close()
readHtmlPage = txt
response.Write(txt)
End Function
Anyone got any ideas?
P.
-
05-04-05 #7
Registered User
As I can't see what the form looks like I'm not sure.
Assuming you have found all the hidden fields, I'd say double-check what values are in stra and strb.
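One way to check those values is to pull them out by field name instead of by fixed character offsets, then print them before posting. This is a rough sketch, assuming the hidden inputs look like <input ... name="Hits" value="...">, with name before value and both in double quotes. The real markup is not shown in this thread, so the pattern may need adjusting. GetInputValue is a hypothetical helper.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HiddenFieldSketch
    ' Hypothetical helper: returns the value attribute of the first <input>
    ' tag with the given name, or Nothing when no such tag is found.
    ' Assumes name comes before value and both attributes are double-quoted.
    Function GetInputValue(ByVal html As String, ByVal fieldName As String) As String
        Dim pattern As String = "<input[^>]*name=""" & Regex.Escape(fieldName) & _
                                """[^>]*value=""([^""]*)"""
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        ' Made-up markup for illustration only; not copied from the DGM page.
        Dim sample As String = "<input type=""hidden"" name=""Hits"" value=""12"">"
        Dim hits As String = GetInputValue(sample, "Hits")
        If hits Is Nothing Then
            Console.WriteLine("Hits field not found")
        Else
            Console.WriteLine("Hits = [" & hits & "]")
        End If
    End Sub
End Module
```

Printing the value inside brackets makes stray whitespace or a truncated string easy to spot, and the Nothing check catches the case where the field is missing from the response altogether.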
-
06-04-05 #8
Registered User
- Join Date
- Aug 2003
- Location
- Cheshire
- Posts
- 263
- Thanks
- 0
- Thanked 1 Time in 1 Post
Hi
Try this, works for me - note that I'm using an IO.Stream rather than a StreamWriter. You also don't need to do the second authenticate form once logged in - just go on to whichever page you want next.
Jon
Dim cookieJar As CookieContainer = New CookieContainer
Dim webReq As HttpWebRequest
Dim webResp As HttpWebResponse
Dim sr As StreamReader
Dim sw As IO.Stream
Dim payLoad As String
Dim txt As String
Dim data As Byte()
Dim encoding As New System.Text.ASCIIEncoding
'Post authenticate form
'-----------------------
webReq = CType(WebRequest.Create(New Uri("http://dgm2.com/authenticate.cfm")), HttpWebRequest)
webReq.CookieContainer = cookieJar
webReq.Credentials = CredentialCache.DefaultCredentials
webReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
webReq.KeepAlive = True
webReq.Headers.Set("Pragma", "no-cache")
webReq.Timeout = 5000
webReq.Method = "POST"
webReq.ContentType = "application/x-www-form-urlencoded"
payLoad = "emailconfirm=0&Account=17-Mar-05&LastLogin=17-Mar-05&Login=xxx&Password=yyy&button= L O G I N "
data = encoding.GetBytes(payLoad)
webReq.ContentLength = data.Length
sw = webReq.GetRequestStream
sw.Write(data, 0, data.Length)
sw.Close()
'' post login parms
webResp = webReq.GetResponse
sr = New StreamReader(webResp.GetResponseStream)
txt = sr.ReadToEnd.Trim
sr.Close()
webResp.Close()
readHtmlPage = txt
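The request setup in this thread is repeated for every call, so it may be easier to wrap it once. This is a sketch along the lines of the code above, assuming the payload is already URL-encoded (so ASCII bytes are enough) and a .NET version that supports Using blocks. PostForm is a hypothetical name, not part of any library.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module PostFormSketch
    ' Hypothetical helper: POSTs an already URL-encoded form body to url,
    ' sharing cookies through cookieJar, and returns the response text.
    Function PostForm(ByVal url As String, ByVal payLoad As String, _
                      ByVal cookieJar As CookieContainer) As String
        Dim webReq As HttpWebRequest = CType(WebRequest.Create(New Uri(url)), HttpWebRequest)
        webReq.CookieContainer = cookieJar
        webReq.Method = "POST"
        webReq.ContentType = "application/x-www-form-urlencoded"
        webReq.Timeout = 5000

        ' ContentLength is taken from the byte array that is actually written,
        ' so the header always matches the body.
        Dim data As Byte() = Encoding.ASCII.GetBytes(payLoad)
        webReq.ContentLength = data.Length

        Using body As Stream = webReq.GetRequestStream()
            body.Write(data, 0, data.Length)
        End Using

        ' Using blocks close the response and reader even if reading throws.
        Using webResp As HttpWebResponse = CType(webReq.GetResponse(), HttpWebResponse)
            Using sr As New StreamReader(webResp.GetResponseStream())
                Return sr.ReadToEnd().Trim()
            End Using
        End Using
    End Function
End Module
```

A call would look like PostForm("http://dgm2.com/authenticate.cfm", payLoad, cookieJar). Passing the same CookieContainer to every later request is what carries the logged-in session on to the next page.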
-
06-04-05 #9
Honourable Member
Done, dusted and sorted.
Thanks so much for your help!
P.