1. #1
    Pandini, Honourable Member (joined Nov 2004, 37 posts)

    Screen Scraping DGM

    I'm writing a little web page in ASP.NET (VB) that will log into all of my affiliate accounts and give me a total.

    It's going well, but I can't figure out how to log into DGM: the code I'm trying returns a 500 Server Error.

    Does anyone have any code, in any language (though preferably VB-based), that logs into DGM automatically?

    Cheers

    P

  2. #2
    monkeyboy, Registered User (joined Sep 2003, 64 posts)
    Can you post the code that you are using?
    Prezzybox.com home of the gift wizard

  3. #3
    Pandini
    Originally posted by monkeyboy:
    "can you post the code that you are using?"

    Sure:

    ' Requires: Imports System.IO and Imports System.Net
    Function readHtmlPage() As String
        Dim cookieJar As CookieContainer = New CookieContainer
        Dim webReq As HttpWebRequest
        Dim webResp As HttpWebResponse
        Dim sr As StreamReader
        Dim sw As StreamWriter
        Dim payLoad As String
        Dim txt As String

        ' Request the home page first so that any session cookies land in cookieJar
        webReq = CType(WebRequest.Create(New Uri("http://dgm2.com/")), HttpWebRequest)
        webReq.CookieContainer = cookieJar
        webReq.Credentials = CredentialCache.DefaultCredentials
        webReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
        webReq.KeepAlive = True
        webReq.Headers.Set("Pragma", "no-cache")
        webReq.Timeout = 5000
        webReq.Method = "GET"

        ' get login page
        webResp = CType(webReq.GetResponse(), HttpWebResponse)

        sr = New StreamReader(webResp.GetResponseStream())
        txt = sr.ReadToEnd().Trim()
        sr.Close()
        webResp.Close()

        ' Build the login POST, reusing the same cookie container
        webReq = CType(WebRequest.Create(New Uri("http://dgm2.com/authenticate.cfm")), HttpWebRequest)
        webReq.CookieContainer = cookieJar
        webReq.Credentials = CredentialCache.DefaultCredentials
        webReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
        webReq.KeepAlive = True
        webReq.Headers.Set("Pragma", "no-cache")
        webReq.Timeout = 5000
        webReq.Method = "POST"
        webReq.ContentType = "application/x-www-form-urlencoded"

        payLoad = "Login=XXXX&Password=XXXX"

        webReq.ContentLength = payLoad.Length
        sw = New StreamWriter(webReq.GetRequestStream())
        sw.Write(payLoad)
        sw.Close()

        ' post login parms (this is the call that throws the 500 error)
        webResp = CType(webReq.GetResponse(), HttpWebResponse)

        sr = New StreamReader(webResp.GetResponseStream())
        txt = sr.ReadToEnd().Trim()
        sr.Close()
        webResp.Close()
        readHtmlPage = txt
    End Function


    It first gets the DGM homepage, picks up any cookies, and then tries to submit the form to authenticate.cfm. For me, it fails at the final "webResp = webReq.GetResponse" with:

    The remote server returned an error: (500) Internal Server Error.

    I've replaced my DGM login and password with XXXX for now!
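
A bare "500" says little on its own, but the server usually sends a page along with it, and that page often names the actual problem. Below is a minimal Python sketch (Python rather than VB, since any language was welcome) of reading that page from the exception. It uses only the standard library; the example.invalid URL and the hand-built error object are stand-ins so the snippet runs offline, not anything taken from DGM.

```python
import io
import urllib.error
from email.message import Message


def describe_error(err):
    """Summarise an HTTPError, including whatever page the server sent back."""
    # HTTPError is also a file-like response, so its body can be read.
    body = err.read().decode("utf-8", errors="replace").strip()
    return f"{err.code} {err.reason}: {body}"


# Offline stand-in for a failed login POST (hypothetical URL and body).
fake_error = urllib.error.HTTPError(
    "http://example.invalid/authenticate.cfm",
    500,
    "Internal Server Error",
    Message(),
    io.BytesIO(b"Error resolving parameter BUTTON."),
)
summary = describe_error(fake_error)
print(summary)
```

In real use the call to urllib.request.urlopen would sit in a try block, with describe_error called in the "except urllib.error.HTTPError as err" branch. In .NET the equivalent information is reachable through the Response property of the WebException that GetResponse throws.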

    Cheers

    P.

  4. #4
    monkeyboy
    I notice that you're only submitting two of the visible form elements and none of the hidden elements on the form at http://dgm2.com/

    It could be that when the form is submitted, the server-side code checks, or tries to use, one of these missing elements and fails.

    Try adding all of them to payLoad, including the submit button, because the target page outputs this:

    Error resolving parameter BUTTON.
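
One way to avoid missing a field is to collect every input from the login page instead of typing the names by hand. The sketch below is Python using only the standard library, and it is an illustration rather than the real DGM form: the SAMPLE markup is invented, and only the field names emailconfirm, Login, Password and button come from this thread.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from every <input> tag in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            # Inputs without a value attribute are submitted as empty strings.
            self.fields[name] = attr.get("value") or ""


# Hypothetical markup standing in for the downloaded login page.
SAMPLE = """
<form action="authenticate.cfm" method="post">
  <input type="hidden" name="emailconfirm" value="0">
  <input type="text" name="Login">
  <input type="password" name="Password">
  <input type="submit" name="button" value=" L O G I N ">
</form>
"""

collector = FormFieldCollector()
collector.feed(SAMPLE)
fields = collector.fields

# Fill in the credentials; hidden fields and the button keep the page's values.
fields.update({"Login": "XXXX", "Password": "XXXX"})
print(fields)
```

Starting from what the page actually contains means a field added to the form later is picked up without any change to the code.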

  5. #5
    Pandini
    Originally posted by monkeyboy:
    "I notice that you're only submitting two of the visible form elements and none of the hidden elements on the form at http://dgm2.com/"

    Excellent! Thanks.

    My payLoad string is now:

    payLoad = "emailconfirm=0&Account={ts '2005-04-05 17:49:51'}&LastLogin={ts '2005-04-05 17:49:51'}&Login=XXXX&Password=XXXX&button= L O G I N "

    and I am in!

    Now to get forward from there, but thanks so much for your help.
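
The string above went through as typed, but it contains spaces, braces, quotes and colons that are not strictly legal in a form-encoded body, so another server might reject it. A hedged Python sketch of encoding each value separately follows; the field names and values are the ones quoted above, and nothing here is DGM-specific. Note that each value is encoded on its own: encoding the finished string in one go would also escape the "&" and "=" separators.

```python
from urllib.parse import parse_qs, urlencode

fields = {
    "emailconfirm": "0",
    "Account": "{ts '2005-04-05 17:49:51'}",
    "LastLogin": "{ts '2005-04-05 17:49:51'}",
    "Login": "XXXX",
    "Password": "XXXX",
    "button": " L O G I N ",
}

# urlencode escapes each name and value, then joins them with & and =.
payload = urlencode(fields)
print(payload)

# Decoding the result gives back the original values unchanged.
decoded = parse_qs(payload)
```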

    Regards

    P.

  6. #6
    Pandini
    I'm now having trouble getting past the second form submission on authenticate.cfm.

    Here's the code that is, again, producing a 500 Server Error:


    ' Requires: Imports System.IO and Imports System.Net
    Function readHtmlPage() As String
        Dim cookieJar As CookieContainer = New CookieContainer
        Dim webReq As HttpWebRequest
        Dim webResp As HttpWebResponse
        Dim sr As StreamReader
        Dim sw As StreamWriter
        Dim payLoad As String
        Dim txt As String

        webReq = CType(WebRequest.Create(New Uri("http://dgm2.com/")), HttpWebRequest)
        webReq.CookieContainer = cookieJar
        webReq.Credentials = CredentialCache.DefaultCredentials
        webReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
        webReq.KeepAlive = True
        webReq.Headers.Set("Pragma", "no-cache")
        webReq.Timeout = 5000
        webReq.Method = "GET"

        ' get login page
        webResp = CType(webReq.GetResponse(), HttpWebResponse)

        sr = New StreamReader(webResp.GetResponseStream())
        txt = sr.ReadToEnd().Trim()
        sr.Close()
        webResp.Close()


        'Post authenticate form
        '-----------------------
        webReq = CType(WebRequest.Create(New Uri("http://dgm2.com/authenticate.cfm")), HttpWebRequest)
        webReq.CookieContainer = cookieJar
        webReq.Credentials = CredentialCache.DefaultCredentials
        webReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
        webReq.KeepAlive = True
        webReq.Headers.Set("Pragma", "no-cache")
        webReq.Timeout = 5000
        webReq.Method = "POST"
        webReq.ContentType = "application/x-www-form-urlencoded"

        payLoad = "emailconfirm=0&Account={ts '2005-04-05 17:49:51'}&LastLogin={ts '2005-04-05 17:49:51'}&Login=XXXX&Password=XXXX&button= L O G I N "
        'payLoad = server.UrlEncode(payLoad)

        webReq.ContentLength = payLoad.Length
        sw = New StreamWriter(webReq.GetRequestStream())
        sw.Write(payLoad)
        sw.Close()

        ' post login parms
        webResp = CType(webReq.GetResponse(), HttpWebResponse)

        sr = New StreamReader(webResp.GetResponseStream())
        txt = sr.ReadToEnd().Trim()
        sr.Close()
        webResp.Close()
        readHtmlPage = txt
        'response.Write(txt)

        ' Pull the Hits and LastLogin values out of the returned page.
        ' The offsets skip past Hits" value=" (13 characters) and LastLogin" value=" (18 characters),
        ' then read up to the next double quote (Chr(34)).
        Dim stra, strb As String
        Dim intMarker As Integer
        intMarker = InStr(txt, "Hits")
        strb = Mid(txt, intMarker + 13, Len(txt))
        intMarker = InStr(strb, Chr(34))
        strb = Mid(strb, 1, intMarker - 1)

        intMarker = InStr(txt, "LastLogin")
        stra = Mid(txt, intMarker + 18, Len(txt))
        intMarker = InStr(stra, Chr(34))
        stra = Mid(stra, 1, intMarker - 1)


        'Post post-authenticate form
        '-----------------------
        webReq = CType(WebRequest.Create(New Uri("http://dgm2.com/authenticate.cfm")), HttpWebRequest)
        webReq.CookieContainer = cookieJar
        webReq.Credentials = CredentialCache.DefaultCredentials
        webReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
        webReq.KeepAlive = True
        webReq.Headers.Set("Pragma", "no-cache")
        webReq.Timeout = 5000
        webReq.Method = "POST"
        webReq.ContentType = "application/x-www-form-urlencoded"

        payLoad = "Login=XXXX&Direct=a/affadmin/index_view.cfm&Hits=" & strb & "&LastLogin=" & stra & "&emailconfirm=0&go= E N T E R P R O T E C T E D A R E A "
        'payLoad = server.UrlEncode(payLoad)

        webReq.ContentLength = payLoad.Length
        sw = New StreamWriter(webReq.GetRequestStream())
        sw.Write(payLoad)
        sw.Close()

        ' post the second form (this is the call that now throws the 500 error)
        webResp = CType(webReq.GetResponse(), HttpWebResponse)

        sr = New StreamReader(webResp.GetResponseStream())
        txt = sr.ReadToEnd().Trim()
        sr.Close()
        webResp.Close()
        readHtmlPage = txt
        Response.Write(txt)


    End Function


    Anyone got any ideas?

    P.

  7. #7
    monkeyboy
    As I can't see what the form looks like, I'm not sure.

    Assuming you have found all the hidden fields, I'd say double-check what values are in stra and strb.
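
Checking those two values matters because InStr returns 0 when the text is not found, and the fixed offsets then pick up unrelated characters without any error. A small Python sketch of an extraction that fails loudly instead is shown below. The helper name hidden_value and the SAMPLE markup are made up for illustration, and the pattern assumes double-quoted attributes with name appearing before value, which may not match the real page.

```python
import re


def hidden_value(html, name):
    """Return the value attribute of the <input> called `name`, or raise."""
    tag = re.search(
        r'<input\b[^>]*\sname="' + re.escape(name) + r'"[^>]*>',
        html,
        re.IGNORECASE,
    )
    if tag is None:
        raise ValueError(f"no input named {name!r} in the page")
    value = re.search(r'\svalue="([^"]*)"', tag.group(0), re.IGNORECASE)
    if value is None:
        raise ValueError(f"input {name!r} has no value attribute")
    return value.group(1)


# Hypothetical markup standing in for the page returned after login.
SAMPLE = """
<input type="hidden" name="Hits" value="42">
<input type="hidden" name="LastLogin" value="{ts '2005-04-05 17:49:51'}">
"""

hits = hidden_value(SAMPLE, "Hits")
last_login = hidden_value(SAMPLE, "LastLogin")
print(hits, last_login)
```

Printing both values before building the next payload shows immediately whether the offsets or the pattern are pulling out what was intended.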

  8. #8
    HeresJonny, Registered User (joined Aug 2003, Cheshire, 263 posts)
    Hi

    Try this, works for me - note that I'm using an IO.Stream rather than a StreamWriter. You also don't need to do the second authenticate form once logged in - just go on to whichever page you want next.

    Jon

    ' Requires: Imports System.IO and Imports System.Net
    Function readHtmlPage() As String
        Dim cookieJar As CookieContainer = New CookieContainer
        Dim webReq As HttpWebRequest
        Dim webResp As HttpWebResponse
        Dim sr As StreamReader
        Dim sw As IO.Stream
        Dim payLoad As String
        Dim txt As String
        Dim data As Byte()
        Dim encoding As New System.Text.ASCIIEncoding


        'Post authenticate form
        '-----------------------
        webReq = CType(WebRequest.Create(New Uri("http://dgm2.com/authenticate.cfm")), HttpWebRequest)
        webReq.CookieContainer = cookieJar
        webReq.Credentials = CredentialCache.DefaultCredentials
        webReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
        webReq.KeepAlive = True
        webReq.Headers.Set("Pragma", "no-cache")
        webReq.Timeout = 5000
        webReq.Method = "POST"
        webReq.ContentType = "application/x-www-form-urlencoded"

        payLoad = "emailconfirm=0&Account=17-Mar-05&LastLogin=17-Mar-05&Login=xxx&Password=yyy&button= L O G I N "
        ' Convert the payload to bytes so that ContentLength is the number of bytes written
        data = encoding.GetBytes(payLoad)

        webReq.ContentLength = data.Length
        sw = webReq.GetRequestStream()
        sw.Write(data, 0, data.Length)
        sw.Close()

        ' post login parms
        webResp = CType(webReq.GetResponse(), HttpWebResponse)
        sr = New StreamReader(webResp.GetResponseStream())
        txt = sr.ReadToEnd().Trim()
        sr.Close()
        webResp.Close()
        readHtmlPage = txt
    End Function
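
One thing the version above does differently is to encode the payload to bytes first and take the content length from the byte array. The thread does not say whether that or skipping the second form was the deciding change, so the Python sketch below only illustrates the general rule that Content-Length counts bytes, not characters. The helper name build_post is made up; the URL and field names are the ones used in this thread, and no request is actually sent.

```python
import urllib.request


def build_post(url, payload, encoding="utf-8"):
    """Build a form POST whose Content-Length is the encoded byte count."""
    data = payload.encode(encoding)
    request = urllib.request.Request(url, data=data, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    request.add_header("Content-Length", str(len(data)))
    return request


request = build_post("http://dgm2.com/authenticate.cfm", "Login=xxx&Password=yyy")

# Characters and bytes only differ once the text leaves plain ASCII.
text = "name=café"
char_count = len(text)
byte_count = len(text.encode("utf-8"))
print(char_count, byte_count)
```

For a purely ASCII payload like the login string the two counts are equal, so measuring the byte array is mainly a safeguard for values that contain accented or other non-ASCII characters.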

  9. #9
    Pandini
    Done, dusted and sorted.

    Thanks so much for your help!

    P.
