Page 1 of 2
Results 1 to 15 of 26

 

Thread: Legalities of scraping a site

  1. #1
    max99's Avatar
    Affiliate Student Guy

    Join Date
    Jan 2006
    Location
    Manchester UK
    Posts
    1,533
    Thanks
    32
    Thanked 15 Times in 15 Posts


    Just wondered about the legalities of scraping a website that lists things for sale and relisting them on my site, giving the items extra exposure.

    Now, is this legal or not?

    Does the owner of the site own the listing submitted by the user?
    OnlineClick.co.uk - PPC,SEO,Content,Email & Joint Ventures | Msn: My Username @ hotmail.co.uk

  2. #2
    Registered User

    Join Date
    Nov 2007
    Posts
    377
    Thanks
    29
    Thanked 51 Times in 46 Posts
    Originally posted by max99:
    Just wondered about the legalities of scraping a website that lists things for sale and relisting them on my site, giving the items extra exposure.

    Now, is this legal or not?

    Does the owner of the site own the listing submitted by the user?
    It may depend entirely on what the site's terms are and what the owners think, really.

    e.g. Gumtree states in its TOS:

    As a condition of your use of Gumtree you agree that you will not:

    impose an unreasonable load on our infrastructure or interfere with the proper working of Gumtree;
    copy, modify, or distribute any other person's content without their consent;

    ...and other such things, so it might be worth looking at whatever legal statements they have on the site.

    Mike
    Looking for link swaps and article exchanges on sites relating to cycling, bikes, mountain bikes, cycling equipment etc. Drop me a PM.

  3. #3
    Dynamoo's Avatar
    Mooooo

    Join Date
    Dec 2003
    Location
    Somewhere in Bedfordshire
    Posts
    1,908
    Thanks
    5
    Thanked 60 Times in 43 Posts
    The use of "snippets" from another site is a hot copyright topic. One argument is that a short text excerpt falls under "fair use", but of course there are those who argue that it doesn't. Even Google has this problem.

    But copyright is not the only problem: excessive scraping of a site can effectively damage traffic to that site, and that could leave you open to a claim for damages.
    Never email donotemail@WeAreSpammers.com

  4. #4
    max99's Avatar
    Affiliate Student Guy

    Join Date
    Jan 2006
    Location
    Manchester UK
    Posts
    1,533
    Thanks
    32
    Thanked 15 Times in 15 Posts
    Hmm, sounds like a hot topic!

    Sounds like something I shouldn't go into, lol.

  5. #5
    Registered User

    Join Date
    Aug 2007
    Posts
    868
    Thanks
    62
    Thanked 109 Times in 82 Posts
    Content is scraped every day of the week and dressed up in another 'useful' format. My own view is that you move closer to being illegal the more closely your display of the data matches the original content, particularly when judged by what percentage of the original page you are displaying.

    When did Google ask your permission to scrape all of your websites? How large would the Google snippet need to become to display all of your content to a Google searcher?

    Clear as mud.

  6. #6
    Registered User

    Join Date
    Nov 2008
    Location
    Essex, UK
    Posts
    253
    Thanks
    1
    Thanked 11 Times in 8 Posts
    I'm interested in this with regard to public information such as Premier League positions and other sports tables.

    If you scrape it off another site, save it locally and don't bombard them with requests, is it still bad?
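On the "save it locally and don't bombard them" point, here is a minimal sketch of a local cache in Python. It assumes you supply your own fetch function (the `fetch` callable is hypothetical, not from the thread) and only shows how to keep the load on the source site low; it says nothing about whether reusing the data is allowed.

```python
import time
from typing import Callable, Dict, Tuple


class CachedFetcher:
    """Keep a local copy of each fetched page and only re-fetch it once
    `ttl_seconds` have passed, so the source site is hit as rarely as possible."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # your own HTTP call, injected (hypothetical)
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested
        self._cache: Dict[str, Tuple[float, str]] = {}  # url -> (fetched_at, body)

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]       # still fresh: serve the local copy, no request sent
        body = self._fetch(url)    # stale or missing: one request to the source
        self._cache[url] = (now, body)
        return body
```

For a league table that changes a few times a day, something like `CachedFetcher(my_fetch, ttl_seconds=6 * 3600)` (with `my_fetch` being your own, hypothetical, request function) would mean roughly four requests a day per page, however many visitors your own site gets.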

  7. #7
    max99's Avatar
    Affiliate Student Guy

    Join Date
    Jan 2006
    Location
    Manchester UK
    Posts
    1,533
    Thanks
    32
    Thanked 15 Times in 15 Posts
    Okay, the site in question is FetchLocal/co/uk. I acquired it recently and I'm not sure what to do with it; one option would be to build a scraper to scrape other sites such as businessesforsale/com.

    Anyone have any suggestions?

  8. #8
    90% of all sites are crap

    Join Date
    Nov 2003
    Location
    the moon
    Posts
    1,700
    Thanks
    89
    Thanked 68 Times in 44 Posts
    Interesting thread for the forum, but you'll get those whose moral compass stays in their pocket saying "pffft, why not, everyone else is doing it, so sod it", and those who will say "no way, immoral... what if I scraped your content and rehashed it? You'd be pissed, right?"

    However, can you truly claim that information freely available in the public domain is really the "property" of the person displaying it in a nice consolidated form?

    Nine times out of ten, how do you think most of the sites displaying hugely complicated data sets/results get that data?

    1 - they scrape it
    2 - they buy it from a company that's scraped it
    3 - they spend a huge amount buying data that's been hand-entered by data teams in the third world... who just copy the data from some other source!

    If you use your common sense, scrape the bare minimum needed as the foundation of your content and then make it your own somehow, you will be fine, provided what you scrape isn't totally unique. Otherwise, yeah, I'd say you're in trouble, lol.
    Tokyo::Paris::New York::Bromley

  9. #9
    90% of all sites are crap

    Join Date
    Nov 2003
    Location
    the moon
    Posts
    1,700
    Thanks
    89
    Thanked 68 Times in 44 Posts
    Oooo, careful with scraping local data. It's not uncommon for the larger local data owners to insert bullsh1t (deliberately fake) entries into their listings.

    Then, when those entries turn up on the site of whoever scraped them, they'll nail you, as they'll have proof it's their data you've taken!
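The thread doesn't say how data owners actually plant these entries, so here is one way to picture the idea, as a sketch under assumptions: the owner derives fake listings from a private key (HMAC-SHA256 here, an arbitrary choice), publishes them among the real ones, and later checks a suspect page for them. The function names and the "Acme ... Ltd" naming scheme are hypothetical; a real owner would make the entries look plausible.

```python
import hashlib
import hmac
from typing import List


def make_trap_listing(secret: bytes, index: int) -> str:
    """Derive a fake listing name that only the key holder can regenerate."""
    digest = hmac.new(secret, f"trap-{index}".encode(), hashlib.sha256).hexdigest()
    # Hypothetical naming scheme, kept obviously artificial for illustration.
    return f"Acme {digest[:8].upper()} Ltd"


def find_traps(page_text: str, secret: bytes, count: int) -> List[str]:
    """Return every planted trap listing that appears verbatim in page_text."""
    traps = [make_trap_listing(secret, i) for i in range(count)]
    return [t for t in traps if t in page_text]


if __name__ == "__main__":
    key = b"owner-private-key"            # assumption: kept secret by the data owner
    planted = make_trap_listing(key, 3)   # published among the real listings
    suspect_page = f"<li>Smith Plumbing</li><li>{planted}</li>"
    print(find_traps(suspect_page, key, count=10))
```

Because the entries are derived from a key only the owner holds, finding one on someone else's site is hard to explain away as independent data collection.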

  10. The Following User Says Thank You to tomj For This Useful Post:

    Mogga (16-04-09)

  11. #10
    Dynamoo's Avatar
    Mooooo

    Join Date
    Dec 2003
    Location
    Somewhere in Bedfordshire
    Posts
    1,908
    Thanks
    5
    Thanked 60 Times in 43 Posts
    Originally posted by confuscius:
    When did Google ask your permission to scrape all of your websites? How large would the Google snippet need to become to display all of your content to a Google searcher?
    One difference between Google (or most other search engines) and a scraper site is that Google does not offer pages of scraped results as "content". Queries are generated in real time; indeed, search engine results (from any search engine) are not included in the Google index.

    I'm not convinced that a scraper site is legal or ethical. But if you were doing it, then perhaps you need to make sure you honour robots.txt and clearly identify your crawler. That way, anyone who wants to opt out can do so and have their content removed.
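Honouring robots.txt and identifying your crawler can be sketched with Python's standard `urllib.robotparser`. The crawler name, contact URL and robots.txt content below are assumptions for illustration; the rules are inlined so the sketch runs offline.

```python
from urllib import robotparser

# Hypothetical crawler identity: a distinctive name plus a page explaining the bot,
# so site owners can see who is crawling and opt out by name in robots.txt.
USER_AGENT = "ExampleScraper/1.0 (+https://example.com/bot-info)"

# Assumed robots.txt content. Against a real site you would instead call
# rp.set_url("https://example.com/robots.txt") followed by rp.read().
ROBOTS_TXT = """\
User-agent: ExampleScraper
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow: /
"""


def load_rules(robots_txt: str) -> robotparser.RobotFileParser:
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def may_fetch(rp: robotparser.RobotFileParser, url: str) -> bool:
    """True only if robots.txt allows our User-Agent to fetch this URL."""
    return rp.can_fetch(USER_AGENT, url)


rules = load_rules(ROBOTS_TXT)
for url in ("https://example.com/listings/1", "https://example.com/private/x"):
    print(url, may_fetch(rules, url))
print("crawl delay:", rules.crawl_delay(USER_AGENT))
```

When actually sending requests, the same string would go in the User-Agent header, e.g. `urllib.request.Request(url, headers={"User-Agent": USER_AGENT})`, so the name that shows up in a site's logs is the one its owner can block.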

    On another track, have you considered using ODP (dmoz.org) data? Although that's not as comprehensive as it once was, it's free to use if you follow the licence agreement.

  12. The Following User Says Thank You to Dynamoo For This Useful Post:

    Mogga (17-04-09)

  13. #11
    Registered User

    Join Date
    Aug 2007
    Posts
    868
    Thanks
    62
    Thanked 109 Times in 82 Posts
    Originally posted by Dynamoo:
    ... that Google does not offer pages of scraped results as "content". Queries are generated in real time. ...
    As far as I can see, every time a Google search query is run, it returns snippets from websites and documents, using a 'scraped' copy of the content as the basis of each snippet. The point I was trying to get across is: how big does the snippet have to be before Google moves from 'fair use' to 'copied content'? I see no problem with using a spider to scrape content provided you respect any robots / meta directives; the problems MAY arise if you go past the 'fair use' line, wherever that particular line is. If Google goes over the line, it is likely to reach for its cheque book rather than spend time in court.

    Difficult issue this one.

    Why not try to offer a local search engine based on the 'fair use' approach and gauge reaction from those sites that you include in the local search?
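Besides robots.txt, the "meta directives" mentioned above live in a page's `<meta name="robots" content="...">` tag. A minimal sketch of reading them with Python's standard `html.parser`, assuming the scraper already has the page HTML; the class and function names and the crawler name are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple


class RobotsMetaParser(HTMLParser):
    """Collect directives such as noindex/nofollow/noarchive from
    <meta name="robots"> and from a crawler-specific <meta name="...">."""

    def __init__(self, bot_name: str = "robots") -> None:
        super().__init__()
        self.names = {"robots", bot_name.lower()}
        self.directives: Set[str] = set()

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}  # names arrive lowercased
        if attr.get("name", "").lower() in self.names:
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def page_directives(html: str, bot_name: str = "robots") -> Set[str]:
    parser = RobotsMetaParser(bot_name)
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<head><meta name="robots" content="noindex, nofollow"></head><p>For sale</p>'
if "noindex" in page_directives(page, bot_name="ExampleScraper"):
    print("skip: the page asks not to be indexed")
```

A scraper following the "respect any robots / meta directives" approach would run this check on each fetched page and drop the page whenever `noindex` comes back.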

  14. #12
    Registered User

    Join Date
    Jun 2006
    Posts
    615
    Thanks
    7
    Thanked 66 Times in 63 Posts
    Originally posted by Dynamoo:
    I'm not convinced that a scraper site is legal or ethical. But if you were doing it, then perhaps you need to make sure you honour robots.txt and clearly identify your crawler. That way, anyone who wants to opt out can do so and have their content removed.

    Surely that would put the onus on the site to block Max99Scraper in their robots.txt. If it's not ethical to scrape a site, why should a site have to keep track of everyone that tries to scrape their data and block each one individually?

    Fair to say, if a site doesn't want to be scraped, they could block more than x requests in y seconds from a hash of the same UA/IP. If a site doesn't do that, aren't they effectively offering their data to anyone that wants it?
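Blocking "more than x requests in y seconds from a hash of the same UA/IP" can be sketched as a sliding-window counter keyed on a SHA-256 hash of User-Agent plus IP. This is an illustrative Python sketch: `max_requests` and `window_seconds` stand in for x and y, and the class name is hypothetical. A scraper that rotates its User-Agent or IP gets a fresh key each time, and visitors sharing one office IP and browser could be throttled together.

```python
import hashlib
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each client,
    where a client is identified by a hash of its User-Agent and IP."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests   # "x" in the post
        self.window = window_seconds       # "y" in the post
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    @staticmethod
    def client_key(user_agent: str, ip: str) -> str:
        return hashlib.sha256(f"{user_agent}|{ip}".encode()).hexdigest()

    def allow(self, user_agent: str, ip: str) -> bool:
        now = self.clock()
        q = self.hits[self.client_key(user_agent, ip)]
        while q and now - q[0] >= self.window:   # forget hits older than the window
            q.popleft()
        if len(q) >= self.max_requests:
            return False                          # over the limit: block this request
        q.append(now)
        return True
```

Blocked requests are not recorded, so a client that backs off regains access as soon as its older hits age out of the window.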

  15. #13
    Registered User

    Join Date
    Nov 2008
    Location
    Essex, UK
    Posts
    253
    Thanks
    1
    Thanked 11 Times in 8 Posts
    Originally posted by tomj:
    Then, when those entries turn up on the site of whoever scraped them, they'll nail you, as they'll have proof it's their data you've taken!
    Not that it affects what I do online, but I've never considered this! Great way to catch people, huh?

  16. #14
    Registered User

    Join Date
    Jul 2007
    Posts
    259
    Thanks
    7
    Thanked 19 Times in 12 Posts
    Originally posted by confuscius:
    As far as I can see, every time a Google search query is run, it returns snippets from websites and documents, using a 'scraped' copy of the content as the basis of each snippet. The point I was trying to get across is: how big does the snippet have to be before Google moves from 'fair use' to 'copied content'? I see no problem with using a spider to scrape content provided you respect any robots / meta directives; the problems MAY arise if you go past the 'fair use' line, wherever that particular line is. If Google goes over the line, it is likely to reach for its cheque book rather than spend time in court.

    Difficult issue this one.

    Why not try to offer a local search engine based on the 'fair use' approach and gauge reaction from those sites that you include in the local search?
    What about Google's cache? That's full copies of the data and content from many, many sites.

  17. #15
    Dynamoo's Avatar
    Mooooo

    Join Date
    Dec 2003
    Location
    Somewhere in Bedfordshire
    Posts
    1,908
    Thanks
    5
    Thanked 60 Times in 43 Posts
    Originally posted by jonsp:
    Surely that would put the onus on the site to block Max99Scraper in their robots.txt. If it's not ethical to scrape a site, why should a site have to keep track of everyone that tries to scrape their data and block each one individually?
    I'm not saying that I like the idea; I'm just saying that if you WERE doing it, then it's more ethical to follow robots.txt.

    Originally posted by Colin@DVDTimes:
    What about Google's cache? That's full copies of the data and content from many, many sites.
    Indeed, and so does the Internet Archive (IA). But both of those can be blocked by robots.txt, and neither the cache nor the IA is indexed by Google; the content is only there if you specifically look for it.




Similar Threads

  1. Amazon bot scraping my site?
    By ep90 in forum Affiliate Marketing Lounge
    Replies: 3
    Last Post: 08-04-09, 03:14 PM
  2. Server clear out deal site, ping site multi upload site and more.
    By KPR in forum Domains & Websites For Sale
    Replies: 1
    Last Post: 27-03-09, 06:52 PM
  3. Legalities.org.uk
    By pendragon in forum Domains & Websites For Sale
    Replies: 0
    Last Post: 24-03-09, 04:27 PM
  4. Merchant Scraping
    By victor_m in forum Affiliate Marketing Lounge
    Replies: 7
    Last Post: 28-10-07, 09:40 PM
  5. Screen Scraping DGM
    By Pandini in forum Programming
    Replies: 8
    Last Post: 06-04-05, 02:26 PM

