All search engines and crawlers should read the robots.txt file and follow the directives within it, which tell them which files or directories they aren't allowed to index.
You can also use the meta tag:
<meta name="robots" content="noindex,nofollow">
on a page to stop it being indexed, but the robots.txt file is the best solution, although you have to be very careful with pattern matching so you don't accidentally tell them not to index your whole site!
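If you want to check a set of rules before uploading them, something like the sketch below might help. It uses Python's standard urllib.robotparser module; the two rule sets, the www.example.com host and the is_allowed helper are made up for illustration, and real crawlers may interpret patterns slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of /private/ only.
GOOD_RULES = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical mistake: a bare "/" matches every path on the site.
BAD_RULES = """\
User-agent: *
Disallow: /
"""


def is_allowed(rules: str, path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # www.example.com is a placeholder host, not a real site to test against.
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for name, rules in (("good", GOOD_RULES), ("bad", BAD_RULES)):
        for path in ("/index.html", "/private/report.html"):
            print(name, path, is_allowed(rules, path))
```

With the first rule set only the /private/ page is refused; with the second, every page is refused, which is the "whole site" mistake mentioned above.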
It's best to create a robots.txt file for every site, even if it's empty, as the spiders frequently request it, and if it doesn't exist your error log files can fill up with 404 errors for the file.
Have a look at http://www.robotstxt.org for lots of info on this topic.