Keeping your images away from the evil robots!

From: John Yeo <jonnieo_at_domain.name.suppressed>
Date: Fri 02 Aug 2002 - 09:07:01 PDT

This is a little off topic, as it doesn't relate directly to pinhole, but
many members of this list have websites exhibiting their work, so I decided
to put it out there. I have recently noticed a few requests every day on my
webserver for "robots.txt". After requesting that file (which doesn't even
exist on my webserver!), the visitors would download various other html
files. I found all these requests for the nonexistent file strange, and it
was also odd that the html files were being downloaded but the images
weren't being viewed (well, not usually).

After looking a bit on Google, I discovered that these are actually robots
or "spiders" that scour the internet for various reasons, such as collecting
information for search engines, but more importantly, linking to or stealing
your images. Fortunately, you can lock these robots out of parts of your
website, or the whole site itself, at least the ones that obey the rules in
robots.txt. Usually you would want to leave the site open for search engines
to increase traffic, but lock out your gallery so sites like
http://images.google.com can't get at your artwork.

To lock the robots out of your entire website, make a text file called
robots.txt, and put the following lines in it:

User-agent: *
Disallow: /

To lock them out of your images directory, use the following lines:

User-agent: *
Disallow: /images/
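
If you want to check what a rule set actually blocks before uploading it, Python's standard urllib.robotparser module can read the rules the way a well-behaved robot would. The following is only a rough sketch: the robot name "ExampleBot" and the file photo.jpg are made up for illustration, and the helper function allowed() is not part of any library.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, url: str, agent: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# The two rule sets from the message above.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_IMAGES = "User-agent: *\nDisallow: /images/\n"

# Hypothetical addresses and robot name, for illustration only.
PAGE = "http://www.enteric.org/index.html"
PHOTO = "http://www.enteric.org/images/photo.jpg"

print(allowed(BLOCK_ALL, PAGE, "ExampleBot"))      # False
print(allowed(BLOCK_IMAGES, PAGE, "ExampleBot"))   # True
print(allowed(BLOCK_IMAGES, PHOTO, "ExampleBot"))  # False
```

This only shows how a robot that follows the rules would interpret the file. Nothing in robots.txt forces a robot to obey it.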

Put the text file in the same directory as your main index.html file. Robots
only ask for robots.txt at the top level of a host, so it will work if you
have your site at its own domain, such as my site at http://www.enteric.org,
but it will not be found if your website is contained in a directory, like
Guillermo's site at http://members.rogers.com/penate/. You can do a search
on Google for robots.txt to find out more.
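
To see where a robot would go looking for the file, it may help to work out the address from the site address. This is a small sketch using Python's standard urllib.parse module; the function name robots_url() is my own and is only for illustration.

```python
from urllib.parse import urljoin


def robots_url(site_url: str) -> str:
    """Return the address a robot would request for the site at `site_url`."""
    # The leading slash makes urljoin drop any directory part of site_url,
    # leaving only the host followed by /robots.txt.
    return urljoin(site_url, "/robots.txt")


print(robots_url("http://www.enteric.org"))
# http://www.enteric.org/robots.txt
print(robots_url("http://members.rogers.com/penate/"))
# http://members.rogers.com/robots.txt
```

For a site that lives in a directory, the second address belongs to whoever runs the host, not to the owner of the directory, so a robots.txt placed inside the directory is never requested.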

Hope this is of use to somebody,
John
Received on Fri Aug 2 09:06:08 2002
