Mar 14, 2013, by admin
Robots.txt is a plain text (not HTML) file that you put on your site to tell search robots which pages you would like them not to visit. By default, search engine bots crawl everything they can reach unless they are told not to. Well-behaved bots read the robots.txt file before crawling the site. The location of robots.txt is very important: it must be in the main (root) directory, because user agents (search engine crawlers) do not search the whole site for a file named robots.txt. They look only in the main directory, and if they don’t find the file there, they assume the site has no robots.txt and index everything they find along the way. So if you don’t put robots.txt in the right place, the search engines will index your whole site.
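To make the location rule concrete, here is a minimal Python sketch that works out where a crawler would look for robots.txt, starting from any page URL on the site. The helper name robots_txt_url and the example.com address are only illustrative assumptions, not part of any standard API.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the one place a crawler checks for this site's robots.txt.

    Hypothetical helper for illustration: it keeps the scheme and host
    of the page URL and replaces the path with /robots.txt.
    """
    parts = urlsplit(page_url)
    # The query string and fragment are dropped; only the site root matters.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# prints http://www.example.com/robots.txt
print(robots_txt_url("http://www.example.com/shop/items/page.html?id=3"))
```

A file saved anywhere else, for example under /shop/robots.txt, would never be requested by a crawler following this rule.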
Allow indexing of everything
User-agent: *
Disallow:
Disallow indexing of everything
User-agent: *
Disallow: /
Disallow indexing of a specific folder
User-agent: *
Disallow: /folder/
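The three rule sets above can be checked without touching a live site. The sketch below feeds each one to Python's standard urllib.robotparser module and asks whether two sample URLs may be fetched. The helper is_crawl_allowed and the example.com URLs are assumptions made up for this illustration.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_lines, url, agent="*"):
    """Parse robots.txt lines and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # takes an iterable of lines, no network access
    return parser.can_fetch(agent, url)


allow_all = ["User-agent: *", "Disallow:"]
block_all = ["User-agent: *", "Disallow: /"]
block_folder = ["User-agent: *", "Disallow: /folder/"]

# Hypothetical URLs used only for the demonstration.
page_in_folder = "http://www.example.com/folder/page.html"
home_page = "http://www.example.com/index.html"

print(is_crawl_allowed(allow_all, page_in_folder))     # True
print(is_crawl_allowed(block_all, home_page))          # False
print(is_crawl_allowed(block_folder, page_in_folder))  # False
print(is_crawl_allowed(block_folder, home_page))       # True
```

Note that an empty Disallow: line blocks nothing, while Disallow: / blocks every path, so a single character separates the two extremes.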
Disallow Googlebot from indexing a folder, except for one file in that folder
User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html
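The Allow line works here because Googlebot applies the most specific matching rule, meaning the one with the longest path, regardless of the order of the lines. The following is a minimal sketch of that precedence rule, assuming plain path prefixes only (no * or $ wildcards); the function longest_match_allowed is a hypothetical name and not Google's actual implementation.

```python
def longest_match_allowed(path, rules):
    """Decide whether `path` may be crawled under longest-match precedence.

    `rules` is a list of (directive, prefix) pairs for one user-agent group.
    Simplified sketch: prefixes are compared literally, without wildcards.
    """
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty prefix matches nothing
        is_allow = directive.lower() == "allow"
        # The longer prefix wins; on a tie, Allow beats Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


googlebot_rules = [
    ("Disallow", "/folder1/"),
    ("Allow", "/folder1/myfile.html"),
]

print(longest_match_allowed("/folder1/myfile.html", googlebot_rules))  # True
print(longest_match_allowed("/folder1/other.html", googlebot_rules))   # False
print(longest_match_allowed("/index.html", googlebot_rules))           # True
```

A parser that instead stops at the first matching line would block myfile.html with the rules in this order, so listing the Allow line before the Disallow line is the safer choice if other crawlers matter to you.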
http://www.robotstxt.org/orig.html
http://www.robotstxt.org/wc/faq.html