Some Initial Thoughts on SEO (Part 2)
August 20, 2008

The Robots.txt File

The robots.txt file tells web spiders and other web robots how they may crawl your site. More often than not, it is used to indicate which files and folders are NOT to be crawled, but it can also list the files and folders that robots are allowed to focus on.

The robots.txt file can also be used to point robots to the location of an XML sitemap. The content typically excluded from crawling is anything that would be misleading or irrelevant to the categorization of the site as a whole, such as a “scripts” directory or an application/database directory. For websites with multiple sub-domains, each sub-domain must have its own robots.txt file.

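For example, a site that runs a separate blog sub-domain would keep two independent files, one at the root of each host (the host names here are just placeholders):

https://www.example.com/robots.txt
https://blog.example.com/robots.txt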

A typical robots.txt looks like the following:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
Crawl-delay: 10
Sitemap: https://www.example.com/sitemap.xml

There’s nothing really complicated about this file. The star in the User-agent field means the rules apply to every robot: the site may be crawled freely except for anything listed in the Disallow fields. The Crawl-delay field controls the rate at which a spider crawls your directories, in this case at most one request every 10 seconds. The Sitemap field points to the location of the XML sitemap, given as a full URL.
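
To see how a compliant crawler would interpret these rules, here is a minimal sketch using Python’s standard urllib.robotparser module; the example.com URLs are placeholders, not actual addresses from this site.

from urllib import robotparser

# The Disallow and Crawl-delay rules shown above, fed to the parser
# directly so no network request is needed.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
Crawl-delay: 10
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

# A page outside the disallowed folders may be crawled.
print(parser.can_fetch("*", "https://www.example.com/about.html"))         # True
# Anything under /private/ is off limits to a compliant robot.
print(parser.can_fetch("*", "https://www.example.com/private/notes.html")) # False
# The requested delay between fetches, in seconds (Python 3.6+).
print(parser.crawl_delay("*"))                                             # 10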

Robots.txt should not be used to make part of a site private. That would not work, because the contents of robots.txt are plain text and visible to the world. The file merely asks robots to skip certain directories; it does not actually block access.
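
As a quick illustration of just how public the file is, anyone can fetch it directly; here is a minimal sketch in Python, again with a placeholder URL:

from urllib import request, error

try:
    # robots.txt is served like any other public file, so its contents are
    # visible to anyone, not just well-behaved robots.
    with request.urlopen("https://www.example.com/robots.txt") as response:
        print(response.read().decode("utf-8"))
except error.HTTPError as exc:
    # The placeholder host may not serve a robots.txt at all.
    print("No robots.txt found:", exc.code)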
