Robots.txt Files: Fence off Sections of Your Web Site from Search Engines.

Updated 11/2017 for accuracy!

Do you ever forget to do something that’s really simple? It’s easy to overlook the simple things when you’re worried about more complex issues such as Search Engine Optimization (SEO) or getting traffic to your website. Robots.txt files fall into that category. Do you even have a robots.txt file on your site? It’s very simple to create and can help your site’s ranking in the search engines in a couple of ways.

What is a robots.txt file?
A robots.txt file is a small text file that you place in the root directory of your web site. You can list directories that robots (search engine spiders) should not visit. You can get specific if you’d like and specify different things for different robots (search engines) by targeting specific user-agents, but generally that’s not necessary.
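
If you do want different rules for different robots, here is a rough sketch of how that plays out, using Python’s standard urllib.robotparser module to read a two-group file. The robot name ExampleBot, the /drafts/ directory, and the example.com URLs are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for a made-up robot, one for everyone else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /drafts/

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)
site = "http://www.example.com"  # placeholder domain

# ExampleBot matches its own group, so only /drafts/ is off limits to it.
print(rules.can_fetch("ExampleBot", site + "/drafts/post.html"))   # False
print(rules.can_fetch("ExampleBot", site + "/cgi-bin/test.cgi"))   # True

# Any other robot falls back to the "*" group.
print(rules.can_fetch("OtherBot", site + "/drafts/post.html"))     # True
print(rules.can_fetch("OtherBot", site + "/cgi-bin/test.cgi"))     # False
```

Notice that a robot with its own group uses that group instead of the * group, not in addition to it. If ExampleBot should also stay out of /cgi-bin/, that line has to be repeated in its group.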

Here’s a sample robots.txt file:

User-agent: *
Disallow: /cgi-bin/
Disallow: /print-friendly/
Disallow: /~john/

Some Rules:

- Specify one subdirectory per Disallow line. The above example asks robots not to crawl the cgi-bin, print-friendly, and ~john directories.
- You can only have one robots.txt file and it has to be in the root directory of your site.
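
One way to sanity-check a file like the sample above before uploading it is to run it through Python’s standard urllib.robotparser module. This is only a sketch: the test paths and the robot name AnyBot are invented, and real search engine robots may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The sample robots.txt from this article.
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /print-friendly/
Disallow: /~john/
"""


def blocked_paths(robots_txt, paths, agent="AnyBot"):
    """Return the paths that the given robot is asked not to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


# Hypothetical paths on the site.
paths_to_check = [
    "/cgi-bin/search.cgi",
    "/blog/post.html",
    "/print-friendly/post.html",
    "/~john/index.html",
]

for path in blocked_paths(SAMPLE, paths_to_check):
    print("blocked:", path)
```

Only /blog/post.html is left out of the output, because it is the only path that does not start with one of the three Disallow prefixes.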

Other ways to do it:
There is also a META tag that has just about the same meaning. Its usual form is:

<meta name="robots" content="noindex, nofollow">

Place this meta tag in the header (the <head> section) of any page you don’t want indexed by a search engine.
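
To confirm that a page really carries the robots META tag, a small check like the following can help. It is a minimal sketch using Python’s standard html.parser module. The class and function names are invented for this example, and the sample page assumes the common noindex, nofollow form of the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for part in attr_map.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots META directives in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Assumed example page; the tag value is the common form, not taken from a real site.
page = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    "</head><body>Printer friendly copy</body></html>"
)
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```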

Important:
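Not every robot supports or respects the robots.txt file or the META tag, so it is worth knowing both. Keep in mind that a robot that obeys a Disallow rule never fetches the page, so it never sees that page’s META tag. For a page that must stay out of the search results, leave it crawlable and rely on the META tag; use robots.txt for directories you simply don’t want crawled.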

SEO:
You may be wondering exactly how this could affect SEO (Search Engine Optimization). The biggest way is with duplicate content. Search engines do not want to find duplicate content. If you have a printer friendly version of your blog entries, then you want to stop the duplicate printer layout versions of your entries from being crawled.

The second way this helps is to keep the printer friendly pages from showing up in search engine result pages at all. We want people who find our sites by searching to land on the regular versions of our pages, not the printer friendly versions. Printer friendly versions typically strip off the left and right columns and therefore remove most of the navigation. With only the regular pages listed in the search engine results, a visitor’s experience of the site is better.

Of course there are other reasons to stop certain sub-directories/folders from being crawled. You may have products such as eBooks, training videos, or scripts and test pages that you do not want showing up in the results of a search. Because the robots.txt file is not respected by every spider crawling around out there, you should always keep sensitive data in sub-directories/folders that are protected by a username and password.

The robots.txt file is just a small, easy to create text file, but small things like this can add up to make a big difference.

Learn more about robots.txt files here: www.robotstxt.org/wc/robots.html.

To learn more about Search Engine Optimization get my SEO Brain Dump!

Fred Black
