Internet Business Blog

 

Robots.txt Files: Fence off Sections of Your Web Site from Search Engines.



August 13, 2007

Do you ever forget to do something that's really simple? It's easy to overlook the simple things when you're worried about more complex issues such as Search Engine Optimization (SEO) or getting traffic to your website. Robots.txt files fall into that category. Do you even have a robots.txt file on your site? It's very simple to set up, and it can help your site's standing in the search engines in a couple of ways.

What is a robots.txt file?
A robots.txt file is a small text file that you place in the root directory of your web site. In it you list the directories that robots (search engine spiders) should not visit. You can get specific if you'd like and give different instructions to different robots (search engines) by targeting specific user-agents, but generally that's not necessary.
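If you're curious what targeting specific user-agents looks like in practice, here's a minimal Python sketch using the standard library's urllib.robotparser module. The rules, the bot name "SomeOtherBot", and the page paths are made up for illustration; they are not taken from my own site. It shows that a robot follows the group written for it and only falls back to the "*" group when no group names it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group just for Googlebot, one catch-all group.
AGENT_RULES = """\
User-agent: Googlebot
Disallow: /print-friendly/

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


agent_parser = build_parser(AGENT_RULES)

# Print the verdict for each robot/path combination.
for agent in ("Googlebot", "SomeOtherBot"):
    for path in ("/print-friendly/post.html", "/cgi-bin/form.cgi"):
        verdict = "allowed" if agent_parser.can_fetch(agent, path) else "blocked"
        print(f"{agent:13} {path:28} {verdict}")
```

Notice that in this sketch Googlebot is not held to the catch-all rules at all, which is one reason keeping everything under a single "*" group is usually the simpler choice.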

Here's a sample robots.txt file:
User-agent: *
Disallow: /cgi-bin/
Disallow: /print-friendly/
Disallow: /~john/

Some Rules:

  • Specify one subdirectory per Disallow line.
    • The above example would stop robots from crawling the cgi-bin, print-friendly, and ~john directories.

  • You can have only one robots.txt file per site, and it has to be in the root directory of your site.
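To check rules like these before you upload them, you can run them through Python's standard urllib.robotparser module. The sketch below is only an illustration: it uses the sample file from above, and the www.example.com URLs are placeholders I made up for testing.

```python
from urllib.robotparser import RobotFileParser

# The sample robots.txt file from this article.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /print-friendly/
Disallow: /~john/
"""


def is_allowed(robots_txt, url, agent="*"):
    """Return True if `agent` may crawl `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Placeholder URLs; swap in pages from your own site.
test_urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/cgi-bin/search.cgi",
    "http://www.example.com/print-friendly/entry.html",
    "http://www.example.com/~john/notes.html",
    "http://www.example.com/print-friendly-tips.html",
]

for url in test_urls:
    verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, url) else "blocked"
    print(f"{url:50} {verdict}")
```

The last URL is there on purpose: rules match by path prefix, so the trailing slash in "Disallow: /print-friendly/" blocks the directory without also blocking a page whose name merely starts with "print-friendly".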

Other ways to do it:
There is also a META tag that has just about the same meaning.
Place this META tag in the <head> section of any page that you don't want indexed by a search engine and whose links you don't want followed.

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
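If you want to confirm that a template really emits this tag, a short script can look for it. Here's a minimal sketch using Python's standard html.parser module. The class name, the function name, and the sample page are my own inventions for illustration, and a real check would load the published page instead of a hard-coded string.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical printer-friendly page carrying the tag shown above.
PAGE = """<html><head>
<title>Printer-friendly entry</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head><body><p>Entry text</p></body></html>"""

print(sorted(robots_directives(PAGE)))  # prints ['nofollow', 'noindex']
```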

Important:
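Because not all robots support or respect the robots.txt file or the META tag, your best bet is to use both. Keep in mind that a robot that obeys robots.txt never fetches a blocked page and so never sees its META tag; the tag is there for robots that skip robots.txt but still read the page.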

SEO:
You may be wondering exactly how this could affect SEO (Search Engine Optimization). The biggest way is with duplicate content. Search engines do not want to find duplicate content. If you have a printer-friendly version of your blog entries, then you want to stop the duplicate printer-layout versions of your entries from being crawled. I have my templates and publishing parameters set up to put all the printer-friendly pages in a specific folder, and I list this folder in my robots.txt file. I also configured the publishing template for the printer-friendly pages to use the META tag shown above. This stops most search engines from indexing the printer-friendly versions of the pages and therefore eliminates a possible problem caused by having duplicate content.
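If you keep a list of the folders you want excluded, the file itself can be generated instead of typed by hand. The sketch below is one way that might look in Python. The function name build_robots_txt, the folder names, and the page paths are assumptions for illustration, and this is not how my publishing system actually produces the file.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(excluded_dirs, agent="*"):
    """Build robots.txt text that disallows each folder in `excluded_dirs`."""
    lines = [f"User-agent: {agent}"]
    for directory in excluded_dirs:
        name = directory.strip("/")
        if not name:
            continue  # skip empty entries rather than emit "Disallow: //"
        # Normalize to a leading and trailing slash so only the folder matches.
        lines.append(f"Disallow: /{name}/")
    return "\n".join(lines) + "\n"


# Hypothetical folder names, written with and without slashes.
robots_txt = build_robots_txt(["print-friendly", "/cgi-bin/"])
print(robots_txt)

# Sanity check: the printer-friendly copy is blocked, the regular page is not.
checker = RobotFileParser()
checker.parse(robots_txt.splitlines())
print(checker.can_fetch("*", "/print-friendly/2007/08/robots.html"))
print(checker.can_fetch("*", "/2007/08/robots.html"))
```

Generating the file from one list keeps it in step with wherever your templates publish the duplicate pages.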

The second way this helps is by keeping the printer-friendly pages out of the search engine result pages altogether. I want people who find my site through a search to land on the regular versions of my pages, not the printer-friendly versions. My printer-friendly template strips off the left and right columns and therefore removes most navigation. With only the regular pages listed in the search engines, a visitor's experience on my site is better.

Of course, there are other reasons to stop certain subdirectories from being crawled. You may have products such as eBooks, training videos, or scripts and test pages that you do not want showing up in the results of a search. Because the robots.txt file is not respected by every spider crawling around out there, and because anyone can read the file, you should always keep sensitive data in subdirectories that are protected by a username and password.

The robots.txt file is just a small, easy-to-create text file, but small things like this can add up to make a big difference.

Learn more about robots.txt files here: www.robotstxt.org/wc/robots.html.

Fred Black

About the Author

Fred Black is an experienced programmer, web site developer, online business operator, systems integrator, father, husband, musician, and songwriter. Visit his Internet Business Blog at: http://www.pqInternet.com.



Posted by Fred on August 13, 2007



Assigned Categories: Search Engines: SEO | Web Site Design, HTML, CSS




You may reprint or distribute this article as long as you leave the content and the About the Author resource box at the end intact.

 

 

Copyright 2006 -2010, PhaseQuest.Com.
All rights reserved.
