HowTo configure robots.txt properly

This is a discussion on HowTo configure robots.txt properly within the HowTo Tutorials for web hosting forums, part of the Virtual Hosting Forums category.


#1
04-06-2007, 04:06 PM
michael_s
AABox Staff
Join Date: Dec 2003
Posts: 89

robots.txt : what is it?
robots.txt is an instruction file for search engine robots such as Googlebot or Yahoo! Slurp. It sets rules for which parts of your site those robots may and may not crawl. This is very helpful if there are areas of your site that you do not want showing up in search engines, like login pages, SSL-protected pages, or anything else you want kept out of the indexes. Keep in mind that the file is only a request: well-behaved robots follow it, but nothing forces a robot to obey.

How does it work?
The robots.txt file works by naming a user-agent and then listing one or more rules for that user-agent.

These are the directives that can be included in a robots.txt file:
  • User-agent:
  • Disallow:
  • Crawl-Delay:
To indicate comments that are to be ignored, precede the comment line with a # symbol.

User-agent: - This is the name that the search engine robot uses to identify itself to your server when accessing pages. A value of * applies the rules to every robot.

Disallow: - This is where you tell the search engine robot which directories or files it is not allowed to spider. The value is matched as the start of the URL path, so /cgi-bin/ covers everything under that directory. An empty value disallows nothing.

Crawl-Delay: - This tells search engine robots how many seconds to wait before moving to the next page. This can help if a robot is putting too much load on your server. Not every robot honors it; Googlebot, for example, ignores it.
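
If you want to see how a parser reads these directives, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the /private/ path and the robot name ExampleBot are made up for illustration, and real search engine robots may interpret a file somewhat differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, made up for illustration only.
ROBOTS_TXT = """\
# Keep every robot out of /private/ and ask for a 10 second pause
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "ExampleBot" is a hypothetical robot name; it falls under the * record.
print(rules.can_fetch("ExampleBot", "/private/report.html"))  # under /private/
print(rules.can_fetch("ExampleBot", "/index.html"))           # not covered by any rule
print(rules.crawl_delay("ExampleBot"))                        # value of Crawl-delay
```

The line starting with # is skipped as a comment, the Disallow value is matched against the start of the path, and crawl_delay() simply reports the number from the file.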

robots.txt examples

Now that you understand what the terms are, let's see some examples of robots.txt files.

  1. This first example is the most basic and it tells all User-agents that they can spider your entire site:
    Code:
    User-agent: *
    Disallow:
  2. This second example blocks all search engine robots from indexing your cgi-bin directory - a common directory that you would not want indexed.

    Code:
    User-agent: *
    Disallow: /cgi-bin/
  3. This example blocks Yahoo! Slurp from the /admin/ directory, but lets other robots crawl it:

    Code:
    User-agent: Slurp
    Disallow: /admin/
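
The three examples can be checked with a short script. This is only a sketch using Python's urllib.robotparser; the helper name allowed() and the test paths are my own, and it assumes the Yahoo robot is matched by the short name Slurp.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, agent, path):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The three examples from the list, written as strings.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_CGI = "User-agent: *\nDisallow: /cgi-bin/\n"
BLOCK_SLURP_ADMIN = "User-agent: Slurp\nDisallow: /admin/\n"

# The paths below are hypothetical test paths.
checks = [
    ("example 1", allowed(ALLOW_ALL, "Googlebot", "/cgi-bin/test.cgi")),
    ("example 2, cgi-bin", allowed(BLOCK_CGI, "Googlebot", "/cgi-bin/test.cgi")),
    ("example 2, elsewhere", allowed(BLOCK_CGI, "Googlebot", "/index.html")),
    ("example 3, Slurp", allowed(BLOCK_SLURP_ADMIN, "Slurp", "/admin/index.php")),
    ("example 3, Googlebot", allowed(BLOCK_SLURP_ADMIN, "Googlebot", "/admin/index.php")),
]

for label, result in checks:
    print(label, "allowed" if result else "blocked")
```

A robot that has no record of its own and no * record to fall back on, like Googlebot in the third example, is treated as allowed everywhere.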

These are simple examples, but you can see how powerful it can be. You can block individual search engine spiders from specific content on your site. As a final, more comprehensive example, here is an application-specific robots.txt file. It is written for vBulletin forums installed under /forums/ and keeps the search engine robots out of areas where they do not belong:

Code:
User-agent: *
#Crawl-Delay: 10
Disallow: /forums/ajax.php
Disallow: /forums/ajax_cron.php
Disallow: /forums/attachment.php
Disallow: /forums/checkspelling.php
Disallow: /forums/cron.php
Disallow: /forums/editpost.php
Disallow: /forums/external.php
Disallow: /forums/faq.php
Disallow: /forums/global.php
Disallow: /forums/image.php
Disallow: /forums/joinrequest.php
Disallow: /forums/login.php
Disallow: /forums/misc.php
Disallow: /forums/moderator.php
Disallow: /forums/newattachment.php
Disallow: /forums/newreply.php
Disallow: /forums/newthread.php
Disallow: /forums/online.php
Disallow: /forums/payment_gateway.php
Disallow: /forums/payments.php
Disallow: /forums/pdfthread.php
Disallow: /forums/poll.php
Disallow: /forums/postings.php
Disallow: /forums/printthread.php
Disallow: /forums/private.php
Disallow: /forums/profile.php
Disallow: /forums/register.php
Disallow: /forums/report.php
Disallow: /forums/reputation.php
Disallow: /forums/search.php
Disallow: /forums/sendmessage.php
Disallow: /forums/subscription.php
Disallow: /forums/subscriptions.php
Disallow: /forums/threadrate.php
Disallow: /forums/usercp.php
Disallow: /forums/usernote.php
# Directories (kept in the same record: a blank line would end it)
Disallow: /forums/admincp/
Disallow: /forums/images/
Disallow: /forums/modcp/
Disallow: /forums/articlebot/
Disallow: /forums/clientscript/
Disallow: /forums/customavatars/
Disallow: /forums/customprofilepics/
Disallow: /forums/files/
Disallow: /forums/install/
Disallow: /forums/cpstyles/
Disallow: /forums/includes/
Disallow: /forums/subscriptions/
Disallow: /forums/attachments/
Disallow: /forums/frm_attach/
Disallow: /forums/vbweather/
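
One thing to watch for in a long file like this: in the original robots.txt convention a blank line ends a record, so a parser that follows it strictly will ignore any Disallow lines that come after a blank line. Here is a small sketch with Python's urllib.robotparser, which behaves that way. Other robots may be more forgiving, and the robot name ExampleBot and the helper can_fetch() are made up for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Two-rule excerpt with a blank line between the rules.
WITH_BLANK = """\
User-agent: *
Disallow: /forums/login.php

Disallow: /forums/admincp/
"""

# The same excerpt with the blank line removed.
WITHOUT_BLANK = """\
User-agent: *
Disallow: /forums/login.php
Disallow: /forums/admincp/
"""


def can_fetch(robots_txt, path, agent="ExampleBot"):
    """Return True if the (hypothetical) robot may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


admin_page = "/forums/admincp/index.php"

# With the blank line, this parser drops the admincp rule.
print("with blank line:", can_fetch(WITH_BLANK, admin_page))
# Without it, the rule stays in the record and the page is blocked.
print("without blank line:", can_fetch(WITHOUT_BLANK, admin_page))
```

Keeping every Disallow line directly under its User-agent line, with comment lines instead of blank lines as separators, avoids the problem.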


__________________
Michael Sasek
AABox Staff
#2
09-12-2007, 10:48 AM
michael_s
AABox Staff

Another case that I have been asked about is how to keep all robots away from everything on your site. Occasionally there is a need for such a directive. It is very simple:

Code:
User-agent: *
Disallow: /
That tells all robots that nothing on your site should be crawled.
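
Note how close this is to the allow-everything file: the only difference is the single / after Disallow:. A quick sketch with Python's urllib.robotparser shows the contrast. The helper blocked_paths(), the sample paths and the robot name ExampleBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

DENY_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Hypothetical paths used only for this check.
PATHS = ["/", "/index.html", "/forums/showthread.php?t=88"]


def blocked_paths(robots_txt, paths, agent="ExampleBot"):
    """Return the subset of `paths` that `agent` may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


print("deny all blocks:", blocked_paths(DENY_ALL, PATHS))
print("allow all blocks:", blocked_paths(ALLOW_ALL, PATHS))
```

Every path starts with /, so the single slash matches the whole site, while the empty value matches nothing.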
__________________
Michael Sasek
AABox Staff
LinkBack to this Thread: http://forums.aabox.com/howto-tutorials-web-hosting/88-howto-configure-robots-txt-properly.html

Copyright ©2002-2008 AABox Web Hosting