
Robots Exclusion Protocol: robots.txt

This post was most recently updated on December 28th, 2016

What is Robots Exclusion Protocol?

The Robots Exclusion Protocol is a convention in which a website publishes directives that restrict or steer web crawlers' access to parts of the site. Some parts of a site may be open to crawlers while the rest is off limits to all or some of them.

The standard was proposed by Martijn Koster.

The robots.txt file needs to be in the root directory of your site.
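As a small illustration of the root-directory rule, the sketch below derives the robots.txt location from any page URL on the same site. The function name `robots_txt_url` and the example.com URLs are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace the path
    # with /robots.txt and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("https://www.example.com/blog/post?id=7"))
```

Whatever page a crawler starts from, it looks for the file at the root of that host rather than next to the page.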

 

 Directives

  • User-agent: the user-agent directive names the robot or web crawler that the rules following it apply to. A “*” means the rules apply to all robots. A list of user agents for specific robots can be found here.
  • Allow/ Disallow: these directives specify the URL paths that the named crawler may or may not access, respectively.
  • Crawl-delay: the crawl-delay directive is supported by some crawlers but not all (Google, for one, ignores it); it sets the number of seconds to wait between successive requests to the same server.
  • Sitemap: the sitemap directive gives the URL of the site's sitemap.xml and is recognized by some crawlers.
  • Host: the host directive is supported by some crawlers and lets websites with multiple mirrors specify their preferred domain.
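To make the directives concrete, here is a minimal sketch that feeds a made-up robots.txt to Python's standard `urllib.robotparser` module and queries it. The file contents, the crawler names `SomeCrawler` and `ExampleBot`, and the example.com URLs are all assumptions for illustration. Note that this particular parser simply skips the Host line, and that `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt using the directives described above.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/public-report.html
Disallow: /private/
Crawl-delay: 10

User-agent: ExampleBot
Disallow: /

Sitemap: https://www.example.com/sitemap.xml
Host: www.example.com
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    base = "https://www.example.com"
    # Rules under "User-agent: *" apply to crawlers with no group of their own.
    print(parser.can_fetch("SomeCrawler", base + "/index.html"))
    print(parser.can_fetch("SomeCrawler", base + "/private/data.html"))
    print(parser.can_fetch("SomeCrawler", base + "/private/public-report.html"))
    # ExampleBot has its own group, which disallows the whole site.
    print(parser.can_fetch("ExampleBot", base + "/index.html"))
    # Seconds to wait between requests, and the sitemap URLs that were listed.
    print(parser.crawl_delay("SomeCrawler"))
    print(parser.site_maps())
```

The Allow line is placed before the broader Disallow line on purpose: parsers differ in how they resolve overlapping rules (first match versus most specific match), and this ordering gives the same answer under both.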

 

The Robots Exclusion Protocol can also be applied using robots meta tags and the X-Robots-Tag HTTP header.

A “noindex” meta tag: <meta name="robots" content="noindex">

A “noindex” HTTP response header: X-Robots-Tag: noindex

 

The X-Robots-Tag header takes effect only once the page has been requested, and the meta tag only once the page has been loaded and parsed. The robots.txt file, on the other hand, is consulted before the page is requested at all.
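The difference in timing can be sketched from a crawler's point of view: the header can be read as soon as the response headers arrive, while the meta tag is only visible after the HTML body has been parsed. The helper names below (`header_says_noindex`, `meta_says_noindex`, `RobotsMetaParser`) are hypothetical, and the check is deliberately simplified: it ignores crawler-specific forms of the header and crawler-specific meta names.

```python
from html.parser import HTMLParser


def header_says_noindex(headers: dict) -> bool:
    """Check the response headers only; no HTML parsing is needed."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives = [d.strip().lower() for d in value.split(",")]
            if "noindex" in directives:
                return True
    return False


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def meta_says_noindex(html: str) -> bool:
    """Check the page body; this requires the page to have been downloaded."""
    meta_parser = RobotsMetaParser()
    meta_parser.feed(html)
    return "noindex" in meta_parser.directives


if __name__ == "__main__":
    print(header_says_noindex({"X-Robots-Tag": "noindex, nofollow"}))
    print(meta_says_noindex('<head><meta name="robots" content="noindex"></head>'))
```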

If a crawler is disallowed from a URL by robots.txt, it never requests that page, so it never sees the X-Robots-Tag header or the meta tags on it.
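A minimal sketch of why this happens: a crawler that honours robots.txt checks it first and never requests a disallowed page, so any header or meta tag on that page goes unread. The `crawl` function, the `fake_fetch` stand-in for a real HTTP request, the `ExampleBot` name, and the URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_LINES = ["User-agent: *", "Disallow: /secret/"]

requested = []  # records every URL that was actually fetched


def fake_fetch(url):
    """Stand-in for an HTTP request; every page carries a noindex header."""
    requested.append(url)
    return {"X-Robots-Tag": "noindex"}, "<html></html>"


def crawl(url, robots_lines, fetch, user_agent="ExampleBot"):
    rules = RobotFileParser()
    rules.parse(robots_lines)
    # Step 1: robots.txt is consulted before any request for the page.
    if not rules.can_fetch(user_agent, url):
        return "blocked by robots.txt"
    # Step 2: only now is the page requested and its header visible.
    headers, _html = fetch(url)
    if "noindex" in headers.get("X-Robots-Tag", "").lower():
        return "fetched, not indexed"
    return "fetched, indexable"


if __name__ == "__main__":
    print(crawl("https://www.example.com/secret/page.html", ROBOTS_LINES, fake_fetch))
    print(crawl("https://www.example.com/open/page.html", ROBOTS_LINES, fake_fetch))
    print(requested)  # only the second URL was ever requested
```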

 
