Best Robots.txt For WordPress

As has been discussed many times before, robots.txt plays a very important role in search engine optimization. Well-behaved search engine robots read your robots.txt before crawling your site. WordPress, even though it is very search-engine friendly with its permalinks, seems to have problems with the latest search engine algorithms. The cause is what we call a “duplicate content filter”: when different URLs point to the same content, search engines like Google treat the content as copied and may therefore penalize your whole blog!

For example, the following two URLs point to the same content in WordPress:

domain.com/category/category-name/post-name
domain.com/category-name/post-name
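Disallow values in robots.txt are matched as path prefixes, so a single rule such as Disallow: /category/ keeps the first URL form out of the crawl while leaving the second one reachable. Here is a minimal check with Python's standard urllib.robotparser. It assumes a file containing only that one rule and uses domain.com as a stand-in host. urllib.robotparser only does plain prefix matching, which is all this particular rule needs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical minimal robots.txt: only the rule relevant to this example.
rules = [
    "User-agent: Googlebot",
    "Disallow: /category/",
]

parser = RobotFileParser()
parser.parse(rules)  # parse lines directly instead of fetching a live file

category_url = "http://domain.com/category/category-name/post-name"
direct_url = "http://domain.com/category-name/post-name"

print(parser.can_fetch("Googlebot", category_url))  # False: path starts with /category/
print(parser.can_fetch("Googlebot", direct_url))    # True: no Disallow prefix matches
```

Only one copy of the post stays crawlable, which is the whole point of the exercise.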

To keep the duplicate content filter from hitting your WordPress blog, I have come up with this exclusive robots.txt file to help you benefit from search engines:

User-agent: Googlebot
Disallow: /wp-content/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /feed/
Disallow: /archives/
Disallow: /sitemap.xml
Disallow: /index.php
Disallow: /*?
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: */feed/
Disallow: */trackback/
Disallow: /page/
Disallow: /tag/
Disallow: /category/

User-agent: Googlebot-Image
Disallow: /wp-includes/

User-agent: Mediapartners-Google*
Disallow:

User-agent: ia_archiver
Disallow: /

User-agent: duggmirror
Disallow: /

This robots.txt lets Google Images index every file except those in the wp-includes folder, lets the Google AdSense bot (Mediapartners-Google) visit every page of your blog, makes Googlebot ignore the unnecessary duplicate pages, and blocks ia_archiver and duggmirror from the whole site. Adding it should increase your traffic by letting Google pay more attention to your important pages and skip the duplicate ones.
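To see why each bot ends up where it does, it helps to evaluate the rules the way Google documents them. A crawler uses the group whose User-agent names it, falling back to a User-agent: * group if there is one. Inside that group, `*` matches any run of characters, a trailing `$` anchors the end of the URL, and the longest matching pattern decides, with Allow winning a tie. The sketch below is my own simplified evaluator, not Google's parser. It assumes exact (case-insensitive) agent-name matching, runs on a hypothetical trimmed copy of the rules, and uses made-up example paths. It is hand-written because Python's urllib.robotparser treats `*` and `$` as literal characters.

```python
import re

# Hypothetical trimmed copy of the rules above, enough to exercise each group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /wp-content/
Disallow: /index.php
Disallow: /*?
Disallow: /*.php$
Disallow: /category/

User-agent: Googlebot-Image
Disallow: /wp-includes/

User-agent: Mediapartners-Google*
Disallow:

User-agent: ia_archiver
Disallow: /
"""


def parse_groups(text):
    """Split robots.txt into (agents, rules) groups; each rule is (is_allow, pattern)."""
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue  # skips blank lines and anything that isn't "field: value"
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            token = value.lower()
            if token != "*":
                token = token.rstrip("*")  # read "Mediapartners-Google*" as "mediapartners-google"
            agents.append(token)
        elif field in ("allow", "disallow") and agents:
            rules.append((field == "allow", value))
    if agents:
        groups.append((agents, rules))
    return groups


def select_rules(groups, user_agent):
    """Simplified: an exact agent-name match wins, otherwise the '*' group, otherwise no rules."""
    token = user_agent.lower()
    fallback = []
    for agents, rules in groups:
        if token in agents:
            return rules
        if "*" in agents and not fallback:
            fallback = rules
    return fallback


def pattern_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(groups, user_agent, path):
    """Longest matching pattern wins; on a tie, Allow beats Disallow."""
    best_len, allowed = -1, True
    for is_allow, pattern in select_rules(groups, user_agent):
        if not pattern:
            continue  # an empty Disallow blocks nothing
        if pattern_to_regex(pattern).match(path):
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


groups = parse_groups(ROBOTS_TXT)
checks = [
    ("Googlebot", "/category/seo/best-robots-txt/"),
    ("Googlebot", "/seo/best-robots-txt/"),
    ("Googlebot", "/index.php"),
    ("Googlebot", "/?s=robots"),
    ("Googlebot-Image", "/wp-content/uploads/photo.jpg"),
    ("Googlebot-Image", "/wp-includes/images/rss.png"),
    ("Mediapartners-Google", "/category/seo/"),
    ("ia_archiver", "/"),
]
for agent, path in checks:
    verdict = "allowed" if is_allowed(groups, agent, path) else "blocked"
    print(f"{agent:<21} {path:<32} {verdict}")
```

Note that Googlebot-Image gets its own group here, so the /wp-content/ block in the Googlebot group does not stop it from indexing uploaded images. Real crawlers add details this sketch leaves out, such as merging repeated groups for the same agent.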

16 Comments to “Best Robots.txt For WordPress”

  1. Dan 19 June 2007 at 3:22 am #

    Thanks for the robots.txt file, going to use this on my new blog. =)

  2. Gary 19 June 2007 at 9:07 pm #

    If you're disallowing sitemap.xml, doesn't that hurt the Google search process?  Why not just turn off the Google sitemap plugin?

  3. Turk Hit Box 19 June 2007 at 10:54 pm #

    sitemap.xml has no search value on search engine result pages. It will not generate any traffic, and it may cause duplicate content for some pages. However, you shouldn't disable it, because Google Webmaster Tools uses it.

  4. Wayne Price 3 July 2007 at 11:53 pm #

    Where do you put the robots.txt file?

  5. Bill Petro 6 July 2007 at 12:10 am #

    What's the final verdict on allowing/disallowing sitemap.xml?

  6. eylultoprak 6 July 2007 at 6:37 pm #

    Could you translate this into Turkish? :) If it were in Turkish, I'd consider publishing it on our site... I didn't fully understand it.

  7. [...] Best Robots.txt for wordpress [...]

  8. Kamil Wojcicki 19 December 2007 at 3:46 pm #

    Cheers for that, I have updated my robots file :D

  9. Kamil Wojcicki 20 December 2007 at 9:40 am #

    Could you explain the reasons for disabling the following?
    Disallow: /index.php
    Disallow: /*?
    Disallow: /*.php$

  10. Turk Hit Box 20 December 2007 at 2:12 pm #

    Sure,
    1- http://www.turkhitbox.com/ and http://www.turkhitbox.com/index.php are the same page; we don't want both of them listed.

    2- It blocks all URLs that contain a question mark (query strings).
    3- It blocks all URLs that end in .php.

  11. Kamil Wojcicki 20 December 2007 at 3:43 pm #

    Aha, so the web server doesn't redirect to the index file when asked for the root of a sub-domain; it just returns the contents of the index file. Cheers

  12. Dan O'Neil 8 January 2008 at 7:29 pm #

    Thanks for this useful info – just one quick question…

    If your blog is in a subdirectory of the main site, e.g. http://www.example.com/blog, how should you change the lines that start with */… rather than just /, e.g.:
    Disallow: */feed/

    Thanks,

    Dan O'Neil
    http://www.aquariuscoaching.co.uk

  13. TK 17 October 2009 at 3:29 am #

    Why on earth would anyone disallow sitemap.xml? The whole point of a sitemap is for search engines to read it.

  14. Cayman Web Design 26 March 2010 at 4:30 am #

    Thanks for the robots.txt :) very helpful indeed.

    Just one question: I am curious why you would disallow the sitemap. It doesn't (or at least shouldn't) duplicate content; instead it should be a unique page listing all your website's menu items, which lets the bots crawl your website more efficiently.

    I am interested in one of your banner spots, but can you please send me more info: site stats, unique visits, location of clicks, bounce rates, and the number of links leaving the homepage.

  15. justblog 6 April 2010 at 12:41 pm #

    How about this?
    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content
    Disallow: /tag
    Disallow: /author
    Disallow: /wget/
    Disallow: /httpd/
    Disallow: /i/
    Disallow: /f/
    Disallow: /t/
    Disallow: /c/
    Disallow: /j/

    User-agent: Mediapartners-Google
    Allow: /

    User-agent: Adsbot-Google
    Allow: /

    User-agent: Googlebot-Image
    Allow: /

    User-agent: Googlebot-Mobile
    Allow: /

    User-agent: ia_archiver-web.archive.org
    Disallow: /

  16. ates 4 July 2010 at 10:50 pm #

    Why disallow /page/? Doesn't that mean robots.txt blocks the pages on my website?
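Two of the questions above (where the file goes, and what changes for a blog in a subdirectory) come down to one rule: crawlers request robots.txt only from the root of the host, never from a subdirectory. The small helper below is a hypothetical illustration (robots_url_for is my own name, not a library function). It derives that location from any page URL using Python's standard urllib.parse. The example URLs are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url):
    """Return the only robots.txt location crawlers check for this page's host."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("http://www.example.com/blog/2007/06/best-robots-txt/"))
print(robots_url_for("https://blog.example.com:8080/feed/?paged=2"))
```

So for a blog living in http://www.example.com/blog, the rules belong in the root file, and plain prefix rules need the directory included (for example Disallow: /blog/feed/). A pattern such as */feed/ already matches at any depth under Google's wildcard rules. A blog on its own subdomain gets its own robots.txt at that subdomain's root.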

