Best Robots.txt For WordPress

As has been discussed many times before, robots.txt plays a very important role in search engine optimization. Search engine robots look at your robots.txt first, before crawling your site. WordPress, even though it is very search-engine friendly with its permalinks, seems to have problems with the latest search engine algorithms. This is caused by what we call a “duplicate page filter”. When you have different URLs pointing to the same content, search engines like Google may treat the content as copied and can therefore penalize your whole blog!

For example, the following two URLs point to the same content in WordPress:

domain.com/category/category-name/post-name
domain.com/category-name/post-name

To avoid triggering the duplicate content filter on your WordPress blog, I have come up with this exclusive robots.txt file to help you get the most out of search engines.

User-agent: Googlebot
Disallow: /wp-content/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /feed/
Disallow: /archives/
Disallow: /sitemap.xml
Disallow: /index.php
Disallow: /*?
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: */feed/
Disallow: */trackback/
Disallow: /page/
Disallow: /tag/
Disallow: /category/

User-agent: Googlebot-Image
Disallow: /wp-includes/

User-agent: Mediapartners-Google*
Disallow:

User-agent: ia_archiver
Disallow: /

User-agent: duggmirror
Disallow: /

This code lets Google Images index all files except the ones in the includes folder, lets the Google AdSense bot visit every page of your blog, and makes Googlebot ignore unnecessary duplicate-content pages. It also blocks ia_archiver and duggmirror from the whole site. Adding this robots.txt file can increase your traffic by letting Google pay more attention to your important pages and discard the duplicate ones.
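If you want to check which URLs the Googlebot rules above would block before uploading the file, a small script can help. The following is only a rough sketch, not Google's actual matcher: it assumes Google-style wildcards (`*` matches any run of characters, a trailing `$` anchors the pattern to the end of the URL), looks only at the Googlebot group, and ignores `Allow` lines and rule precedence. The names `rule_to_regex`, `is_blocked`, and `GOOGLEBOT_RULES` are made up for this illustration.

```python
import re

# Disallow patterns copied from the Googlebot group of the robots.txt above.
GOOGLEBOT_RULES = [
    "/wp-content/", "/trackback/", "/wp-admin/", "/feed/", "/archives/",
    "/sitemap.xml", "/index.php", "/*?", "/*.php$", "/*.js$", "/*.inc$",
    "/*.css$", "*/feed/", "*/trackback/", "/page/", "/tag/", "/category/",
]


def rule_to_regex(pattern):
    """Translate one Disallow pattern into a compiled regex.

    Assumption: '*' means "any characters" and a trailing '$' means
    "the URL must end here"; everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow_rules=GOOGLEBOT_RULES):
    """Return True if any non-empty Disallow pattern matches the path.

    Patterns are matched from the start of the path, like a prefix.
    """
    return any(
        rule and rule_to_regex(rule).match(path) is not None
        for rule in disallow_rules
    )


if __name__ == "__main__":
    for path in [
        "/category/category-name/post-name",  # duplicate URL from the example
        "/category-name/post-name",           # the URL we want indexed
        "/?p=42",
        "/post-name/feed/",
        "/wp-login.php",
    ]:
        print(path, "->", "blocked" if is_blocked(path) else "allowed")
```

Running it on the two example URLs from earlier should report the `/category/...` version as blocked and the shorter one as allowed, which is the behaviour the file is meant to produce.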

About the Author

Turk Hit Box | Other Articles

Hi everyone. My name is Ant, and I am an author at Themeforest. Please don't forget to follow me on Twitter & Dribbble.

Discussion

  1. Dan says:

    Thanks for the robots.txt file, going to use this on my new blog. =)

  2. Gary says:

    If you're disallowing sitemap.xml, doesn't that hurt the Google search process?  Why not just turn off the Google sitemap plugin?

  3. Turk Hit Box says:

    sitemap.xml has no search value on search engine result pages. It will not generate any traffic, and it may cause duplicate content for some pages. However, you shouldn't disable it, because of Google Webmaster Tools.

  4. Bill Petro says:

    What's the final verdict on allowing/dis-allowing sitemap.xml?

  5. eylultoprak says:

    Can you translate this into Turkish? :) If it were in Turkish, I would consider publishing it on our site. I didn't fully understand it, either.

  6. Cheers for that, I have updated my robots file :D

  7. Could you explain the reasons for disabling the following?
    Disallow: /index.php
    Disallow: /*?
    Disallow: /*.php$

  8. Turk Hit Box says:

    Sure,
    1- http://www.turkhitbox.co/ and http://www.turkhitbox.com/index.php are the same page; we don't want both of them listed.
    2- It blocks all URLs that contain a question mark.
    3- It blocks all URLs that end in .php.

  9. Aha, so the web server doesn't redirect to the index file when asked for the root of a sub-domain; it just returns the contents of the index file. Cheers

  10. Dan O'Neil says:

    Thanks for this useful info – just one quick question…

    If your blog is in a subdirectory of the main site, e.g. http://www.example.com/blog, how do you change the lines which start with */… rather than just /, e.g.
    Disallow: */feed/

    Thanks,

    Dan O'Neil
    http://www.aquariuscoaching.co.uk

  11. TK says:

    Why on earth would anyone disallow sitemap.xml? The whole point of a sitemap is for search engines to read it.

  12. Thanks for the robots.txt :) very helpful indeed.

    Just one question – I am curious as to why you would disallow the sitemap, though. It doesn’t, or at least shouldn’t, duplicate the content; instead it is a unique page that lists all your website’s menu items, enabling the bots to crawl your website more efficiently.

    I am interested in one of your banner spots, but can you please send me more info: site stats, unique visits, location of clicks, bounce rates, and no. of links leaving the homepage.

  13. justblog says:

    How about this one?
    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content
    Disallow: /tag
    Disallow: /author
    Disallow: /wget/
    Disallow: /httpd/
    Disallow: /i/
    Disallow: /f/
    Disallow: /t/
    Disallow: /c/
    Disallow: /j/

    User-agent: Mediapartners-Google
    Allow: /

    User-agent: Adsbot-Google
    Allow: /

    User-agent: Googlebot-Image
    Allow: /

    User-agent: Googlebot-Mobile
    Allow: /

    User-agent: ia_archiver-web.archive.org
    Disallow: /

  14. ates says:

    Why disallow /page/? Doesn't this mean that robots.txt blocks the pages on my web site?