As has been discussed many times before, robots.txt plays an important role in Search Engine Optimization: search engine robots look at your robots.txt before crawling your site. WordPress, even though it is very search-engine friendly with its permalinks, seems to have problems with the latest search engine algorithms. The cause is what we call a "duplicate content filter": when different URLs point to the same content, search engines like Google may treat that content as copied and can penalize your whole blog!
For example, in WordPress a post's permalink and the same permalink with /feed/ appended are two different URLs that return the same content.
To keep the duplicate content filter from working against your WordPress blog, I have put together the following robots.txt file for you to use:
User-agent: Googlebot
Disallow: /wp-content/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /feed/
Disallow: /archives/
Disallow: /sitemap.xml
Disallow: /index.php
Disallow: /*?
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: */feed/
Disallow: */trackback/
Disallow: /page/
Disallow: /tag/
Disallow: /category/

User-agent: Googlebot-Image
Disallow: /wp-includes/

User-agent: Mediapartners-Google*
Disallow:

User-agent: ia_archiver
Disallow: /

User-agent: duggmirror
Disallow: /
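If you want to sanity-check which URLs a rule set blocks before deploying it, Python's standard-library urllib.robotparser can help. A minimal sketch, assuming example.com as a placeholder domain and using only a simplified subset of the rules above (the stdlib parser follows the original robots.txt convention and does not understand Google's wildcard extensions such as /*? and /*.php$):

```python
from urllib.robotparser import RobotFileParser

# Simplified subset of the robots.txt above (plain prefix rules only).
rules = """\
User-agent: Googlebot
Disallow: /wp-admin/
Disallow: /feed/
Disallow: /tag/
Disallow: /category/

User-agent: ia_archiver
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot is kept out of the admin and taxonomy pages...
print(rp.can_fetch("Googlebot", "http://example.com/wp-admin/"))         # False
print(rp.can_fetch("Googlebot", "http://example.com/tag/wordpress/"))    # False
# ...but may still crawl an ordinary post permalink.
print(rp.can_fetch("Googlebot", "http://example.com/2007/06/my-post/"))  # True
# ia_archiver (the Wayback Machine crawler) is blocked from everything.
print(rp.can_fetch("ia_archiver", "http://example.com/2007/06/my-post/"))  # False
```

Because the standard-library parser ignores the wildcard patterns, testing rules like /*? requires a parser that implements Google's extensions; this sketch only verifies the plain path prefixes.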
These rules let Googlebot-Image index all files except those in the wp-includes folder, let the Google AdSense bot (Mediapartners-Google) visit every page of your blog, and make Googlebot ignore the unnecessary duplicate content pages; the ia_archiver and duggmirror bots are blocked entirely. Adding this robots.txt file can help increase your traffic by letting Google pay more attention to your important pages and skip the duplicate ones.