I am trying to use the robots.txt file to exclude certain pages from being indexed by Googlebot.
At the moment I am getting a number of links like this:
and like this:
As there are so many of these in Google's index of my site, I am using the Disallow directive in this format:
And then the Allow directive to allow the 15 or so links that are important.
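To show the layout I mean, my robots.txt looks roughly like this (the Allow paths here are placeholders, not my real ones, and as I understand it the `*` wildcard is a Google extension rather than part of the original robots.txt standard):

```
User-agent: Googlebot
# Block every .html and .php URL, plus anything containing "Itemid"
Disallow: /*.html
Disallow: /*.php
Disallow: /*Itemid
# Then allow the handful of pages I actually want crawled
# (placeholder path for illustration)
Allow: /index.php?option=com_content&view=article&id=1
```

My understanding is that for Googlebot the most specific (longest) matching rule wins, which is why the Allow lines should override the broad Disallows.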
It works in that the links I want are in my sitemap, but the ones I don't want are still there. How come my Disallow: /*.html didn't stop this: /option,com_netinvoice/action,orders/task,order/cid,1/Itemid,170.html
And why didn't Disallow: /*Itemid and Disallow: /*.php stop /index.php?
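To double-check my reading of the rules, I sketched Google's documented wildcard matching (`*` matches any run of characters, a trailing `$` anchors the end of the URL) in Python. This is only my interpretation of the docs, not Googlebot's actual code, but it suggests the patterns should match the URLs that are still indexed:

```python
import re

def googlebot_match(pattern: str, path: str) -> bool:
    """Sketch of robots.txt wildcard matching as I understand Google's docs:
    '*' matches any sequence of characters, a trailing '$' anchors the end,
    and otherwise the pattern matches as a prefix of the URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal characters, turn '*' into '.*'
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, which gives prefix semantics
    return re.match(regex, path) is not None

# The patterns from my robots.txt against the URLs that are still indexed:
print(googlebot_match("/*.html",
      "/option,com_netinvoice/action,orders/task,order/cid,1/Itemid,170.html"))  # True
print(googlebot_match("/*Itemid",
      "/option,com_netinvoice/action,orders/task,order/cid,1/Itemid,170.html"))  # True
print(googlebot_match("/*.php", "/index.php"))  # True
```

So by my reading all three URLs should have been blocked, which is what has me confused.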
Even though these unwanted links are in my sitemap, will Googlebot still honour the Disallow rules for them? And will this approach of disallowing everything with Disallow: /*.html and letting my chosen links through with Allow cause me problems in some way?
Any thoughts would be really appreciated.