XML Sitemaps Generator

Sitemap Generator Forum
July 23, 2008, 10:08:19 PM
Topic: robots.txt file not excluding pages from the sitemap generator
kiwi2b3 (Registered Customer), posted February 09, 2008, 05:53:34 AM

I am trying to use the robots.txt file to exclude certain pages from being indexed by Googlebot.

At the moment I am getting a number of links like this:
/option,com_netinvoice/action,orders/task,order/cid,1/Itemid,170.html
and like this:
/index.php?option=com_content&task=view&id=17&Itemid=1

As there are so many of these in Google's index of my site, I am using Disallow directives in this format:
Disallow: /*.html
Disallow: /*.php
Disallow: /*itemid
Disallow: /*Itemid
I then use Allow directives to let through the 15 or so links that are important.
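For reference, here is a minimal Python sketch of how a wildcard-aware crawler evaluates a rule set like this one. It assumes Google's documented matching rules (later standardized in RFC 9309): `*` matches any run of characters, a trailing `$` anchors the end, patterns are compared case-sensitively against the path plus query string, and when several rules match, the longest pattern wins, with Allow winning a tie. The function names and the `/about-us.html` page standing in for one of the 15 allowed links are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is matched literally, starting at the beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list) -> bool:
    """Return True if `path` (path plus query string) may be crawled.

    `rules` is a list of (directive, pattern) pairs, directive being
    "allow" or "disallow". The longest matching pattern decides; on a
    tie, Allow wins. If nothing matches, the path is allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty value places no restriction
        if pattern_to_regex(pattern).match(path):
            if len(pattern) > best_len or (
                len(pattern) == best_len and directive == "allow"
            ):
                best_len = len(pattern)
                allowed = directive == "allow"
    return allowed


# Rules from the post; "/about-us.html" is a hypothetical allowed page.
RULES = [
    ("disallow", "/*.html"),
    ("disallow", "/*.php"),
    ("disallow", "/*itemid"),
    ("disallow", "/*Itemid"),
    ("allow", "/about-us.html"),
]

for path in [
    "/option,com_netinvoice/action,orders/task,order/cid,1/Itemid,170.html",
    "/index.php?option=com_content&task=view&id=17&Itemid=1",
    "/about-us.html",
]:
    print(path, "->", "allowed" if is_allowed(path, RULES) else "blocked")
```

Because matching is case-sensitive under these rules, `/*Itemid` is the line that catches these URLs; `/*itemid` only matches a lowercase `itemid`. And since a longer Allow pattern beats a shorter Disallow pattern, allowing individual pages on top of `Disallow: /*.html` works as intended.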

This works in that the links I want are in my sitemap, but the ones I don't want are still there too. Why didn't my Disallow: /*.html stop this: /option,com_netinvoice/action,orders/task,order/cid,1/Itemid,170.html

And why didn't Disallow: /*Itemid and Disallow: /*.php stop /index.php?option=com_content&task=view&id=17&Itemid=1

Even though these unwanted links are in my sitemap, will Googlebot still treat them as disallowed? And will this approach of disallowing everything with Disallow: /*.html and letting my links through with Allow cause problems in some way?

Any thoughts would be really great.
admin (Administrator), Reply #1 on February 10, 2008, 01:13:19 AM

Hello,

Not every search engine supports wildcards in robots.txt, which is why the generator does not apply them and those URLs are still included in the sitemap. Google will NOT index those pages even though they are in the sitemap, since Google supports wildcards and you have excluded them in robots.txt. You may want to confirm this with the "Analyze robots.txt" tool in your Google Webmaster Tools account.
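Python's standard-library `urllib.robotparser` is one concrete example of a parser without wildcard support: in current CPython it treats each Allow/Disallow value as a plain path prefix, so `*` is just another character. The sketch below feeds it the rules from the question; the `ExampleBot` user-agent string is a placeholder, and this illustrates the general point only, not how the sitemap generator itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# The wildcard rules from the question, fed to Python's standard-library parser.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*.html
Disallow: /*.php
Disallow: /*Itemid
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URLS = [
    "/option,com_netinvoice/action,orders/task,order/cid,1/Itemid,170.html",
    "/index.php?option=com_content&task=view&id=17&Itemid=1",
    "/*.html",  # a path that literally starts with the rule text
]

for url in URLS:
    # can_fetch() returns True when no rule blocks the URL for this agent.
    print(url, "->", parser.can_fetch("ExampleBot", url))
```

Both example URLs come back as fetchable and only the literal `/*.html` path is blocked, which mirrors what happened in the sitemap: a prefix-only parser lets these URLs through.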

kiwi2b3 (Registered Customer), Reply #2 on February 10, 2008, 04:18:45 AM

Thanks for the reply. I checked, and yes, Google excludes them even though they are in my sitemap.
So to get them out of my sitemap, I would put Itemid in my "Do not parse URLs" and "Exclude URLs" configuration settings, right?
And to confirm: wildcards in the robots.txt file are not supported by xml-sitemaps, so we need to use the configuration settings instead, right?
admin (Administrator), Reply #3 on February 10, 2008, 10:43:33 PM

Quote
So to get them out of my sitemap, I would put Itemid in my 'do not parse urls' and 'exclude urls' configuration settings, right?
Yes. It should be added to both the "Do not parse" and "Exclude URLs" options.
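The reason for setting both options can be illustrated with a toy crawl. This is a minimal sketch under stated assumptions, not the generator's code: it assumes both settings match plain, case-sensitive substrings, that "Exclude URLs" keeps a matching URL out of the sitemap while "Do not parse" only stops the crawler from fetching that page to look for more links, and it uses a hypothetical in-memory link graph in place of a real site.

```python
from collections import deque

# Hypothetical site: each page maps to the links found on it.
LINKS = {
    "/": ["/about-us.html", "/index.php?option=com_content&task=view&id=17&Itemid=1"],
    "/index.php?option=com_content&task=view&id=17&Itemid=1": [
        "/index.php?option=com_content&task=view&id=18&Itemid=1",
    ],
}


def crawl(start, links, exclude=(), do_not_parse=()):
    """Breadth-first crawl returning (sitemap_urls, fetched_urls).

    exclude:      substrings (assumed semantics); matching URLs are left
                  out of the sitemap.
    do_not_parse: substrings (assumed semantics); matching URLs are never
                  fetched, so links on those pages are never discovered.
    """
    seen = {start}
    queue = deque([start])
    sitemap, fetched = [], []
    while queue:
        url = queue.popleft()
        if not any(s in url for s in exclude):
            sitemap.append(url)
        if any(s in url for s in do_not_parse):
            continue  # may still be listed, but is not fetched or followed
        fetched.append(url)
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return sitemap, fetched


for label, ex, dnp in [
    ("exclude only", ["Itemid"], []),
    ("do not parse only", [], ["Itemid"]),
    ("both", ["Itemid"], ["Itemid"]),
]:
    sitemap, fetched = crawl("/", LINKS, exclude=ex, do_not_parse=dnp)
    print(f"{label}: {len(sitemap)} in sitemap, {len(fetched)} fetched")
```

With only `exclude`, the Itemid pages stay out of the sitemap but are still fetched, and the crawl follows them to further Itemid pages; with only `do_not_parse`, the linked Itemid page is not fetched but still appears in the sitemap. Setting both keeps it out of the sitemap and stops the crawl there.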