
Robots.txt

Started by mark1, November 22, 2005, 05:46:24 PM


mark1

I was under the impression that the standalone version did honor the robots.txt file; however, it is crawling pages that have been disallowed in the robots.txt file in the root of my localhost server on an iMac G5, 10.3. I know it is finding it as I am not seeing any errors in my Apache error log.

Any ideas?

XML-Sitemaps Support

Hello,

Please post the contents of your robots.txt and an example URL that matches a Disallow directive but is still included in the sitemap.

mark1

robots.txt
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/large/
Disallow: /index.php?main_page=advanced_search
Disallow: /index.php?main_page=login
Disallow: /index.php?main_page=logoff
Disallow: /index.php?main_page=product_reviews_write
Disallow: /index.php?main_page=redirect
Disallow: /index.php?main_page=shopping_cart
Disallow: /index.php?main_page=tell_a_friend
Disallow: /index.php?main_page=create_account
Disallow: /index.php?main_page=checkout_shipping
Disallow: /index.php?main_page=password_forgotten
Disallow: /index.php?main_page=images
Disallow: /index.php?main_page=ask_a_question
Disallow: /index.php?main_page=product_reviews
Disallow: /index.php?main_page=address_book
Disallow: /index.php?main_page=account_notifications


Sitemap.xml
<url>
  <loc>http://bob.local/harborfare/index.php?main_page=login</loc>
  <priority>0.5</priority>
  <lastmod>2005-11-22T02:46:19+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>
<url>
  <loc>http://bob.local/harborfare/index.php?main_page=logoff</loc>
  <priority>0.5</priority>
  <lastmod>2005-11-22T02:46:19+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>
<url>
  <loc>http://bob.local/harborfare/index.php?main_page=shopping_cart</loc>
  <priority>0.5</priority>
  <lastmod>2005-11-22T02:46:19+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>

XML-Sitemaps Support

Hi,

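The robots.txt file resides at your domain root (i.e., [ External links are visible to logged in users only ]).
Given the contents of your robots.txt file, it disallows URLs like [ External links are visible to logged in users only ] and NOT [ External links are visible to logged in users only ]. Each Disallow value is matched as a prefix of the URL path starting from the domain root, so a rule beginning with /index.php does not cover pages under /harborfare/.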
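
To illustrate the prefix matching described above, here is a minimal Python sketch. It is not the generator's actual code; it is a simplified check that only looks at Disallow lines and ignores User-agent groups, Allow rules, and wildcards. The helper names (disallowed_prefixes, is_disallowed) are made up for this example, and the root-level URL is a hypothetical one for comparison; the /harborfare/ URL is taken from the sitemap posted above.

```python
from urllib.parse import urlsplit

# A shortened copy of the robots.txt posted earlier in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /index.php?main_page=login
Disallow: /index.php?main_page=logoff
Disallow: /index.php?main_page=shopping_cart
"""


def disallowed_prefixes(robots_txt):
    """Collect every non-empty Disallow value (hypothetical helper)."""
    prefixes = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            prefixes.append(value.strip())
    return prefixes


def is_disallowed(url, prefixes):
    """Return True if the URL's path (plus query) starts with any prefix."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(target.startswith(prefix) for prefix in prefixes)


prefixes = disallowed_prefixes(ROBOTS_TXT)

# Hypothetical root-level URL: the path starts with "/index.php?main_page=login".
print(is_disallowed("http://bob.local/index.php?main_page=login", prefixes))

# URL from the sitemap above: the path starts with "/harborfare/", so no rule matches.
print(is_disallowed("http://bob.local/harborfare/index.php?main_page=login", prefixes))
```

If the shop stays under /harborfare/, the rules would need to carry that prefix as well, for example Disallow: /harborfare/index.php?main_page=login.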