XML Sitemaps Generator

Sitemap Generator Forum
July 04, 2008, 11:11:32 AM
Author Topic: Robots.txt  (Read 11452 times)
mark1
Registered Customer
Newbie
*
Posts: 7


« on: November 22, 2005, 05:46:24 PM »

I was under the impression that the standalone version honors the robots.txt file; however, it is crawling pages that are disallowed in the robots.txt file in the root of my localhost server (an iMac G5 running 10.3). I know it is finding the file because I am not seeing any errors in my Apache error log.

Any ideas?
admin
Administrator
Hero Member
*****
Posts: 2755


« Reply #1 on: November 22, 2005, 05:50:29 PM »

Hello,

Please post the contents of your robots.txt and an example URL that matches a Disallow directive but is still included in the sitemap.

mark1
Registered Customer
Newbie
*
Posts: 7


« Reply #2 on: November 22, 2005, 06:30:07 PM »

robots.txt
Code:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/large/
Disallow: /index.php?main_page=advanced_search
Disallow: /index.php?main_page=login
Disallow: /index.php?main_page=logoff
Disallow: /index.php?main_page=product_reviews_write
Disallow: /index.php?main_page=redirect
Disallow: /index.php?main_page=shopping_cart
Disallow: /index.php?main_page=tell_a_friend
Disallow: /index.php?main_page=create_account
Disallow: /index.php?main_page=checkout_shipping
Disallow: /index.php?main_page=password_forgotten
Disallow: /index.php?main_page=images
Disallow: /index.php?main_page=ask_a_question
Disallow: /index.php?main_page=product_reviews
Disallow: /index.php?main_page=address_book
Disallow: /index.php?main_page=account_notifications

Sitemap.xml
Code:
<url>
  <loc>http://bob.local/harborfare/index.php?main_page=login</loc>
  <priority>0.5</priority>
  <lastmod>2005-11-22T02:46:19+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>
<url>
  <loc>http://bob.local/harborfare/index.php?main_page=logoff</loc>
  <priority>0.5</priority>
  <lastmod>2005-11-22T02:46:19+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>
<url>
  <loc>http://bob.local/harborfare/index.php?main_page=shopping_cart</loc>
  <priority>0.5</priority>
  <lastmod>2005-11-22T02:46:19+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>
admin
Administrator
Hero Member
*****
Posts: 2755


« Reply #3 on: November 22, 2005, 10:46:57 PM »

Hi,

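The robots.txt file resides at your domain root (i.e., http://bob.local/robots.txt), and each Disallow path is matched as a prefix of the URL path, starting from that root.
With the rules you posted, it disallows URLs like http://bob.local/index.php?main_page=login and NOT http://bob.local/harborfare/index.php?main_page=login. To exclude the latter, the rules need the /harborfare prefix, e.g. Disallow: /harborfare/index.php?main_page=login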
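
For anyone who wants to check this kind of mismatch outside the generator, here is a minimal sketch using Python's standard urllib.robotparser module. It is not the sitemap generator's own matching code, only an illustration of ordinary robots.txt prefix matching. The helper name build_parser and the shortened one-rule rule sets are assumptions made for the example; the URLs come from the posts above.

```python
from urllib.robotparser import RobotFileParser


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


# One rule taken from the robots.txt posted in this thread.
POSTED_RULES = """\
User-agent: *
Disallow: /index.php?main_page=login
"""

# The same rule with the /harborfare prefix added (illustrative fix).
PREFIXED_RULES = """\
User-agent: *
Disallow: /harborfare/index.php?main_page=login
"""

ROOT_URL = "http://bob.local/index.php?main_page=login"
SHOP_URL = "http://bob.local/harborfare/index.php?main_page=login"

posted = build_parser(POSTED_RULES)
prefixed = build_parser(PREFIXED_RULES)

# can_fetch() returns True when the URL is allowed for that user agent.
print("posted rule, root URL allowed:  ", posted.can_fetch("*", ROOT_URL))
print("posted rule, shop URL allowed:  ", posted.can_fetch("*", SHOP_URL))
print("prefixed rule, shop URL allowed:", prefixed.can_fetch("*", SHOP_URL))
```

With the posted rule, only the root-level URL is blocked; the /harborfare URL stays allowed until the rule carries the same prefix.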