XML Sitemaps Generator

Topic: Robots.txt

mark1

  • Registered Customer
  • Approved member
  • *
  • Posts: 7
Robots.txt
« on: November 22, 2005, 05:46:24 PM »
I was under the impression that the standalone version honors the robots.txt file; however, it is crawling pages that are disallowed in the robots.txt file at the root of my localhost server (an iMac G5 running Mac OS X 10.3). I know the generator is finding the file, because I am not seeing any errors for it in my Apache error log.

Any ideas?

XML-Sitemaps Support

  • Administrator
  • Hero Member
  • *****
  • Posts: 10624
Re: Robots.txt
« Reply #1 on: November 22, 2005, 05:50:29 PM »
Hello,

Please post the contents of your robots.txt and an example URL that matches a Disallow directive but is still included in the sitemap.
Oleg Ignatiuk
www.xml-sitemaps.com
Send me a Private Message

For maximum exposure and traffic for your web site check out our additional SEO Services.

mark1

  • Registered Customer
  • Approved member
  • *
  • Posts: 7
Re: Robots.txt
« Reply #2 on: November 22, 2005, 06:30:07 PM »
robots.txt
Code:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/large/
Disallow: /index.php?main_page=advanced_search
Disallow: /index.php?main_page=login
Disallow: /index.php?main_page=logoff
Disallow: /index.php?main_page=product_reviews_write
Disallow: /index.php?main_page=redirect
Disallow: /index.php?main_page=shopping_cart
Disallow: /index.php?main_page=tell_a_friend
Disallow: /index.php?main_page=create_account
Disallow: /index.php?main_page=checkout_shipping
Disallow: /index.php?main_page=password_forgotten
Disallow: /index.php?main_page=images
Disallow: /index.php?main_page=ask_a_question
Disallow: /index.php?main_page=product_reviews
Disallow: /index.php?main_page=address_book
Disallow: /index.php?main_page=account_notifications

Sitemap.xml
Code:
<url>
  <loc>http://bob.local/harborfare/index.php?main_page=login</loc>
  <priority>0.5</priority>
  <lastmod>2005-11-22T02:46:19+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>
<url>
  <loc>http://bob.local/harborfare/index.php?main_page=logoff</loc>
  <priority>0.5</priority>
  <lastmod>2005-11-22T02:46:19+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>
<url>
  <loc>http://bob.local/harborfare/index.php?main_page=shopping_cart</loc>
  <priority>0.5</priority>
  <lastmod>2005-11-22T02:46:19+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>

XML-Sitemaps Support

  • Administrator
  • Hero Member
  • *****
  • Posts: 10624
Re: Robots.txt
« Reply #3 on: November 22, 2005, 10:46:57 PM »
Hi,

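The robots.txt file resides at your domain root (i.e., http://bob.local/robots.txt), and its Disallow rules are matched as prefixes of the URL path, starting from that root.
With your current robots.txt contents, it disallows URLs like http://bob.local/index.php?main_page=login and NOT http://bob.local/harborfare/index.php?main_page=login. To exclude the latter, the rules need to include the /harborfare/ prefix, e.g. Disallow: /harborfare/index.php?main_page=login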
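
To illustrate the prefix matching described above, here is a minimal Python sketch. It is not the generator's actual code: the helper names disallowed_prefixes and is_disallowed are made up for this example, and it deliberately ignores Allow lines, wildcards, and per-agent groups. It only shows why a rule written for the domain root does not cover URLs under /harborfare/.

```python
from urllib.parse import urlsplit

# Two of the rules from the robots.txt posted above (shortened for the example).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /index.php?main_page=login
"""


def disallowed_prefixes(robots_txt):
    """Collect the values of all Disallow lines (hypothetical helper, simplified)."""
    prefixes = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            prefixes.append(value.strip())
    return prefixes


def is_disallowed(url, prefixes):
    """Return True if the URL's path (plus query) starts with any Disallow value."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(target.startswith(prefix) for prefix in prefixes)


rules = disallowed_prefixes(ROBOTS_TXT)

# Path is /index.php?main_page=login, which matches the rule.
root_blocked = is_disallowed("http://bob.local/index.php?main_page=login", rules)

# Path is /harborfare/index.php?main_page=login, which starts with /harborfare/,
# so none of the rules above is a prefix of it.
subdir_blocked = is_disallowed(
    "http://bob.local/harborfare/index.php?main_page=login", rules
)

# Adding the subdirectory to the rule makes it match.
fixed_rules = rules + ["/harborfare/index.php?main_page=login"]
subdir_blocked_after_fix = is_disallowed(
    "http://bob.local/harborfare/index.php?main_page=login", fixed_rules
)

print(root_blocked, subdir_blocked, subdir_blocked_after_fix)
```

Under these assumptions, only the first and third checks report the URL as disallowed; the second URL passes because its path begins with /harborfare/ rather than /index.php.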

 
