Robots.txt

mark1

7

« on: November 22, 2005, 05:46:24 PM »

I was under the impression that the standalone version did honor the robots.txt file; however, it is crawling pages that have been disallowed in the robots.txt file in the root of my localhost server on an iMac G5, 10.3. I know it is finding it as I am not seeing any errors in my Apache error log.

Any ideas?

Logged

XML-Sitemaps Support

11749

Re: Robots.txt

« Reply #1 on: November 22, 2005, 05:50:29 PM »

Hello,

please post the contents of your robots.txt and example URL that matches disallow directive but is still included into sitemap.

Logged

Oleg Ignatiuk
https://www.xml-sitemaps.com
Send me a Private Message

SEM and SEO Reports, more than 45M domains: The world's leading Competitive Intelligence Tool for digital marketing.

mark1

7

Re: Robots.txt

« Reply #2 on: November 22, 2005, 06:30:07 PM »

Robot.txt

Code: [Select]

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/large/
Disallow: /index.php?main_page=advanced_search
Disallow: /index.php?main_page=login
Disallow: /index.php?main_page=logoff
Disallow: /index.php?main_page=product_reviews_write
Disallow: /index.php?main_page=redirect
Disallow: /index.php?main_page=shopping_cart
Disallow: /index.php?main_page=tell_a_friend
Disallow: /index.php?main_page=create_account
Disallow: /index.php?main_page=checkout_shipping
Disallow: /index.php?main_page=password_forgotten
Disallow: /index.php?main_page=images
Disallow: /index.php?main_page=ask_a_question
Disallow: /index.php?main_page=product_reviews
Disallow: /index.php?main_page=address_book
Disallow: /index.php?main_page=account_notifications

Sitemap.xml

Code: [Select]

<url>
  <loc>http://bob.local/harborfare/index.php?main_page=login</loc>
  <priority>0.5</priority>
  <lastmod>2005-11-22T02:46:19+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>
<url>
  <loc>http://bob.local/harborfare/index.php?main_page=logoff</loc>
  <priority>0.5</priority>
  <lastmod>2005-11-22T02:46:19+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>
<url>
  <loc>http://bob.local/harborfare/index.php?main_page=shopping_cart</loc>
  <priority>0.5</priority>
  <lastmod>2005-11-22T02:46:19+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>

Logged

XML-Sitemaps Support

11749

Re: Robots.txt

« Reply #3 on: November 22, 2005, 10:46:57 PM »

Hi,

robots.txt file resides at your domain root (i.e., http://bob.local/robots.txt)
Having your robots.txt file contents, it disallows the urls like http://bob.local/index.php?main_page=login and NOT http://bob.local/harborfare/index.php?main_page=login

Logged

Oleg Ignatiuk
https://www.xml-sitemaps.com
Send me a Private Message

SEM and SEO Reports, more than 45M domains: The world's leading Competitive Intelligence Tool for digital marketing.