Hi,

I've been getting pressure from my ISP over high resource usage caused by crawling bots, so I changed my robots.txt file to this:

Sitemap: [ External links are visible to forum administrators only ]
Sitemap: [ External links are visible to forum administrators only ]

User-agent: *
Allow: /sitemap.xml
Allow: /sitemap.xml.gz
Allow: /index.php
Allow: /index.html
Allow: /index.htm
Disallow: /

(I'm also running another sitemap application)

I am hoping this will cut down on crawling of unneeded areas of my site. However, after making this change, I tried to run XML-Sitemaps again and it produced an error and did not index any pages. I have attached a screen capture of the error.

Can changing the robots.txt file affect the Sitemap application? Or is it something else I am overlooking?

I did download a fresh copy of XML-Sitemaps in case the error was the result of something I had set wrong, but that did not change anything.

Thank you in advance for your help!
Re: made robots.txt and other changes, now Sitemaps not working
« Reply #1 on: July 31, 2011, 10:17:31 AM »
Hello,

the "Disallow: /" line tells generator (and other bots) that it's NOT allowed to index pages on your site. You can set xs_robotstxt setting in generator/data/generator.conf file to "0" to disable checking of robots.txt, but keeping that directive might negatively affect indexiing of your site in search engines.
Re: made robots.txt and other changes, now Sitemaps not working
« Reply #2 on: August 12, 2011, 08:26:58 AM »
Thanks for your help.

I have modified my robots.txt file to disallow all user agents except the ones I want to index the site (Google, MSN, etc.); a sketch of that structure is at the end of this post. I also made the change to the xs_robotstxt setting. Now the Sitemap generator is crawling the site again, but it is producing a string of these errors:

Warning: preg_match() [function.preg-match]: Compilation failed: nothing to repeat at offset 72 in /home/trite7/public_html/generator/pages/class.grab.inc.php(2) : eval()'d code on line 181

It is still indexing pages, but this error is repeated over and over again where it normally shows the pages being indexed.

I have also made some changes in the configuration settings to exclude certain URL strings. Could this be the cause? Maybe I am excluding something I shouldn't?
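
For reference, one common way to write the per-agent rules described above (the user-agent tokens are examples; check each search engine's documentation for the exact names it uses):

User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

User-agent: *
Disallow: /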
Re: made robots.txt and other changes, now Sitemaps not working
« Reply #3 on: August 13, 2011, 09:26:33 AM »
Hello,

the warning message might be related to an incorrectly defined "Exclude URLs" setting. What exactly have you defined there?
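
For context, here is a minimal sketch of how a malformed exclusion entry can trigger that warning, assuming the generator builds a preg_match() pattern from each "Exclude URLs" entry (the actual pattern-building code in class.grab.inc.php may differ):

<?php
// Hypothetical entries: the first starts with a regex quantifier ("*"), which
// is invalid on its own because the quantifier has nothing before it to repeat.
$badEntry  = '*session_id=';   // triggers "Compilation failed: nothing to repeat"
$goodEntry = 'session_id=';    // a plain substring compiles and matches fine

preg_match('#' . $badEntry . '#', '/index.php?session_id=123');   // emits the warning
var_dump(preg_match('#' . preg_quote($goodEntry, '#') . '#', '/index.php?session_id=123')); // int(1)

If one of your exclusion entries begins with a wildcard such as "*" or "?", try using the plain URL fragment instead and see whether the warnings stop.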
Re: made robots.txt and other changes, now Sitemaps not working
« Reply #4 on: August 27, 2011, 01:34:03 AM »
1. Get rid of the robots.txt file. It is a sure way of advertising your wares in public.

2. Use robots index/noindex meta tags in each file instead. If you don't want a page found, say so in the page itself (a sketch is at the end of this post).

If your robots.txt tells bots not to look at your generator directory, you are effectively telling would-be thieves that you have a paid-for program running and exactly where it is.

Eliminate the robots.txt and eliminate your problem.
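
For reference, a minimal sketch of the per-page meta tags described above, placed in each page's <head> section (the exact content values depend on what you want crawlers to do with each page):

<!-- Page that should appear in search results: -->
<meta name="robots" content="index, follow">

<!-- Page that should stay out of search results: -->
<meta name="robots" content="noindex, nofollow">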