What is the user agent for Xml Sitemap
« on: December 04, 2012, 03:46:12 PM »
Hi,

I am trying to run the XML Sitemap crawler on our non-production website, but for some reason it doesn't return any results. I suspect it's because our non-production robots.txt disallows crawling. I modified the robots.txt file as below, but I still can't crawl the site with the XML Sitemap crawler. What am I doing wrong here?

User-agent: XML-Sitemaps
Disallow:

User-agent: *
Disallow: /
Re: What is the user agent for Xml Sitemap
« Reply #1 on: December 04, 2012, 10:14:43 PM »
Hello,

The sitemap generator checks for the "User-agent: *" and "User-agent: googlebot" entries in robots.txt.
You can set the xs_robotstxt setting in the generator/data/generator.conf file to "0" to skip this check.
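If the config file uses a simple name=value format (I haven't verified the exact syntax, so treat this as a sketch), the change would look something like:

```
; generator/data/generator.conf
; disable the robots.txt check so the generator ignores Disallow rules
xs_robotstxt=0
```

Re-run the crawler after saving the file; no other settings should need to change.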
Re: What is the user agent for Xml Sitemap
« Reply #2 on: December 05, 2012, 06:39:01 PM »
Hi,

I made the suggested changes and the crawler ran fine. But I am running into another issue. My goal was to crawl the site to detect all the canonical URLs on the website. I checked "enable canonical URLs" under Advanced Settings and then ran the crawler. It just produced an XML sitemap file with all the URLs it detected. I don't see a separate column listing the canonical URLs that were found. Does this tool provide the output of the canonical link for all the URLs that it detected?
Re: What is the user agent for Xml Sitemap
« Reply #3 on: December 06, 2012, 12:47:52 PM »
> Does this tool provide the output of the canonical link for all the URLs that it detected?

There is no separate field for that. If the generator finds a canonical meta tag on a page, it will use that canonical URL in the sitemap instead of the URL that was actually crawled.
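For reference, the canonical tag the generator looks for is the standard one declared in a page's <head> (example.com is just a placeholder here):

```
<link rel="canonical" href="https://www.example.com/preferred-page/" />
```

So in the generated sitemap you will see https://www.example.com/preferred-page/ listed, even if the crawler reached that page through a different URL (e.g. one with tracking parameters).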