Sitemap cannot find broken links
« on: September 25, 2012, 06:39:14 AM »
I have the sitemap generator installed on a site and it is not finding broken links (links that return a 404 page). I have removed some content, so naturally I want to go into several pages and edit any links that may now be broken.

The problem is that the generator is not finding them. After looking for a few of them manually, I can still see them on the page, but I ran the generator again and it still cannot find them.

This would mean the tool can miss broken links and webmasters would never know, since the tool cannot pick them up.

Any reason for this?
Re: Sitemap cannot find broken links
« Reply #1 on: September 25, 2012, 07:20:00 AM »
Update: when I remove those broken links, another 3 pop up after regenerating.

It appears it only allows a certain number of broken links to show on each crawl. What I have noticed is that it reports only 3 broken links; then I must fix them, recrawl, and get another 3.

Does this have anything to do with the "Maximum referred pages to store (for broken links list)" setting?

It is set at 2, so I assume it only allows 2 pages to show for broken links?

On this site, since I removed whole directories, I know for sure we have maybe 40-50 broken links, and I want to be able to catch them all in one crawl.

Is this possible?

Why would this tool only pick up 2 or 3 at a time? You would have to crawl your site 15-20 times to pick up all the broken links. I don't get it.
Re: Sitemap cannot find broken links
« Reply #2 on: September 25, 2012, 07:34:44 AM »
Now it is only picking up 2 broken links at a time. I am not sure why it switched from 3 to 2. It takes me 3-4 minutes each time to recrawl, find a couple of broken links, fix them, and then recrawl again.

Is this normal?
Re: Sitemap cannot find broken links
« Reply #3 on: September 25, 2012, 08:28:18 AM »
So as of now I have several dozen broken links on the site and the generator cannot find them. It indicates I have zero. It just stopped picking them up... I don't get it.
Re: Sitemap cannot find broken links
« Reply #4 on: September 25, 2012, 09:10:45 AM »
I figured it out...

robots.txt

I blocked several /directories/ in robots.txt, and of course this also blocked the sitemap generator.

Is there a way to unblock only the sitemap bot?

Or do I need to specify in robots.txt that I only want Googlebot to be blocked from those directories?
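
For reference, my robots.txt blocks the directories for every crawler with wildcard rules along these lines (the directory names here are just placeholders), which is presumably why the generator skips them too:

    User-agent: *
    Disallow: /old-section/
    Disallow: /removed-directory/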
Re: Sitemap cannot find broken links
« Reply #5 on: September 25, 2012, 01:08:04 PM »
Hello,

You can set the xs_robotstxt setting to "0" in the generator/data/generator.conf file.
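
The entry would end up looking something like the line below. This is only a sketch, as the exact layout of generator.conf may differ, so the safest approach is to find the existing xs_robotstxt line in that file and change its value to 0:

    xs_robotstxt=0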
Re: Sitemap cannot find broken links
« Reply #6 on: September 25, 2012, 02:33:42 PM »
What will that do? Is it advantageous? Is it needed? What is the purpose?
Re: Sitemap cannot find broken links
« Reply #7 on: September 25, 2012, 02:56:17 PM »
So I did this, and it is just like not having a robots.txt at all: if you have directories blocked, it will still add them to the sitemap.

So in my case I need to turn the robots.txt option off and then turn it back on each time.

Also, it still only shows 10-12 broken links at a time; it won't find all of them in one go.
Re: Sitemap cannot find broken links
« Reply #8 on: September 26, 2012, 09:04:23 AM »
> What will that do? Is it advantageous? Is it needed? What is the purpose?

That will tell the generator to ignore robots.txt. You can use this option, or make sure that the pages are not blocked in robots.txt. The generator checks the entries for "User-agent: *" and "User-agent: googlebot".
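
In other words, a Disallow rule placed under either of those sections is obeyed by the generator, so even a googlebot-only block like the following (the directory name is just an illustration) would still hide those pages from the crawl:

    User-agent: googlebot
    Disallow: /example-directory/

If you want Google blocked from the directories but still want the generator to crawl them, the xs_robotstxt "0" option is the way to do that.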