Page appears to stop crawling
« on: June 12, 2008, 02:13:05 AM »
Hi,

I've been trying to get the site I'm working on to crawl correctly, but the crawl keeps stopping on a certain page. I've also tried the tools on your site: the headers check returns 200 correctly, but the SE tool returns nothing for Google and Yahoo, although it works correctly for MSN.
The link to the page stopping progress is: [ External links are visible to forum administrators only ]. It is linked from this page: [ External links are visible to forum administrators only ], which works correctly.

Any ideas? Any help would be immensely appreciated!
Re: Page appears to stop crawling
« Reply #2 on: June 13, 2008, 12:10:47 AM »
Hi,

It appears to work for MSNBot but not Google. Why would this be?

Thanks for your help.
Re: Page appears to stop crawling
« Reply #4 on: June 16, 2008, 05:51:11 AM »
Hi,

I don't think that is the case as [ External links are visible to forum administrators only ] crawls correctly.
I suspect that this page is the reason that the sitemap generator does not generate a full sitemap.
What algorithm does the sitemap crawler use to extract links? Is it a regular expression? I can modify the page to suit; I simply need to find out what to change. It is currently XHTML compliant [ External links are visible to forum administrators only ], so the HTML is valid.
Any further ideas?
Re: Page appears to stop crawling
« Reply #6 on: June 18, 2008, 11:24:39 PM »
Sorry; I don't think you quite understand:

[ External links are visible to forum administrators only ] works perfectly. The SE bot simulator works on this page with all bot types, and so does the sitemap generator.

[ External links are visible to forum administrators only ] or similar does not work correctly. It fails in the SE bot simulator for the Google and Yahoo bots but DOES work for the MSN bot. It also does not work correctly with the sitemap generation tool; that is, no pages below that depth are added to the sitemap.
What I don't understand is what is different between the pages. What is different between the Google bot and the MSN bot? The page is XHTML compliant, so what is it on the page that is stopping the Google bot from finding any links?

Thanks.
Re: Page appears to stop crawling
« Reply #7 on: June 19, 2008, 02:51:16 AM »
Hello,

That page doesn't load completely when Googlebot is specified as the user agent. You can check this with the "User Agent Switcher" Firefox add-on by opening the page in a browser identified as Googlebot.
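
If you'd rather check this from a script than from the browser, something along these lines should show the same thing. It is only a rough sketch: the user-agent strings are illustrative and may not match what the bots send today, the function names are made up for this example, and counting `<a ` tags is a crude stand-in for whatever a real crawler does to extract links.

```python
import urllib.request

# Illustrative user-agent strings (assumptions; verify current values yourself).
USER_AGENTS = {
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "msnbot": "msnbot/1.1 (+http://search.msn.com/msnbot.htm)",
    "browser": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) "
               "Gecko/20080404 Firefox/2.0.0.14",
}


def fetch(url, user_agent, timeout=10):
    """Request url with the given User-Agent and return (status, body_bytes).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()


def compare_user_agents(url, fetcher=fetch):
    """Fetch url once per user agent and summarise each response.

    A page that is cut short for one agent tends to show up as a smaller
    byte count, a missing closing </html> tag, or fewer links.
    """
    results = {}
    for name, agent in USER_AGENTS.items():
        status, body = fetcher(url, agent)
        lowered = body.lower()
        results[name] = {
            "status": status,
            "bytes": len(body),
            "closed": b"</html>" in lowered,   # did the document finish?
            "links": lowered.count(b"<a "),    # crude link count
        }
    return results

# Usage (replace the URL with the page that stops the crawl):
# for name, info in compare_user_agents("http://www.example.com/page.aspx").items():
#     print(name, info)
```

If the Googlebot row shows a 200 status but fewer bytes or links than the others, the server is treating that user agent differently, which matches what the header check and the SE tool reported.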
Re: Page appears to stop crawling
« Reply #8 on: June 19, 2008, 03:25:56 AM »
Thanks for that!
That turned out to be the exact issue: a bug in the ASP.NET framework's handling of the new Google user agent string.