Page appears to stop crawling
« on: June 12, 2008, 02:13:05 AM »
Hi,

I've been trying to get the site I'm working on to crawl correctly, but the crawl keeps stopping on a certain page. I've also tried the tools on your site: the headers return 200 correctly, but the SE tool returns nothing for Google and Yahoo while working correctly for MSN.
The page that stops the crawl is: http://www.workboot.co.nz/browse/category/Architectural_Designers/189.html. It is linked from http://www.workboot.co.nz/browse/index.html, which works correctly.

Any ideas? Any help would be immensely appreciated!
Re: Page appears to stop crawling
« Reply #2 on: June 13, 2008, 12:10:47 AM »
Hi,

It appears to work for MSNBot but not Google. Why would this be?

Thanks for your help.
Re: Page appears to stop crawling
« Reply #3 on: June 13, 2008, 11:38:57 PM »
Perhaps your server blocks requests from Google IPs.
Oleg Ignatiuk
www.xml-sitemaps.com
Send me a Private Message

For maximum exposure and traffic for your web site check out our additional SEO Services.
Re: Page appears to stop crawling
« Reply #4 on: June 16, 2008, 05:51:11 AM »
Hi,

I don't think that is the case, as http://www.workboot.co.nz/browse/index.html crawls correctly.
I suspect that this page is the reason the sitemap generator does not generate a full sitemap.
What algorithm does the sitemap crawler use to extract links? Is it a regular expression? I can modify the page to suit; I simply need to find out what to change. The page currently validates as XHTML, so the HTML is valid: http://validator.w3.org/check?uri=http%3A%2F%2Fwww.workboot.co.nz%2Fbrowse%2Fcategory%2FArchitectural_Designers%2F189.html&charset=(detect+automatically)&doctype=Inline&group=0&verbose=1
Any further ideas?
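
To frame the question, here is a minimal sketch of how a crawler could extract links with an HTML parser instead of a regular expression. It is only an illustration in Python: I don't know what the sitemap generator actually does, and the names LinkCollector and extract_links are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL.

    Hypothetical helper for illustration; not the sitemap generator's code.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

If an extractor like this finds links in the index page but none in the category page as served to a given bot, the difference is in the HTML the server sends rather than in the link extraction itself.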
Re: Page appears to stop crawling
« Reply #5 on: June 16, 2008, 05:18:22 PM »
Oleg Ignatiuk
Re: Page appears to stop crawling
« Reply #6 on: June 18, 2008, 11:24:39 PM »
Sorry; I don't think you quite understand:

http://workboot.co.nz/browse/index.html works perfectly. The SE bot simulator works on this page with all bot types, and so does the sitemap generator.

http://workboot.co.nz/browse/category/Automotive_Repairs/222.html (or any similar page) does not work correctly. It fails in the SE bot simulator for the Google and Yahoo bots but DOES work for the MSN bot. It also fails with the sitemap generation tool: no pages below that depth are added to the sitemap.
What I don't understand is what differs between the pages, and between the Google bot and the MSN bot. The page is XHTML compliant, so what on the page is stopping the Google bot from finding any links?

Thanks.
Re: Page appears to stop crawling
« Reply #7 on: June 19, 2008, 02:51:16 AM »
Hello,

That page doesn't load completely when Googlebot is specified as the user agent. You can check this with the "User Agent Switcher" Firefox add-on by opening the page in a browser identified as Googlebot.
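
If you would rather check this from a script than from the browser, something like the following rough sketch should do. It requests the same URL with different User-Agent headers and compares the size of what comes back. The user-agent strings, the function names and the 50% threshold are assumptions for illustration only, so adjust them to whatever your logs show.

```python
import urllib.request

# Assumed user-agent strings for illustration; check each search engine's
# documentation or your own server logs for the values currently in use.
USER_AGENTS = {
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "msnbot": "msnbot/1.1 (+http://search.msn.com/msnbot.htm)",
}


def build_request(url, user_agent):
    """Create a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_length(url, user_agent, timeout=30):
    """Return (HTTP status, body size in bytes) for one user agent.

    Raises an exception if the server returns an error status, cuts the
    connection or times out, which is itself a useful signal here.
    """
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.status, len(resp.read())


def truncated_agents(sizes, tolerance=0.5):
    """Names whose body is smaller than `tolerance` times the largest body."""
    largest = max(sizes.values(), default=0)
    return sorted(name for name, size in sizes.items() if size < largest * tolerance)


def compare(url):
    """Fetch `url` once per user agent and report suspiciously small bodies."""
    sizes = {}
    for name, user_agent in USER_AGENTS.items():
        status, size = fetch_length(url, user_agent)
        sizes[name] = size
        print(name, status, size)
    return truncated_agents(sizes)
```

Calling compare() with the category page URL prints one line per user agent and returns the names whose response looks cut short. A 200 status with a much smaller body for one agent would match what the SE tool shows.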
Oleg Ignatiuk
Re: Page appears to stop crawling
« Reply #8 on: June 19, 2008, 03:25:56 AM »
Thanks for that!
That turned out to be exactly the issue. It was caused by a bug in the ASP.NET framework with the new Google user agent string.
Re: Page appears to stop crawling
« Reply #9 on: June 19, 2008, 04:41:58 PM »
You are welcome!
Oleg Ignatiuk