html Sitemap stops pulling the title after a while
« on: December 18, 2006, 07:55:07 AM »
I'm not sure what is causing this, but take a look at the generated HTML sitemap here [ External links are visible to forum administrators only ]

Everything goes great, then suddenly the page title isn't used anymore. I can't figure out what is causing it. At first I thought it was because of an HTML 4.0 Transitional page, but I added that URL to both the excluded and "Do not parse" lists and I still get the problem.

Any ideas? Everything else seems to be working great.
Bad regulation is worse than no regulation
Re: html Sitemap stops pulling the title after a while
« Reply #1 on: December 18, 2006, 08:06:30 PM »
Hello,

If some of the URLs are defined in the "Do not parse" option, no title will be extracted for them, since those pages are not fetched from the site.
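
To illustrate the idea, here is a minimal Python sketch. It is not the generator's actual code: `build_entries`, `fetch_title`, and the substring matching against the "Do not parse" list are all assumptions made for the example.

```python
def build_entries(urls, do_not_parse, fetch_title):
    """Return (url, title) pairs; title is None when the page is not fetched.

    Hypothetical sketch: a URL counts as "do not parse" when it contains
    any of the listed substrings (an assumption, not the tool's real rule).
    """
    entries = []
    for url in urls:
        if any(pattern in url for pattern in do_not_parse):
            # The page is never requested, so there is no <title> to read.
            entries.append((url, None))
        else:
            entries.append((url, fetch_title(url)))
    return entries


# Stand-in for the real HTTP fetch: records which URLs were requested.
fetched = []

def fake_fetch_title(url):
    fetched.append(url)
    return "Title of " + url

entries = build_entries(
    ["/index.html", "/forum/showthread.php?t=1"],
    do_not_parse=["showthread.php"],
    fetch_title=fake_fetch_title,
)
```

In this sketch the second URL still ends up in the list, but with no title, because it was never requested.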
Re: html Sitemap stops pulling the title after a while
« Reply #2 on: December 19, 2006, 05:46:48 AM »
That makes sense. However, the contents of my "Do not parse" list are identical to the contents of my excluded URLs, so unparsed URLs should not be in the final sitemap at all.

I think I've found the problem, though. I had set the crawl depth to 4, and that last level really flies through. Now I've set my crawl depth to 5, with the following result:

51 new URLs were found, and these now show the title problem described above, while
the URLs that previously had the title problem are now fine.

So the problem seems to come from the way the last set level is dealt with in the crawl.

This experiment also means I need to rethink a couple of settings, not least of which is the depth of some of the pages!!
Re: html Sitemap stops pulling the title after a while
« Reply #3 on: December 19, 2006, 08:37:05 PM »
I ended up solving the problem by creating separate XML sitemaps for .html pages and for non-.html pages. Trying to put it all into one sitemap wasn't solving the problem, no matter how deep I crawled. Maybe it's a vBulletin thing?
Re: html Sitemap stops pulling the title after a while
« Reply #4 on: December 19, 2006, 10:53:30 PM »
Hello Dave,

If the crawling depth is limited, pages at the last depth level are also NOT fetched, for better performance. They are still listed in the sitemap, but without a title.
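
A rough Python sketch of that behaviour, for illustration only. The `SITE` dictionary, the `crawl` function, and the way depth is counted (start page at depth 0) are assumptions, not the generator's real implementation.

```python
from collections import deque

# Hypothetical in-memory site: URL -> (title, outgoing links).
SITE = {
    "/": ("Home", ["/a", "/b"]),
    "/a": ("Page A", ["/a/1"]),
    "/b": ("Page B", []),
    "/a/1": ("Page A1", ["/a/1/x"]),
    "/a/1/x": ("Deep page", []),
}

def crawl(start, max_depth):
    """Breadth-first crawl; URLs at max_depth are listed but not fetched."""
    titles = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            # Last level: keep the URL, skip the fetch, so no title is known.
            titles[url] = None
            continue
        title, links = SITE[url]  # stands in for the HTTP request
        titles[url] = title
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return titles

shallow = crawl("/", 2)  # "/a/1" is on the last level: listed, no title
deeper = crawl("/", 3)   # "/a/1" now has a title; "/a/1/x" is the new last level
```

This matches what was observed above: raising the depth by one fixes the titles of the old last level, and the newly found URLs become the untitled ones.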