Problems with first Crawl

Started by mike59, July 28, 2009, 08:58:51 PM


mike59

Hi,

I'm new and just bought the sitemap program.  Right now I have 16,000 pages and expect the site to grow by about 500-1,000 pages per day.  Anyway, I just started the crawl and it cannot make it all the way through.  It just stops with output like the following (and it's different every time):

Links depth: 2
Current page: tag/credit/
Pages added to sitemap: 495
Pages scanned: 1620 (60,370.3 KB)
Pages left: 4571 (+ 5817 queued for the next depth level)
Time passed: 1:11:17
Time left: 3:21:08
Memory usage: 4,755.7 Kb

What do I need to do so it can run all the way through?  More memory, more speed, etc?  The problem is that I'm using shared hosting, so I do not have access to any configs.


mike59

That has helped, thanks so much.  I had to run the program 4 times manually before it finally completed.  The problem, though, is that the crawl didn't pick up all my URLs: it only indexed 3,000 out of 18,000.  What can I do to fix this?

XML-Sitemaps Support

Hello,

Could you please PM me your generator URL, an example URL that is not included in the sitemap, and how that page can be reached from the homepage?

mike59

Sure, I PM'd you all the info.  Just curious, is there any way to enable logging to see where the problem lies?  Also, I was curious whether your software supports remote crawls.  I was thinking that if I created a separate hosting account just for your software and crawled my site remotely, then resources (or whatever the bottleneck is) might be spared on the site itself.  Does your software support this?

XML-Sitemaps Support

Hello,

In most cases this issue is resolved by analyzing the site structure and optimizing the crawler settings with the "Exclude URLs" and "Do not parse" options.
Yes, it is possible to crawl the site from a remote account, but the resulting sitemap files will then have to be moved manually to the main server where the site is hosted.
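
A minimal sketch of that manual copy step, assuming the remote crawl account exposes the generated sitemap over HTTP and that allow_url_fopen is enabled on the main server; the URL and local path are placeholders, not settings taken from the generator itself:

<?php
// Sketch only: pull the sitemap produced on the remote crawl account
// down to the main server where the site is hosted.
$remoteSitemap = 'http://crawler-account.example.com/sitemap.xml'; // placeholder URL
$localPath     = '/home/mysite/public_html/sitemap.xml';           // placeholder path

$xml = file_get_contents($remoteSitemap);   // needs allow_url_fopen
if ($xml === false) {
    exit("Could not download $remoteSitemap\n");
}
if (file_put_contents($localPath, $xml) === false) {
    exit("Could not write $localPath\n");
}
echo 'Copied ' . strlen($xml) . " bytes to $localPath\n";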

mike59

Would someone be able to recommend a hosting company that doesn't limit PHP scripts (and is also affordable)?  My host says they do not, but I cannot figure out why the script just stops.  I need to be able to build my sitemap.  Does anyone here run it without issues?  If so, what host do you use?

mike59

OK, I moved my site to a dedicated server (I outgrew shared hosting).  The software is still crawling my site, but it stops every so often.  Is there any way to set the software to continue without human intervention?  For example, after two days the software stopped for some reason and I had to kick it off again manually.  Is there a way to have it continue once it detects that the script has stopped?  So far my first crawl has been running for 90+ hours (and I had to restart it manually twice).

Links depth: 4
Current page: friendship-month-%e2%80%93-the-joys-of-friendship/
Pages added to sitemap: 20847
Pages scanned: 26060 (963,508.8 KB)
Pages left: 5131 (+ 8691 queued for the next depth level)
Time passed: 90:21:10
Time left: 17:47:22
Memory usage: 24,787.5 Kb

XML-Sitemaps Support

Hello,

Yes, you can set up a daily scheduled task (cron job) for the sitemap generator in your hosting control panel, and it will automatically resume generation if it has stopped.
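
A sketch of what such a cron entry might look like, assuming the generator can be resumed from the PHP command line; the path and script name below are placeholders, so take the exact command from the generator's documentation or configuration page:

# Attempt a resume once a day at 02:00 (placeholder path and script name)
0 2 * * * /usr/bin/php /home/mysite/public_html/generator/runcrawl.php >/dev/null 2>&1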

mike59

Great news.  So if the job is already running fine and the cron job kicks off, would that cause the job to fail since it's already running?  Just checking, because I was thinking about having it run every 6 hours or so...
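
One common way to make overlapping runs harmless is a small lock-file wrapper that cron calls instead of the generator directly; the sketch below is an assumption about how this could be done in PHP, not the generator's built-in behavior, and the lock path and resume command are placeholders:

<?php
// Sketch only: if an earlier run still holds the lock, exit quietly.
$lock = fopen('/tmp/sitemap-crawl.lock', 'c');
if ($lock === false || !flock($lock, LOCK_EX | LOCK_NB)) {
    exit(0); // another crawl is still running; skip this cron run
}
// Placeholder command -- replace with the generator's actual resume command.
passthru('/usr/bin/php /home/mysite/public_html/generator/runcrawl.php');
flock($lock, LOCK_UN);
fclose($lock);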


mike59

OK, it now completes the crawl (very quickly), but it only reports 7,956 URLs rather than my 55,000-plus URLs.  I ran it several times with the same results.  Have you seen this before?  How can I fix it?


mike59

One of the URLs is

[ External links are visible to forum administrators only ]