crawl stops after 944 pages
« on: July 30, 2005, 02:51:01 PM »
Hi,

I'm trying to crawl my site, but the crawl seems to stop somewhere between 900 and 1,000 pages, with 2,000 pages still shown as queued.

The status bar in IE (green progress bar) suddenly completes and the counting of pages crawled stops. No sitemap is created.

I've got round this for now by starting the crawl from a subdirectory, but it means I'm around 2,000 pages short in my sitemap.

Any ideas?

Cheers, John
Re: crawl stops after 944 pages
« Reply #1 on: July 30, 2005, 03:05:10 PM »
Hi John,

the script probably timed out. Please make sure your PHP settings allow the set_time_limit() function (which is used to extend the running time), or increase the max_execution_time setting in php.ini.
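
For example, if you can edit php.ini, a line such as max_execution_time = 3600 would give the crawler an hour. If you can't, the minimal PHP sketch below shows the runtime equivalent (the one-hour figure is only an assumption; adjust it to your own site size):

<?php
// Minimal sketch: extend the running time from within the script itself.
// The 3600-second value is only an example, not a recommended setting.
set_time_limit(3600);                  // reset the execution timer to one hour
echo ini_get('max_execution_time');    // check which limit is now in effect
?>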
If this is not the case, please PM me the URL of your generator instance.
Re: crawl stops after 944 pages
« Reply #2 on: August 01, 2005, 12:44:09 AM »
Is there a workaround for this if you haven't got access to the PHP config files, which I haven't? The host generally won't change this sort of setting either.

I've just tried a crawl using my laptop (which has FoxServ installed) and it completed fine, so I assume it is a host server configuration issue, as suggested.

Cheers, John
Re: crawl stops after 944 pages
« Reply #3 on: August 01, 2005, 10:53:24 AM »
Hi John,

the next generator version will probably introduce a "resume crawling" feature (not sure about the release date yet), but the workaround for the moment is to generate the sitemap on your laptop (as you did) and upload it to your host.
Re: crawl stops after 944 pages
« Reply #4 on: March 05, 2007, 11:31:57 PM »
I wonder what number I should set max_execution_time to, since its default is 30?
Re: crawl stops after 944 pages
« Reply #5 on: March 06, 2007, 10:11:34 PM »
It depends on how many pages your site has and how fast the pages are loaded from your server. For instance, if your site has 1,000 pages and every page takes about 1 second to load, total generation time will be at least 1,000 seconds. If it only takes 0.1 seconds to load a page from your site, the total time will be only 100 seconds.
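
As a rough sketch of that arithmetic (the page count and load time below are only example figures, not measurements from any particular site):

<?php
// Back-of-the-envelope estimate for the execution time limit a crawl needs.
$pages          = 1000;   // number of pages to crawl (example figure)
$secondsPerPage = 1.0;    // average time to load one page (example figure)
$estimate       = $pages * $secondsPerPage;    // 1,000 seconds in this example
set_time_limit((int) ceil($estimate * 1.5));   // add ~50% headroom, here 1,500 seconds
?>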
Re: Crawl becomes unstable at 7,000 pages, fails at 13714
« Reply #6 on: March 10, 2007, 06:22:16 PM »
I had a lot of trouble getting from level 3 to level 4. With the save interval set at 180 the crawl would crash before it had a chance to save. I finally set it to 45 and it was able to get to level 4. Then things were reasonably smooth, except it would crash every 800 pages or so but would continue from the last save. But when it got to 13,714 (the current save), I could not keep it alive long enough to make a new save. I tried 45s, 30s, 20s and 10s save intervals, but in each case it would crash within 200 pages without making a save file. Then I tried 180s; it ran for about 800 pages and then crashed (with no new save record).

Is there a threshold related to memory use? When I was trying to get from level 3 to level 4 things got rough when memory was about 15,300 Kb. After going to level 4 the memory dropped back to 12,000 Kb or so, but slowly built up to the 15,000 Kb level. That's where I hit a wall. Splat.

GLM


 
Re: crawl stops after 944 pages
« Reply #7 on: March 10, 2007, 11:20:09 PM »
Hello,

it looks like you should increase the memory_limit setting in your PHP configuration to resolve that.
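
For example, in php.ini that would be a line such as memory_limit = 64M (the 64M figure is only an illustration; pick something comfortably above the point where your crawl stalls). Where the host allows it, the same can be attempted at runtime:

<?php
// Sketch: raise the memory ceiling from the script; some hosts silently ignore this.
ini_set('memory_limit', '64M');    // example value, not a recommendation
echo ini_get('memory_limit');      // confirm which limit actually took effect
echo memory_get_usage(true);       // memory currently allocated to the script, in bytes
?>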
Re: crawl stops after 944 pages
« Reply #8 on: March 11, 2007, 03:02:50 PM »
Hello admin,

I am having a similar issue. The crawl seems to stop after it has scanned 420 pages.
See below:
 Links depth: 3 
Current page: Akal%20Takht.shtml
Pages added to sitemap: 396
Pages scanned: 420 (11,775.2 Kb)
Pages left: 357 (+ 958 queued for the next depth level)
Time passed: 2:06
Time left: 1:47
Memory usage: 1,045.2 Kb

Is there anything else I should be doing?
Re: crawl stops after 944 pages
« Reply #9 on: March 11, 2007, 10:24:27 PM »
Hello,

the same suggestion should help resolve the issue in your case (increasing the max_execution_time and memory_limit settings in your PHP configuration).
Re: crawl stops after 944 pages
« Reply #10 on: March 28, 2007, 03:45:51 PM »
Doubling the PHP memory allocation enabled the process to complete. It did crash every 400 records or so, but it was able to recover from the saves (every 30s). Initially it ran for several thousand records, so I don't think these crashes are related to PHP timeouts. Any ideas?

While I was able to get through (about 26,000 records, 12 levels), it took a lot of nursing, and I'd be hard pressed to do it again if there isn't a better way. The end result was fine, and the broken links list is a nice addition. Hope there is a solution.

GLM