Max exec time for grab.inc.php exceeded - can't crawl site
« on: October 27, 2009, 11:09:56 AM »
I've been battling this script for a week now, trying to get it to work within its memory limitations. I don't have a problem with server limitations (yet), but it has been tremendously frustrating to work with a script that can't respect the limits it is configured to operate within. When the script is allocated 100 MB of RAM, it should surely be able to operate inside that (or at least fall back to some chunking code). This may be implemented in the browser-triggered crawl, but the cron-based crawl has consistently failed with an over-memory error (the script itself exits, stating that it is asking for more memory than the config says it should use).

However, I have now managed to get it to run, but it still dies. Here is the last line of the command-line output plus the error (domain removed for privacy):

91020 | 19328 | 2,740,018.8 | 6:08:04 | 1:18:09 | 4 | 104,145.5 Kb | 89009 | 26185 | 15

Fatal error: Maximum execution time of 9000 seconds exceeded in /home/webadmin/HOSTNAME/html/generator/pages/class.grab.inc.php(2) : eval()'d code on line 406
 
What's going wrong, and can you help get this site crawled?
Re: Max exec time for grab.inc.php exceeded - can't crawl site
« Reply #1 on: October 27, 2009, 04:14:27 PM »
Hello,

it looks like your server configuration doesn't allow the script to run long enough to create the full sitemap. Please try increasing the memory_limit and max_execution_time settings in the PHP configuration on your host (the php.ini file), or contact your hosting support about this.
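
For example (the values below are only an illustration, not a recommendation from the generator's documentation; adjust them to the size of your site):

; php.ini
memory_limit = 512M          ; maximum memory a single PHP process may allocate
max_execution_time = 18000   ; seconds; 0 disables the time limit entirely

If you launch the crawl from cron or the command line, the same overrides can also be passed per run without editing php.ini (the script path here is a placeholder, use whatever your cron job currently invokes):

php -d memory_limit=512M -d max_execution_time=18000 /path/to/generator/runcrawl.php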
Re: Max exec time for grab.inc.php exceeded - can't crawl site
« Reply #2 on: October 28, 2009, 11:29:07 AM »
Hmm, it turns out that was the max-time setting in the script's own config. Apparently some part of the script still obeys it when executing from the command line.
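
Presumably the configured value ends up as something like a set_time_limit() call somewhere in the script; a guess at the mechanism, not the generator's actual code:

set_time_limit(9000);   // PHP aborts with the "Maximum execution time of 9000 seconds exceeded" fatal error once this much execution time has passed

That would also explain why even a CLI run, where max_execution_time normally defaults to 0 (unlimited), still hit a 9000-second cap.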

New problem, though:

Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 52635936 bytes) in /home/webadmin/targetvacations.ca/html/generator/pages/class.utils.inc.php(2) : eval()'d code on line 29

This happens when I configure a maximum memory amount for the script (in this case, 256 MB). Should the script not be able to handle this situation and chunk the data appropriately? Since I have seen this message after the crawl completes regardless of how much memory I allow it to use (first 32 MB, then 100 MB, now 256 MB), my suspicion is that the part of the script that checks how much memory it should use, and then tries to behave accordingly, is failing.

Keep in mind I'm running this from the command line, *not* the web interface. The last line from the crawl script before it dumped out its array of pages was:
111080 | 4896 | 3,351,548.8 | 8:28:30 | 0:22:24 | 5 | 128,379.9 Kb | 108911 | 80527 | 69
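
For what it's worth, here is a quick way to confirm which limits the CLI PHP binary is actually applying, independent of the generator (an illustrative helper script I put together, not part of the generator):

<?php
// check-limits.php -- print the limits and the php.ini the CLI PHP actually uses
echo 'memory_limit:       ', ini_get('memory_limit'), PHP_EOL;
echo 'max_execution_time: ', ini_get('max_execution_time'), PHP_EOL;
echo 'loaded php.ini:     ', (php_ini_loaded_file() ?: '(none)'), PHP_EOL;
echo 'peak usage so far:  ', round(memory_get_peak_usage(true) / 1048576, 1), ' MB', PHP_EOL;

Running it as php check-limits.php (or with overrides, e.g. php -d memory_limit=512M check-limits.php) shows whether the CLI is picking up the same php.ini as the web server.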

Any help you can offer would be greatly appreciated. In particular, any thoughts on how to get this to finally work and, perhaps more importantly, how to resume after it has failed in this way (I am getting somewhat tired of waiting 8+ hours to see whether the script will complete).

Thanks.
Re: Max exec time for grab.inc.php exceeded - can't crawl site
« Reply #3 on: October 28, 2009, 08:59:27 PM »
Hello,

you should increase the memory_limit setting in this case. The script needs to keep a full list of the URLs that have already been crawled so it does not crawl them again and fall into endless loops.
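
Roughly speaking, the crawl has to hold something like this in memory for the whole run (a simplified sketch of the general technique, not the generator's actual code; the start URL is just an example):

<?php
// Why memory grows with site size: every crawled URL is remembered
// so it is never fetched twice.
$visited = [];                           // URL => true, kept for the entire crawl
$queue   = ['http://www.example.com/'];  // hypothetical start URL

while ($queue) {
    $url = array_shift($queue);
    if (isset($visited[$url])) {
        continue;                        // already seen, skip to avoid endless loops
    }
    $visited[$url] = true;               // this array only ever grows
    // ... fetch $url, extract links, push unseen links onto $queue ...
}

With over 100,000 pages, this list plus the per-page data collected for the sitemap can easily reach hundreds of megabytes, which is why a higher memory_limit is required.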