Out of memory problems
« on: May 25, 2012, 08:30:35 PM »
Hi

I am trying to run this software, but after running for an hour I received this error:

Fatal error: Allowed memory size of 268435456 bytes exhausted

At the point I get this error, it is clear from the reported status that there are many, many URLs yet to be accessed:

Links depth: 4
Current page: teams/wolverhampton-wanderers/tab/matches/season/1980
Pages added to sitemap: 11939
Pages scanned: 14720 (471,401.7 KB)
Pages left: 36051 (+ 97430 queued for the next depth level)
Time passed: 1:02:13
Time left: 2:32:23
Memory usage: 137,799.7 Kb


I can see from other posts that many users have encountered this problem, and the advice is usually to increase the memory limits. However, given that I have run out of memory with only a small proportion of our large site crawled, I suspect there will not be enough memory on the server for the crawl to run to completion!
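For reference, as far as I can tell "increasing the memory limits" (the 268435456 bytes in the error is 256MB) amounts to something roughly like the sketch below; putting it near the top of the generator's entry script is just my assumption, since I don't know which file the generator actually reads it from:

<?php
// Raise the PHP memory limit at runtime, or equivalently set
// memory_limit = 512M in php.ini. The file this belongs in is an
// assumption on my part -- adjust to your installation.
ini_set('memory_limit', '512M');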

One reason I think this is that, despite having crawled over 10,000 links, my sitemap.xml file is still empty (permissions seem to be correctly set on all files and folders), implying that the software intends to hold many more URLs in memory before committing them to file (although this does seem strange, so perhaps I am missing the point...).

Has anyone with a very large site had similar problems, and if so, how were you able to solve them please?

Any comments from xml-sitemaps support would also be very welcome - thanks!

Thanks, Mike

Re: Out of memory problems
« Reply #1 on: May 26, 2012, 03:08:56 PM »
Hello,

The sitemap is created only after the site has been completely crawled.

Please let me know your generator URL/login in a private message so I can check this.
Re: Out of memory problems
« Reply #2 on: May 26, 2012, 10:20:17 PM »
Hi

I'm currently experimenting with the memory allocation and hoping I can fix the problem.

I'll get back to you with details if I can't.

Thanks and cheerio, Mike
Re: Out of memory problems
« Reply #3 on: May 29, 2012, 07:37:06 PM »
Hi again

I'm still having problems generating a sitemap.

After reading other posts, I tried limiting the link depth (although ultimately I want a comprehensive sitemap, so I don't regard that as a complete solution).

I managed to get the script to run to completion at a link depth of 5. Having increased it to 6, I cannot get the script to run to completion, even if I raise the memory limits to 1024MB. It always ends with an out-of-memory error.

Then I discovered this in the config:

Minimize script memory usage:
use temporary files to store crawling progress


This seemed to suggest that the script could use text files to store data, lessening the need for memory. So I enabled it and lowered the memory limits to a more sensible 512MB (php.ini) and 256MB (sitemap config).

I was expecting this to take much longer, but not to be dependent on memory. However, the script failed to run to completion just as before.

This is a little frustrating, as it seems impossible to use this software to generate a complete sitemap for a large, data-driven site. Am I missing something?

Our dev environment is at: dev.11v11.com

It has no DNS resolution so you'll need to add it to your hosts file to access it.
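The hosts entry is just a line mapping the server's IP address to the name, something like the line below, with the placeholder replaced by the real IP from my PM:

<server IP from PM>    dev.11v11.com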

Username, password and IP to follow by PM.

If it simply isn't possible to use this software on a large site, please just tell me and I'll look for another solution.

Thanks, Mike


Re: Out of memory problems
« Reply #5 on: May 31, 2012, 08:08:15 PM »
Excellent - it seems to have done the trick.  Thanks very much! 

I don't quite understand how that config change works, and I think it would be really useful if you guys put a few articles up on the site explaining how to manage and configure your software.

One of the frustrating things about this forum is that so many relevant-looking threads end with "replied to your PM" - so people searching for answers never get to see what the solution was!

Anyway - thanks again.

Cheerio, Mike
Re: Out of memory problems
« Reply #6 on: June 01, 2012, 09:10:41 PM »
Hi

My sitemaps are completing nicely now - thanks!

But crawls seem to be restarting without my asking them to: once generation has finished, a new crawl starts without any request from me.

I can't see a setting for this anywhere in the config - am I missing something?

Thanks again, Mike
Re: Out of memory problems
« Reply #7 on: June 01, 2012, 09:18:24 PM »
Also - I don't seem to be able to stop the crawling any more.

It was always a bit flaky (sorry, but it is!), but not even interrupt.log will stop it now...
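For context, the way I've been stopping crawls up to now is roughly the sketch below; the data-directory path is assumed from a default install, so adjust it to suit:

<?php
// Creating an empty interrupt.log in the generator's data directory is
// what normally asks the crawler to stop on its next check.
touch('generator/data/interrupt.log');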

Cheers, Mike
Re: Out of memory problems
« Reply #8 on: June 02, 2012, 09:00:51 AM »
Hello,

You can disable the "automatically resume" setting in the generator configuration.
Re: Out of memory problems
« Reply #9 on: June 02, 2012, 01:10:05 PM »
Hi

The only "resume" setting I can see on the configuration tab is:

Save the script state, every X seconds:
this option allows to resume crawling operation if it was interrupted. "0" for no saves


This setting seems to be about resuming an interrupted crawl, rather than stopping the script from starting a new crawl when the current crawl finishes, which is the problem I have.

The script crawls again and again and cannot be stopped.

Please advise.

Thanks, Mike
Re: Out of memory problems
« Reply #10 on: June 02, 2012, 02:54:21 PM »
You can set this option to "0" in the generator/data/generator.conf file:
<option name="xs_autoresume">0</option>
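If it is easier to do this from a script than by hand, something along these lines should work; it assumes the default generator/data/ location and the <option name="...">value</option> format shown above:

<?php
// Set xs_autoresume to "0" in the generator's configuration file.
$conf = 'generator/data/generator.conf';
$xml  = file_get_contents($conf);
$xml  = preg_replace('/(<option name="xs_autoresume">)\d+(<\/option>)/', '${1}0${2}', $xml);
file_put_contents($conf, $xml);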
Re: Out of memory problems
« Reply #11 on: June 03, 2012, 12:01:16 PM »
Thanks!  That seems to have done the trick.

cheerio, Mike
Re: Out of memory problems
« Reply #12 on: June 03, 2012, 02:05:25 PM »
Hi again

I spoke too soon.

I set off a background crawl, closed the browser window, and now I can no longer access the generator URL.  If I leave it trying to connect for long enough I eventually get a "connection reset" error message.

So although the scan probably doesn't autoresume now, neither can I stop it!

Thanks, Mike
Re: Out of memory problems
« Reply #14 on: June 04, 2012, 09:04:50 PM »
Oh - yes, strangely!

I was using Firefox, but Chrome seems to be fine.

Not sure if that means that using Chrome will solve my problems...

Thanks, Mike