Doesn't complete the crawl
« on: March 07, 2008, 02:20:00 PM »
Hi,

I have a site with around 200,000 pages. The sitemap crawler only crawls around 5000 pages and then stops or times out.

Is there any workaround for large sites?
Re: Doesn't complete the crawl
« Reply #1 on: March 07, 2008, 11:09:43 PM »
Hello,

It looks like your server configuration doesn't allow the script to run long enough to create the full sitemap. Please try increasing the memory_limit and max_execution_time settings in the PHP configuration at your host (php.ini file), or contact your hosting support about this.
Re: Doesn't complete the crawl
« Reply #2 on: March 08, 2008, 01:40:15 AM »
Hi Oleg,

I have increased all three resource variables, but it still stops after 500 products.

Any other ideas?
Re: Doesn't complete the crawl
« Reply #3 on: March 08, 2008, 02:32:42 AM »
OK. I tried running runcrawl.php via the PuTTY SSH client. It indexed around 15,000 products and then gave the following error:

PHP Fatal error:  Allowed memory size of 67108864 bytes exhausted (tried to allocate 14580497 bytes) in /var/www/vhosts/ZZZZ.com/subdomains/comparison/httpdocs/generator/pages/class.utils.inc.php(2) : eval()'d code on line 6
PHP Warning:  Unknown: open(/var/lib/php/session/sess_d4633d7ba9cdb38103129a155e390286, O_RDWR) failed: Permission denied (13) in Unknown on line 0
PHP Warning:  Unknown: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php/session) in Unknown on line 0
Re: Doesn't complete the crawl
« Reply #4 on: March 09, 2008, 08:56:47 PM »
I see that you increased memory_limit to 64M and it crawls more pages now. Please increase it again.
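
For reference, the 64M figure can be checked against the byte count in the error message above. This is only a quick arithmetic sketch; the variable names are made up for illustration.

```shell
# Byte count copied from the "Allowed memory size" error message.
limit_bytes=67108864

# Convert bytes to megabytes (1M = 1024 * 1024 bytes in php.ini notation).
limit_mb=$((limit_bytes / 1024 / 1024))

echo "${limit_mb}M"   # prints 64M
```

So the crawl ran out of memory at exactly the 64M limit, which is why raising memory_limit further is the next thing to try.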
Re: Doesn't complete the crawl
« Reply #5 on: March 10, 2008, 02:56:08 PM »
I’m not very familiar with SSH. Can you explain how I can access your software with SSH? My site is currently hosted.
Re: Doesn't complete the crawl
« Reply #6 on: March 10, 2008, 11:06:29 PM »
I mean SSH access to your server (not to the sitemap generator). If you have it, you can run the crawler from the command line with:
/usr/local/bin/php /path/to/generator/runcrawl.php
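
If editing php.ini is not possible, the PHP command line also accepts -d to override a setting for a single run. The following is a minimal sketch, not something confirmed for this particular host: the 256M value is an assumed example, and the paths are the same placeholders as in the command above.

```shell
# Placeholder paths from the command above; adjust them to your server.
PHP_BIN=/usr/local/bin/php
CRAWL_SCRIPT=/path/to/generator/runcrawl.php

# Assumed example value, not a figure recommended in this thread.
MEMORY_LIMIT=256M

# -d overrides one php.ini setting for this run only.
CMD="$PHP_BIN -d memory_limit=$MEMORY_LIMIT $CRAWL_SCRIPT"
echo "$CMD"

# Start the crawl only if the PHP binary and the script really exist.
if [ -x "$PHP_BIN" ] && [ -f "$CRAWL_SCRIPT" ]; then
  $CMD
fi
```

The override applies only to that one command, so the limit used by the web server stays unchanged.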