Always need to 'Resume' crawl
« on: July 27, 2009, 07:11:13 PM »
Oleg, please help me figure this out...

Every time I run a crawl, it stops after a while. I have the configuration set to save the script state every 30 seconds, which lets me manually 'Resume' from the point where it stopped. That isn't such a big problem when I'm running the crawl manually; however, the crawl also stops when my scheduled weekly cron job runs it, and that's where it becomes a bigger problem.
I've read many posts from people saying that it takes many hours to crawl, but that's not my issue... it is definitely stopping: when it stops, I lose the "Transferring data from..." message (lower left corner of the Firefox browser) and I lose the status bar (lower right corner of the browser).

I also get this error every time the cron job runs (this gets emailed to me by godaddy):
Quote
/web/cgi-bin/php5: Symbol `client_errors' has different size in shared object, consider re-linking
Set-Cookie: PHPSESSID=77fpbbomtl4iegge4elaknf5h7; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-type: text/html
What can I do about this error? What does it mean???

Thanks so much!
Re: Always need to 'Resume' crawl
« Reply #1 on: July 28, 2009, 08:36:24 AM »
Hello,

you can resolve the issue by modifying the PHP configuration on your server; in particular, these two settings should be increased:
max_execution_time
memory_limit
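For example, the two directives above could be raised in php.ini along these lines (the values here are only illustrative suggestions, not part of the original advice; pick limits that match your site's size and your host's policies):

```ini
; Illustrative values only -- adjust for your site.
; Allow each script run up to 10 minutes of execution time:
max_execution_time = 600
; Allow the crawler to use up to 256 MB of memory:
memory_limit = 256M
```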
Re: Always need to 'Resume' crawl
« Reply #2 on: July 29, 2009, 06:55:08 PM »
Wow... this is really a problem for me. I couldn't find the controls for these settings in my Hosting Manager, so I called GoDaddy. They said the reason there's no control for those settings is that they don't cap them; with GoDaddy, those limits are unlimited. They said to check with my ISP, which I'll try now. Can you please also address my question about the error when running the cron job...
Quote
/web/cgi-bin/php5: Symbol `client_errors' has different size in shared object, consider re-linking
Set-Cookie: PHPSESSID=77fpbbomtl4iegge4elaknf5h7; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-type: text/html
Re: Always need to 'Resume' crawl
« Reply #3 on: July 29, 2009, 08:03:43 PM »
Oleg,
I just found another of your posts to another user that addresses the problem of the generator freezing. You told him the same thing (change the PHP settings), but in your response to him you mentioned a php.ini file. I found mine on my server, but it doesn't have the lines you mentioned... can you please tell me what to do with this file? Here it is:
Quote
register_globals = off
allow_url_fopen = off

expose_php = Off
max_input_time = 60
variables_order = "EGPCS"
extension_dir = ./
upload_tmp_dir = /tmp
precision = 12
SMTP = relay-hosting.secureserver.net
url_rewriter.tags = "a=href,area=href,frame=src,input=src,form=,fieldset="

[Zend]
zend_extension=/usr/local/zo/ZendExtensionManager.so
zend_extension=/usr/local/zo/4_3/ZendOptimizer.so

Thanks! Oh, I still need an answer to the question above about the cron error, too.
Re: Always need to 'Resume' crawl
« Reply #4 on: July 30, 2009, 07:46:31 AM »
Hello,

if those lines are not found in php.ini, simply add them (they can be inserted anywhere in the file).
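After editing php.ini, one quick way to confirm the server actually picked up the new values is a small PHP check script (this script is my sketch, not from the original thread; `ini_get()` is a standard PHP function that returns the current value of a configuration directive):

```php
<?php
// Print the effective values of the two directives the crawl
// depends on. Run this through the same web server / CGI setup
// as the generator so the same php.ini applies.
echo "max_execution_time = " . ini_get('max_execution_time') . "\n";
echo "memory_limit = " . ini_get('memory_limit') . "\n";
```

If the printed values don't match what you put in php.ini, the server is reading a different php.ini (or ignoring yours), which is common on shared hosting.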

re: cron task

The first line is related to the PHP configuration on your server, but it shouldn't affect the running of the scripts.