crawling stops after a few thousand pages
« on: September 21, 2009, 09:42:55 AM »
Everything seems to work, but when I start the crawl it stops: sometimes after 3976 pages, sometimes earlier (around 1500 pages). I cannot locate the problem.
Re: crawling stops after a few thousand pages
« Reply #1 on: September 21, 2009, 09:57:47 PM »
Hello,

it looks like your server configuration doesn't allow the script to run long enough to create the full sitemap. Please try to increase the memory_limit and max_execution_time settings in the PHP configuration at your host (the php.ini file), or contact hosting support regarding this.
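For example, the relevant lines in php.ini would look something like this (the values below are only a suggestion; your host's policy and your site's size may call for different ones):

; php.ini - example values only
memory_limit = 128M          ; memory available to one PHP process
max_execution_time = 600     ; maximum run time of one request, in seconds

To confirm that the new values are actually picked up, a small test file next to the generator will print the limits that really apply (this is just a generic PHP check, not part of the generator; delete it afterwards):

<?php
// prints the effective limits for scripts in this directory
echo 'memory_limit: ', ini_get('memory_limit'), '<br>';
echo 'max_execution_time: ', ini_get('max_execution_time'), ' seconds';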
Re: crawling stops after a few thousand pages
« Reply #2 on: September 22, 2009, 12:23:29 PM »
I too am having this problem and contacted my host. This is what they said to me:
"I have uploaded a php.ini to your account, and within that file, set your memory limit to 64M. I cannot, however, update the script timeout limit, as we have a server-side limit of 60seconds in place, and changing it at the account-level won't override it."

I continue to have the problem of the crawl stopping after a matter of seconds, and then I must rerun the crawl. I've been doing this for a few days, and it is a problem since it is hard to constantly remember to restart the crawl.

I have also changed my settings for the delay after a certain number of requests, and that has not fixed anything, unless I'm not using the correct values. Is there any suggestion on what I can do in the configuration to keep the crawl running to completion?

Thank you
Rex
Re: crawling stops after a few thousand pages
« Reply #4 on: September 28, 2009, 11:15:44 PM »
I PM'd my URL a week ago and have never heard back from anyone.  Did you guys get it?
Re: crawling stops after a few thousand pages
« Reply #5 on: September 29, 2009, 10:52:46 PM »
I have noticed numerous questions on this site about conditions where the crawler stops after a few minutes, so that only a thousand pages, or maybe only a few hundred, get indexed. On many of these occasions you tell people to increase the memory limit and the script timeout limit.

I have told you that my host gave me this statement:

"I have uploaded a php.ini to your account, and within that file, set your memory limit to 64M. I cannot, however, update the script timeout limit, as we have a server-side limit of 60seconds in place, and changing it at the account-level won't override it."

You requested that I send my URL to you as a private message. I did this over a week ago and have never heard back from anyone, nor has anyone replied to my questions within this thread.

So I have to keep restarting the crawl every 5-6 minutes, and this has been going on for over two days; it is impossible to keep doing this continuously! Now it appears that the crawl has either started a new sitemap and is overwriting two days of work, or it is splitting the sitemap into more than one file; there seems to be no way of telling which. Do you have any help for people in this situation? This has become unbearable, and without some help I'll have to abandon this effort.
Re: crawling stops after a few thousand pages
« Reply #6 on: September 30, 2009, 10:05:33 PM »
I replied to your PM the same day (a week ago); please check your inbox.
Re: crawling stops after a few thousand pages
« Reply #7 on: October 08, 2009, 01:33:07 PM »
I've asked for help, but you keep telling me to change the timeout. As I have previously stated, we are unable to change the timeout or the memory limit, since we are on a hosted site and they won't allow anything beyond 64M and 60 seconds.

All we can do is resume the crawl, but the crawl only runs for about 3 minutes, and that is the end of adding pages to the sitemap. For some reason it won't allow us to resume for another 9 minutes. When we click Run, it refreshes the screen and shows the exact same number of pages in the sitemap. We have to keep clicking Run over and over for 9 minutes until it shows us the screen with the button to say do not interrupt and the button to resume the last session. Then we get another 3 minutes of adding pages and another 9 minutes of it hanging until it gives us the page with the resume button. It took about 3 days of resuming to finally get a sitemap. It is extremely frustrating to do this when we are in the midst of making continual changes to the web site.

I think you owe it to potential users of this product to warn them that if they are on a hosted site with limited memory and a limited timeout, this will be the resulting problem.

Isn't there any way to keep it crawling even though we are unable to adjust the timeout?
Re: crawling stops after a few thousand pages
« Reply #8 on: October 08, 2009, 09:39:59 PM »
Hello,

if your server doesn't allow the script to run for a longer time, it cannot run on its own.
One of the options is to install this add-on for Firefox: https://addons.mozilla.org/en-US/firefox/addon/115
then open this page in your browser: domain.com/generator/index.php?op=crawlproc&resume=1
and select auto-reloading of the page so you don't have to refresh it manually. When the sitemap is created, the page will be auto-redirected to the "view sitemap" page.
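If keeping a browser window open is not convenient, a rough alternative is a small command-line loop that keeps requesting the same resume URL. This is only a sketch, not part of the generator: it assumes you can run PHP from the command line (where max_execution_time usually doesn't apply), that allow_url_fopen is enabled, and that the finished page mentions "view sitemap" as described above; domain.com is the placeholder from the URL above, and the file name is hypothetical.

<?php
// resume_crawl.php (hypothetical helper): run with "php resume_crawl.php"
$resumeUrl = 'http://domain.com/generator/index.php?op=crawlproc&resume=1';

while (true) {
    // one crawl run; it ends when the server's time limit stops the request
    $output = @file_get_contents($resumeUrl);

    if ($output !== false && stripos($output, 'view sitemap') !== false) {
        echo "Crawl appears to be finished.\n"; // assumption: the finished page mentions "view sitemap"
        break;
    }

    echo "Run ended, waiting before resuming...\n";
    sleep(240); // wait a little longer than one run survives; adjust to your server
}

The idea is the same as the auto-reloading add-on: each request resumes the crawl, the server eventually stops it, and the next request picks up where the previous one left off.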
Re: crawling stops after a few thousand pages
« Reply #9 on: October 10, 2009, 01:52:07 AM »
I got really excited because it kept crawling past 3 minutes, but unfortunately it stopped after a total of 5 minutes. I have the reload of the web page set for 10 seconds. I was able to restart the crawl by manually reloading the page, but, once again, I must continue to restart the crawl manually. I was hoping to have a method where the crawl continues until it is finished, without interruption or a manual reload.

I will try different reload times and see if that will help.  I'll let you know.
Re: crawling stops after a few thousand pages
« Reply #10 on: October 10, 2009, 09:57:47 AM »
Hello,

the reload time should be *larger* than your page timeout, i.e., if crawling stops after 3 minutes, you should try reloading every 5 minutes, so that the reload resumes the generator after it stops.
Re: crawling stops after a few thousand pages
« Reply #11 on: October 10, 2009, 08:14:23 PM »
OMG, it worked.  I put a reload of 30 seconds in and it went through the entire crawl without stopping.

Thank you so very much!