Hello,

My site has more than 3000 pages.
When we launch the crawling script, it stops at 150-200 pages.

We have tried to increase the memory limit.
The web host says it is limited to 70M - is this not enough?

We have tried to increase the max_execution_time setting.
The web host says CPU execution time is limited to 40 seconds.

We also tried a cron job.
The web host says a single cron job is limited to 6 minutes.

The web host mentioned that if I need more than the limits above, they would make me an offer for a managed server. That would mean paying $100 more per month for web hosting just to run the XML sitemap script. That doesn't make sense, and I am sure there is a way to solve this.

Can you advise what we could do to have the crawl run without interruptions?
I thought of having a cron job start the crawling script every week; once the crawler is started, it should simply complete its job. Are there any special settings that can make it work smoothly?

Your answers will be highly appreciated. Hope to hear from you soon, bye bye.
« Last Edit: May 19, 2010, 10:14:27 AM by info1304 »
Re: Site not being completely crawled, facing stops at 150-200 pages
« Reply #1 on: May 19, 2010, 12:39:46 PM »
Hello,

Please try to set up a cron task for the sitemap generator - 6 minutes should be enough to create a sitemap for your site.
« Reply #2 on: May 19, 2010, 12:58:12 PM »
Hi Oleg,

Thanks for your response, but as mentioned above, I already tried it with a cron job without success. You mentioned here that 1000 pages can take 16 minutes. I have more than 3000 pages, so it is most likely going to take more than an hour: https://www.xml-sitemaps.com/forum/index.php/topic,4012.msg14437.html#msg14437
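
As a rough sanity check on those numbers, here is a small Python sketch. It assumes crawl time scales linearly with page count, which is only an assumption and is not confirmed anywhere in this thread; the function name `estimate_runs` is made up for illustration. Under that assumption 3000 pages need about 48 minutes, so a site with more pages or slower responses could easily go past an hour. Either way it is far beyond a single 6-minute cron run.

```python
import math

def estimate_runs(pages, minutes_per_1000_pages, run_limit_minutes):
    """Return (total_minutes, runs_needed), assuming linear scaling."""
    total_minutes = pages * minutes_per_1000_pages / 1000
    # Each cron run can contribute at most run_limit_minutes of crawling.
    runs_needed = math.ceil(total_minutes / run_limit_minutes)
    return total_minutes, runs_needed

# Figures from this thread: 16 minutes per 1000 pages, 6-minute cron limit.
total, runs = estimate_runs(3000, 16, 6)
print(f"{total:.0f} minutes in total, at least {runs} runs of 6 minutes")
```

This only gives a lower bound on the number of runs, since each run also spends time starting up and saving its progress.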

What other solution do you propose?
« Reply #3 on: May 20, 2010, 06:08:00 AM »
Yesterday I launched 13 cron jobs over 2 hours, one every 10 minutes. No success, nothing was generated.
I then tried to open /runcrawl.php directly in my browser to see if the process gets started. There it says:
"This tool can be executed in command line mode only"

Does this mean that it won't work with cron jobs?

Please respond.
« Reply #4 on: May 20, 2010, 07:54:35 AM »
Well, my web host says there is no PHP-CLI installed on the shared web server and that I would need to get my own server. It wouldn't make sense to get a server just to have a sitemap script running properly...
I have other scripts that run well with cron jobs.

What can you suggest to get this working?
« Reply #6 on: May 20, 2010, 11:11:52 AM »
Hi Oleg,

We are trying out this:
https://www.xml-sitemaps.com/forum/index.php/topic,3467.msg12811.html#msg12811

Maybe this will solve our problem. But I still need to set up around 13 cron jobs, at intervals of around 20 minutes each.
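
The general idea of spreading one crawl over several short cron runs can be sketched as below. This is only an illustration in Python and not the Sitemap Generator's actual code or settings; `crawl_with_budget`, `fetch_links` and the JSON state file are hypothetical names chosen for the example. Each run works until a time budget is used up, saves the queue of pages still to visit, and the next run picks up from there.

```python
import json
import os
import time

def load_state(path, start_url):
    """Load the saved queue and visited list, or start a fresh crawl."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"queue": [start_url], "seen": [start_url], "done": []}

def save_state(path, state):
    """Write to a temp file first so an aborted run cannot corrupt the state."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def crawl_with_budget(state_path, start_url, fetch_links, budget_seconds):
    """Crawl until the queue is empty or the time budget is used up.

    fetch_links(url) is a caller-supplied (hypothetical) function that
    returns the links found on a page. Returns True when the crawl is
    complete and False when another run is needed to resume it.
    """
    state = load_state(state_path, start_url)
    seen = set(state["seen"])
    deadline = time.monotonic() + budget_seconds
    while state["queue"] and time.monotonic() < deadline:
        url = state["queue"].pop(0)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                state["queue"].append(link)
        state["done"].append(url)
    state["seen"] = sorted(seen)
    save_state(state_path, state)
    return not state["queue"]
```

In this sketch the budget has to stay safely below the host's limit (for example well under the 6 minutes mentioned earlier), because progress is only saved when the loop ends. A run that the host kills before that point would lose its work.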
« Reply #7 on: May 20, 2010, 12:18:44 PM »
Hi Oleg, sent you a PM to have a look.
« Reply #8 on: May 20, 2010, 03:09:53 PM »
Hi Oleg,

This scenario happened to me last week with websites I hosted at HostGator. The problem I faced is similar to yours: a background process can't run for more than 90 seconds.

What I tried was to create a batch job that clears the temporary crawling data file at the 4th, 9th, 14th, 19th ... 59th minute of every hour. Then I created a cron job that runs every 5 minutes to crawl a site over 3 hours, which was not practical at all.
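
The schedule described above can be written down as follows. This is just a small Python sketch to make the timing explicit; it assumes the 5-minute crawl runs start at minute 0 of the hour, which the post does not state, and `cron_minute_field` is a hypothetical helper name.

```python
# Minutes of the hour at which each job fires, following the schedule
# described above. Assumption: the crawl runs start at minute 0.
cleanup_minutes = list(range(4, 60, 5))   # 4, 9, 14, ..., 59
crawl_minutes = list(range(0, 60, 5))     # 0, 5, 10, ..., 55

def cron_minute_field(minutes):
    """Join minute values into the comma-separated list form used in crontab."""
    return ",".join(str(m) for m in minutes)

# Number of crawl runs in a 3-hour window at one run every 5 minutes.
runs_in_three_hours = 3 * 60 // 5

print(cron_minute_field(cleanup_minutes))
print(cron_minute_field(crawl_minutes))
print(runs_in_three_hours)
```

Under that assumption each cleanup fires one minute before the next crawl run, and a 3-hour window contains 36 crawl runs.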

In the end I gave up the shared hosting at HostGator and went for a VPS, where I have more control over the hosting. Now the cron jobs run very smoothly.

For your information, I have 35+ websites (expected to be 300+ within 2 months), each with an average of 15,000 pages. The VPS handles the requests without any problem, even when I run the cron job every day (it now runs once a week per website to leave room for more websites).

Hope this helps.
« Reply #9 on: May 21, 2010, 06:37:38 AM »
Hi,

We have also tried with about 20 cron jobs, each set to 5 minutes. But that has been time-consuming, with a lot of testing and trial and error. At the moment it wouldn't make sense for us to get a fully managed server. So what we will try now is Sitemap Generator v4.0, and I hope this solves the problem. Sitemap generation isn't that simple after all - I can understand now why Google appreciates it so much when the structure of a big website is presented to them on a plate... ;)

« Reply #10 on: May 21, 2010, 08:46:07 AM »
Bad news: we are having this problem with version 4.0:
https://www.xml-sitemaps.com/forum/index.php/board,3.html

I hope you can help us out, Oleg. We have spent more than a week trying to make this work...
« Reply #11 on: May 21, 2010, 09:34:01 AM »
A fully managed server is costly, but you can get a VPS for around USD 25/month, which is okay if you like full control of your hosting. There is no limitation on memory, CPU usage or background processes.
« Reply #12 on: May 21, 2010, 09:38:08 AM »
Hi, but a VPS is not the right solution for me as I do not have the time to manage my own server - I have so many databases, scripts, CMSes etc. running on my web hosting package. I'm sure there is a way to make this script work without the need to get an extra server :)

So back to my problem:

@Oleg:

Bad news: we are having this problem with version 4.0:
https://www.xml-sitemaps.com/forum/index.php/board,3.html

I hope you can help us out, Oleg. We have spent more than a week trying to make this work...
« Reply #13 on: May 21, 2010, 04:32:05 PM »
Up! This issue is still pending and needs a response. Please help.