Sitemap Generator Stop crawling
« on: November 17, 2007, 05:01:42 AM »
Hello,

I have installed Standalone Sitemap Generator (PHP) v2.7, 2007-10-21. The system starts crawling and I can see the number of pages. After crawling 371 pages, it stops. However, the program did find that there are over 20,000 pages to crawl. What can I do? Any ideas?

Thanks,
Frederic
Re: Sitemap Generator Stop crawling
« Reply #1 on: November 17, 2007, 09:28:13 PM »
Get used to it. It took me 4 days of restarting the crawl every 3 minutes before it completed a sweep of my 40,000 pages.
Re: Sitemap Generator Stop crawling
« Reply #2 on: November 18, 2007, 01:30:22 AM »
That is NOT good, I have over 100,000 pages. It always seems to stop at the same place... :o
The idea is good, but it just does not seem to work. Does anyone know a program that works well for this?  ;D
Re: Sitemap Generator Stop crawling
« Reply #3 on: November 18, 2007, 06:39:07 PM »
I have the same problem. I have a 100k+ page site as well.

This app seems to lock up pretty quickly, and I've made all kinds of changes after going through these posts.

I'm a bit disappointed. Not sad, not mad, just disappointed in this application. I have several other site spidering applications that just motor through my site without timing out, so it's just got to be this app.

Again, the interface looks great. Everything is fine except that, from my observation, it just can't index very well. If the moderator is reading, please review the settings here. Remember, I can spider my sites with other applications; it's this one that is just locking up.
 :)
Re: Sitemap Generator Stop crawling
« Reply #4 on: November 18, 2007, 09:40:49 PM »
Hello,

The timeout problem can be resolved by increasing the max_execution_time and memory_limit settings in the PHP configuration on your host. Please take into account that Sitemap Generator is not a desktop application: it runs on your server, so limits defined in the server configuration apply to it.
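
For illustration, the two settings mentioned above live in php.ini. This is only a sketch: the values are assumptions chosen as an example, not figures recommended in this thread, and many shared hosts will not let you change them.

```ini
; php.ini (illustrative values only, adjust for your host)
; Maximum time in seconds a script may run before PHP stops it
max_execution_time = 300
; Maximum amount of memory a single script may allocate
memory_limit = 256M
```

If the settings are picked up from php.ini by Apache's PHP module, the web server usually has to be restarted before the new values take effect.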
Re: Sitemap Generator Stop crawling
« Reply #5 on: February 03, 2008, 03:04:17 PM »
Hello,

I think that Apache config also affects the performance of this script.

I used to have the Apache timeout value set to 30 seconds, and that is far too low for running this script on big sites. It works fine with a few web pages. I have increased the timeout value to 300 and have been able to crawl over 1,000 pages. I need to crawl some 50,000 pages, so I will play around with the timeout value until I achieve this.
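
As a sketch of the change described above, and assuming the setting in question is Apache's core Timeout directive (the original link is not visible, so this is a guess), the line in the main Apache configuration file would look something like this:

```apache
# httpd.conf (assumption: the value discussed is the core Timeout directive)
# Seconds Apache waits on certain I/O events before failing a request
Timeout 300
```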

Nevertheless, I think setting a high timeout value is not very good for Apache performance. Moreover, many hosting providers are not willing to change these settings unless you have a dedicated server. Maybe the XML Sitemaps script could avoid these timeout issues by forcing some kind of reload. I know it is possible to force browser reloads... Maybe there is also some solution when you use the crontab options.

Kind regards,

Elena
Re: Sitemap Generator Stop crawling
« Reply #6 on: February 03, 2008, 07:59:25 PM »
Hello,

With a cron task, sitemap generation may take a bit more total time, but that usually resolves the script timeout problem.
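
A likely reason, assuming the cron task starts the crawl through the PHP command-line binary rather than through Apache: the command-line version of PHP defaults max_execution_time to 0 (no limit), and the web server's own timeout does not come into play. A hypothetical crontab entry is sketched below. The PHP binary location, the script path, and the memory value are all placeholders, so use the exact command given in your generator's documentation.

```
# crontab entry (sketch): start the crawl every day at 03:00
# /usr/bin/php and /path/to/generator/crawl-script.php are placeholders
# -d overrides a php.ini setting for this run only
0 3 * * * /usr/bin/php -d memory_limit=256M /path/to/generator/crawl-script.php > /dev/null 2>&1
```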
Re: Sitemap Generator Stop crawling
« Reply #7 on: February 04, 2008, 03:25:38 PM »
Hello,

You are right: with the cron job it seems to work perfectly, even if I leave the timeout value very low (30 seconds).

Thank you for your help,

Elena
Re: Sitemap Generator Stop crawling
« Reply #8 on: February 14, 2008, 02:22:24 PM »
The last version of this script I got to work properly was 2.3. I've tried all the newer ones on different server setups with no joy; they all hang sooner or later. So I've stuck with 2.3.