Sitemap Generator Stop crawling

Started by schvarzy, November 17, 2007, 05:01:42 AM


schvarzy

Hello,

I have installed Standalone Sitemap Generator (PHP) v2.7, 2007-10-21.  The system starts crawling and I can see the number of pages, but after crawling 371 pages it stops....  The program did find, however, that there are over 20,000 pages to crawl....  What can I do?  Any idea?

site: [ External links are visible to forum administrators only ]

Thanks,
Frederic

nanovation

Get used to it. It took me 4 days of restarting the crawl every 3 minutes before it completed a sweep of my 40,000 pages.

schvarzy

That is NOT good, I have over 100,000 pages.  It always seems to stop at the same place..... :o
The idea is good, but it just does not seem to work..... Does anyone know a program that works well for this  ;D?

berniedri

I have the same problem... I have a 100k+ page site as well.

This app seems to lock up pretty quickly, and I've made all kinds of changes by going through these posts.

I'm a bit disappointed... not sad, not mad, just disappointed in this application. I have several other site-spidering applications that just motor through my site without timing out, so it's got to be this app.

Again, the interface looks great and everything, except it just can't index very well from my observation. If the moderator is reading, please review the settings here, and remember I can spider my sites with other applications; it's this one that is locking up.
:)

XML-Sitemaps Support

Hello,

the timeout problem can be resolved by increasing the max_execution_time and memory_limit settings in the PHP configuration on your host. Please take into account that Sitemap Generator is not a desktop application; it runs on your server, so limitations defined in the server configuration may apply to it.
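For reference, a minimal sketch of those two settings as they might appear in php.ini (the values below are illustrative examples, not recommendations from the vendor; your host's defaults and limits may differ, and on shared hosting you may need to ask support to change them):

```ini
; php.ini — raise limits for a long-running crawl
max_execution_time = 600   ; seconds a script may run (default is often 30)
memory_limit = 256M        ; peak memory per script; large crawls need more
```

On Apache with mod_php the same values can sometimes be set per-directory via `php_value` lines in .htaccess, if the host allows it.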

balearweb

Hello,

I think that Apache config also affects the performance of this script.

I used to have the Apache [ External links are visible to forum administrators only ] value set to 30 seconds, and that is far too low for running this script with big sites; it works fine with only a few web pages. I have increased the timeout value to 300 and have been able to crawl over 1,000 pages. I need to crawl some 50K pages, so I will play around with the timeout value until I achieve that.
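For illustration, the Apache setting being described is the `Timeout` directive in the server configuration; a sketch of the change above (300 matches the value the poster used, and the right number depends on your site and host):

```apache
# httpd.conf — seconds Apache will wait for certain I/O events
# before aborting a request; shared hosts often set this low
Timeout 300
```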

Nevertheless, I think setting a high timeout value is not very good for Apache performance. Moreover, many hosting providers are not willing to change these settings unless you have a dedicated server. Maybe the XML Sitemaps script could avoid these timeout issues by forcing some kind of reload; I know it is possible to force browser reloads... Maybe there is also some solution when you use the crontab options.

Kind regards,

Elena

XML-Sitemaps Support

Hello,

with a cron task it may take a bit more total time for sitemap generation, but that usually resolves the script timeout problem.
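A cron entry for this typically looks something like the following sketch. The script path and the entry-point filename here are assumptions for illustration; check the generator's own documentation for the exact command your version expects:

```shell
# crontab line (min hour day month weekday command):
# run the crawler from the command line nightly at 02:30,
# so the web server's request timeout never applies
30 2 * * * /usr/bin/php /path/to/generator/runcrawl.php
```

Running from cron avoids the Apache/browser timeout entirely because the crawl happens in a CLI process rather than inside an HTTP request.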

balearweb

Hello,

You are right: with the cron job it seems to work perfectly, even if I leave the timeout value very low (30 seconds).

Thank you for your help,

Elena

DaveB

The last version of this script I got to work properly was 2.3. I've tried all the newer ones on different server setups with no joy; they all hang sooner or later. So I've stuck with 2.3.