Sitemap crawling never stops/ends
« on: July 15, 2006, 11:35:39 AM »
Hello

I have been crawling my website every day for one week, and each time I have raised the maximum number of pages by 2000.

Every time, the number of pages on the website keeps growing, and now I have:

25259 pages
Processing time: 28445,05 s
Pages size: 2,181.91Mb
And now the status is:
006-07-09 08:56:28, URLs added: 25295, estimated URLs left: 92495)

Why does it never end?

Best regards
Staffan /Sweden
Re: Sitemap crawling never stops/ends
« Reply #1 on: July 15, 2006, 11:46:24 PM »
Hello,

Do you mean that the crawling stops at this point?
If so, your server limits the maximum script execution time and/or memory for scripts, and you need to modify your PHP configuration:
increase the memory_limit and max_execution_time settings in php.ini and restart Apache.

This is also discussed here: https://www.xml-sitemaps.com/forum/index.php/topic,318.html
Re: Sitemap crawling never stops/ends
« Reply #2 on: July 16, 2006, 10:22:41 PM »
Hello again

Now I have crawled one domain for 2 days. I have set the configuration to "unlimited time" and to save every 3600 sec. Every time it stops without finishing, and every time there are still 25000 more pages to crawl. It looks like it will never end. Is it some kind of loop or what?

Best regards
Staffan
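
A possible explanation for an "estimated URLs left" count that keeps growing is that the same pages are reachable under many different URLs, for example with a session ID in the query string. Whether that applies to this site is only a guess. The Python sketch below is not the generator's code; it just shows how such URLs can be collapsed to one canonical form to check for duplicates. The parameter names in IGNORED_PARAMS and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names that change per visit but not the page content.
IGNORED_PARAMS = {"sid", "phpsessid", "sessionid"}

def normalize(url):
    """Return a canonical form of url so that duplicates compare equal."""
    parts = urlsplit(url)
    # Drop ignored parameters and sort the rest so ordering does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # The fragment (#...) never reaches the server, so it is discarded.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )

def count_unique(urls):
    """Count how many distinct pages a list of crawled URLs really contains."""
    return len({normalize(url) for url in urls})
```

If count_unique() on the URLs collected so far is far below the number of URLs added, the crawler is probably following variants of the same pages, and excluding those parameters in the generator configuration would be the thing to try.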
Re: Sitemap crawling never stops/ends
« Reply #3 on: July 17, 2006, 11:59:16 PM »
Hello Staffan,

The number of pages depends entirely on your site. Do you mean that there are far fewer than 25000 URLs on your site?
Did you increase the memory_limit and max_execution_time settings in php.ini? (The script cannot run for an unlimited time if the server limits it.)
Re: Sitemap crawling never stops/ends
« Reply #4 on: August 03, 2006, 11:42:18 PM »
Hello again

Yes, we increased the memory limit and the time. We have about 10,000 URLs on the site. We have tried letting it crawl the smallest part, which is about 2000 URLs. It took 3 days before the 512 MB limit ended the program.

It seems it just crawls round and round.

What shall I do?

Best regards
Staffan
Re: Sitemap crawling never stops/ends
« Reply #5 on: August 04, 2006, 12:13:41 AM »
Hello Staffan,

Please let me know your generator instance URL via private message.
Re: Sitemap crawling never stops/ends
« Reply #6 on: August 04, 2006, 06:07:12 PM »
Hello

My computer guy says: "There is no time limit since it's being executed from CLI PHP, the memory limit is set at 512MB and is reached after approx 3 days of crawling"

What do you want to know in a private message? (is that your e-mail address?)

/Staffan
Re: Sitemap crawling never stops/ends
« Reply #7 on: August 05, 2006, 10:31:20 PM »
Hello,

It definitely should not take 3 days to crawl 2000 URLs, and this number of pages should not need 512 MB of memory. That's why I want to check your configuration. I need the URL of your sitemap generator instance so that I can see its config in a browser.

You can send me a private message here: https://www.xml-sitemaps.com/forum/index.php?action=pm;sa=send;u=1
Re: Sitemap crawling never stops/ends
« Reply #8 on: October 04, 2006, 03:04:02 PM »
I am having what seems to be the same issue. The crawler runs and runs, even if I tell it to time out after 90 seconds, but nothing is getting saved.
:-(
Re: Sitemap crawling never stops/ends
« Reply #10 on: June 17, 2007, 11:47:42 AM »
I was having the same problem with my sitemap not completing, i.e. stopping near the end.

I set an interrupt of 60 secs per 1000 requests and it now completes successfully with 12,000+ pages.

Steve
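
Reading "an interrupt of 60 secs per 1000 requests" as a pause after every 1000 requests, the idea can be sketched in Python roughly as follows. This is an illustration only, not the generator's implementation; crawl_with_pauses and the fetch callback are hypothetical names.

```python
import time

def crawl_with_pauses(urls, fetch, pause_every=1000, pause_seconds=60,
                      sleep=time.sleep):
    """Call fetch(url) for each URL, pausing after every pause_every requests.

    Returns the number of pauses taken. The sleep argument can be replaced
    with a stub so the function can be tried without actually waiting.
    """
    pauses = 0
    for count, url in enumerate(urls, start=1):
        fetch(url)
        if count % pause_every == 0:
            # Give the server a break before continuing.
            sleep(pause_seconds)
            pauses += 1
    return pauses
```

The pause trades total run time for lower load on the server, which may be why the crawl stopped failing near the end.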
Re: Sitemap crawling never stops/ends
« Reply #11 on: August 20, 2007, 11:48:08 PM »
Suggestion:
Docs say "When the generator script is running (either with cron or using "Run in background" feature), you will see its progress state on "Crawling" page. There you will also find the link to stop the script, which is very useful for big sites because you don't have to wait until it is finished if you want to modify the configuration and re-run the script."

That STOP link should be prominent on every script page, regardless of mode = cron, background, or foreground.
Right now, I've canceled a background run, restarted manually, and see no such STOP link.
Re: Sitemap crawling never stops/ends
« Reply #13 on: November 19, 2007, 03:27:09 PM »
I have the same problem. Crawling goes on forever and there's no stop button.
Re: Sitemap crawling never stops/ends
« Reply #14 on: January 04, 2008, 12:22:57 AM »
You can manually upload an empty "interrupt.log" file into the data/ folder to stop the generator.
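
If you have shell access to the server instead of FTP, the same empty file can be created in place. A minimal Python sketch, assuming the generator is installed in a directory you pass in and that its data/ folder already exists; request_stop and the example path are hypothetical:

```python
from pathlib import Path

def request_stop(generator_dir):
    """Create an empty interrupt.log in <generator_dir>/data and return its path."""
    data_dir = Path(generator_dir) / "data"
    if not data_dir.is_dir():
        # A missing data/ folder most likely means the wrong directory was given.
        raise FileNotFoundError(f"no data/ folder in {generator_dir}")
    flag = data_dir / "interrupt.log"
    flag.touch()  # creates an empty file, or leaves an existing one as is
    return flag

# Example (hypothetical install path):
# request_stop("/var/www/html/generator")
```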