Crawl seems block after 100 pages scanned
« on: January 20, 2007, 04:29:48 AM »
Links depth: 1
Current page: Bijoux et Cadeaux,Montres,z1,334.html
Pages added to sitemap: 96
Pages scanned: 100 (2,686.1 Kb)
Pages left: 198 (+ 1330 queued for the next depth level)
Time passed: 8:55
Time left: 17:40
Memory usage: 923.1 Kb

The website contains more than 50,000 pages, and the XML sitemap script always blocks at the same point. memory_limit is set to 128MB and max_execution_time to 1200. The firewall is not active, and all files have 666 permissions. Please help.
Re: Crawl seems block after 100 pages scanned
« Reply #1 on: January 21, 2007, 12:38:44 AM »
Hello,

Please try increasing the max_execution_time setting (1200 seconds is only 20 minutes, and you will need more than that for a site of this size). Also, if you have SSH access, we suggest running the sitemap generator from the command line for better performance.
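
As a rough check of why 20 minutes is unlikely to be enough, here is a minimal sketch that extrapolates from the figures in the first post (100 pages scanned in 8:55, more than 50000 pages on the site). It assumes the crawl rate stays constant, which is only an approximation; the variable names are illustrative and are not part of the generator.

```python
def seconds(mm_ss: str) -> int:
    """Convert a 'minutes:seconds' string such as '8:55' to seconds."""
    minutes, secs = mm_ss.split(":")
    return int(minutes) * 60 + int(secs)

pages_scanned = 100           # "Pages scanned: 100" in the first post
elapsed = seconds("8:55")     # "Time passed: 8:55"
total_pages = 50_000          # "more than 50000 pages" (lower bound)
max_execution_time = 1200     # seconds, as set by the poster

# Assumption: every page takes about as long as the first 100 did.
per_page = elapsed / pages_scanned
estimated_total = per_page * total_pages

print(f"{per_page:.2f} s/page, roughly {estimated_total / 3600:.1f} hours in total")
print("fits within max_execution_time:", estimated_total <= max_execution_time)
```

Under that assumption the full crawl would need days rather than minutes, so the limit would have to be raised by a large factor or removed for the run.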
Re: Crawl seems block after 100 pages scanned
« Reply #2 on: January 21, 2007, 05:56:36 AM »
I tried raising the max_execution_time setting to 12000, but I get the same result. The crawl is always launched from the command line on the server (a dedicated machine hosted at home).
Re: Crawl seems block after 100 pages scanned
« Reply #3 on: January 22, 2007, 11:37:27 PM »
What do you see when you execute it via the command line? (The command-line generator should not display progress in the form you posted in the first message; it shows the progress details line by line.)
Re: Crawl seems block after 100 pages scanned
« Reply #4 on: January 25, 2007, 12:18:01 PM »
In fact, I launch the script from the command line and check the progress with Firefox.
This is what appears in my shell when I run it from the command line:

<html>
<head>
<title>XML Sitemaps - Generation</title>
<meta http-equiv="Content-type" content="text/html;charset=iso-8859-15" />
<link rel=stylesheet type="text/css" href="pages/style.css">
</head>
<body>
Resuming the last session (last updated: 1969-12-31 19:00:00)1 | 297 | 57.8 | 0:34 | 171:25 | 1 | 719.6 Kb | 1 | 0 | 719
20 | 278 | 551.0 | 2:54 | 40:31 | 1 | 667.1 Kb | 18 | 309 | -52
40 | 258 | 1,081.7 | 4:46 | 30:48 | 1 | 704.1 Kb | 36 | 577 | 37
60 | 238 | 1,641.2 | 6:39 | 26:22 | 1 | 768.1 Kb | 56 | 863 | 64
80 | 218 | 2,204.2 | 8:54 | 24:17 | 1 | 808.5 Kb | 76 | 1139 | 40
100 | 198 | 2,683.4 | 10:44 | 21:16 | 1 | 925.9 Kb | 96 | 1347 | 117

And here is the progress shown in Firefox:

Already in progress. Current process state is displayed:
Links depth: 1
Current page: Bijoux et Cadeaux,Montres,z1,334.html
Pages added to sitemap: 96
Pages scanned: 100 (2,683.4 Kb)
Pages left: 198 (+ 1347 queued for the next depth level)
Time passed: 10:44
Time left: 21:16
Memory usage: 925.9 Kb

The crawl always blocks at the same point, and no sitemap is created.
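
For comparing the two outputs, here is a minimal sketch that splits one of the shell progress lines into named fields. The column names are assumptions, inferred by matching the last shell line against the Firefox status values above (pages scanned, pages left, Kb scanned, time passed, time left, depth, memory usage, pages added, queued). The meaning of the final column is not stated anywhere in this thread, so it is left with a neutral name.

```python
from typing import Dict

# Assumed column order, inferred from the Firefox status page; not taken
# from the generator's documentation.
COLUMNS = [
    "pages_scanned", "pages_left", "kb_scanned", "time_passed", "time_left",
    "depth", "memory_usage", "pages_added", "queued_next_depth", "last_column",
]

def parse_progress_line(line: str) -> Dict[str, str]:
    """Split a pipe-separated progress line into a dict of raw string fields."""
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(parts)}")
    return dict(zip(COLUMNS, parts))

row = parse_progress_line(
    "100 | 198 | 2,683.4 | 10:44 | 21:16 | 1 | 925.9 Kb | 96 | 1347 | 117"
)
print(row["pages_scanned"], row["pages_added"], row["queued_next_depth"])
```

The values are kept as strings on purpose, since the output mixes thousands separators, units and mm:ss times.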
Re: Crawl seems block after 100 pages scanned
« Reply #5 on: January 25, 2007, 03:15:50 PM »
And what do you see in the shell when the script stops? Is there no output at all, or is there some sort of error message?
Re: Crawl seems block after 100 pages scanned
« Reply #6 on: January 25, 2007, 11:18:21 PM »
It stops refreshing at the line 100 | 198 | 2,683.4 | 10:44 | 21:16 | 1 | 925.9 Kb | 96 | 1347 | 117

There is no message and no further line. It looks as if the script is still working, but it is not. After a few minutes, if I check the crawl page in Firefox, it looks as if the script had never been launched, and there is only the "run" button.
Re: Crawl seems block after 100 pages scanned
« Reply #7 on: January 26, 2007, 10:52:12 PM »
Can you provide us with temporary ssh access (via private message) to check this further?
Re: Crawl seems block after 100 pages scanned
« Reply #8 on: January 27, 2007, 10:03:06 PM »
I just tried launching the script (on the same machine, with the same configuration) to crawl one of my other websites, and it seems to work perfectly  :o
So what could be blocking it when it crawls this one? It is not the firewall, since I disable it whenever I use the script.
Re: Crawl seems block after 100 pages scanned
« Reply #9 on: January 28, 2007, 12:27:58 AM »
Hmm, this is strange (since it has partially crawled the site). Does our online generator (https://www.xml-sitemaps.com/) crawl more than 100 pages for that site?