Crawl Stops after 500 pages, level 1

Started by nick.marshall, March 10, 2008, 06:52:07 PM


nick.marshall

I have followed the setup and copied the /generator folder to the root of my website. All file permissions are set, and sitemap.xml is created with the correct permissions. The crawl goes smoothly for the first 400 pages, but then there is no activity in the window. I left the page open all night, but it always ends up stopping at the following:

Links depth: 2
Current page: 2008/02/10/mobile-world-congress-day-1-sony-ericcson-g700-and-g900/feed
Pages added to sitemap: 406
Pages scanned: 480 (12,999.6 Kb)
Pages left: 100 (+ 1026 queued for the next depth level)
Time passed: 4:57
Time left: 1:01
Memory usage: 1,255.9 Kb

The sitemap is in the root of my website [ External links are visible to forum administrators only ]

These are the settings that I am using:
Starting URL: [ External links are visible to forum administrators only ]
Save Sitemap to: [ External links are visible to forum administrators only ]
Sitemap URL: [ External links are visible to forum administrators only ]

I have set the maximum pages, depth level, and execution time to 0 for unlimited.

Any suggestions?

nick.marshall

Here is the PHP info I was able to get from my hosting company:

PHP Version 4.4.7

System  Linux infong 2.4 #1 SMP Thu Jan 13 08:59:31 CET 2005 i686 unknown 
Build Date  May 31 2007 15:14:07 
Configure Command  ../configure --with-pear --with-mysql=/usr --with-zlib --enable-debug=no --enable-safe-mode=no --enable-discard-path=no --with-gd --with-png-dir=/usr/lib --enable-track-vars --with-db --with-gdbm --enable-force-cgi-redirect --with-ttf=/usr/ --enable-ftp --with-mcrypt --enable-dbase --enable-memory-limit --enable-calendar --enable-wddx --with-jpeg-dir=/usr/src/kundenserver/jpeg-6b --enable-bcmath --enable-gd-imgstrttf --enable-shmop --enable-mhash --with-mhash=/usr/src/kundenserver/mhash-0.8.9/ --with-openssl --enable-xslt --with-xslt-sablot --with-dom --with-dom-xslt --with-dom-exslt --with-imap --with-curl --with-iconv=/usr/local --with-freetype-dir=/usr/include/freetype2 --with-bz2 --with-gettext --enable-exif --with-idn --enable-mbstring=all --with-sqlite 
Server API  CGI 
Virtual Directory Support  disabled 
Configuration File (php.ini) Path  /usr/local/lib/php.ini 
PHP API  20020918 
PHP Extension  20020429 
Zend Extension  20050606 
Debug Build  no 
Zend Memory Manager  enabled 
Thread Safety  disabled 
Registered PHP Streams  php, http, ftp, https, ftps, compress.bzip2, compress.zlib 

nick.marshall

I have read that these problems are often due to execution time and memory limits. I have the following settings set:

Directive           Local Value  Master Value
max_execution_time  50000        50000
memory_limit        40M          40M
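
Since the Server API here is CGI, the php.ini I edit is not necessarily the one PHP loads. A quick sanity check is a throwaway script like the sketch below (the file name and layout are my own; ini_get() and get_cfg_var() are stock PHP 4 functions):

<?php
// check_limits.php -- print the limits this PHP is actually running with,
// plus which php.ini it loaded. Handy when the Server API is CGI and you
// are not sure your edits are being picked up.
echo 'loaded php.ini:     ' . get_cfg_var('cfg_file_path') . "\n";
echo 'max_execution_time: ' . ini_get('max_execution_time') . "\n";
echo 'max_input_time:     ' . ini_get('max_input_time') . "\n";
echo 'memory_limit:       ' . ini_get('memory_limit') . "\n";
?>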


da_lyman

I have set mine to:

max_execution_time = 1000     ; Maximum execution time of each script, in seconds
max_input_time = 2000   ; Maximum amount of time each script may spend parsing request data
;max_input_nesting_level = 64 ; Maximum input variable nesting level
memory_limit = 512M      ; Maximum amount of memory a script may consume (512MB)

And it still hangs up. 
What are the best settings to use?  I have over a million pages to crawl.
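
Side note: since php.ini edits do not always take effect under a CGI binary, the limits can also be forced per script at runtime, e.g. near the top of the crawler's entry script. A rough sketch (the values are arbitrary, not recommendations; ini_set() works here because safe mode is off):

<?php
// Force generous limits at runtime instead of relying on php.ini.
ini_set('memory_limit', '512M');  // the URL queue for ~1M pages is large
ini_set('max_input_time', '-1');  // -1 = unlimited
set_time_limit(0);                // 0 = no execution time limit
?>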

nick.marshall

I changed the memory setting, and I am now able to complete the sitemap generation after resuming several times.

The problem now is that opening the sitemap produces the error below. I have the following settings in the php.ini file:

memory_limit = 50M
max_execution_time = 3600

The XML page cannot be displayed

Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later.


--------------------------------------------------------------------------------

The following tags were not closed: urlset. Error processing resource '[ External links are visible to forum administrators only ]'.


LE="margin-left:1em;text-indent:-2em">- <url>
  <loc>[ External links are visible to forum administrators only ]</loc>
  <priority>0.32</priority>
  <changefreq>daily</changefreq>
  <lastmod>2008-03-10T20:52:06+00:00</lastmod>
  </url>
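
The unclosed urlset tag means the file was cut off mid-write, most likely when the script hit a limit while saving. A quick way to find exactly where it breaks is a well-formedness check; here is a minimal sketch using PHP's bundled expat functions (available in PHP 4; the file name and path are my own):

<?php
// validate_sitemap.php -- report the first point where sitemap.xml stops
// being well-formed XML; a truncated file fails right at the cut-off.
$parser = xml_parser_create();
$xml = file_get_contents('sitemap.xml'); // adjust the path as needed
if (xml_parse($parser, $xml, true)) {
    echo "sitemap.xml is well-formed\n";
} else {
    printf("error: %s at line %d\n",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser));
}
xml_parser_free($parser);
?>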

da_lyman

What did you set your max_input_time to?

nick.marshall

I changed my php.ini file to use the same settings that you were using.

Google reports errors in the sitemap now.

I still have to press Resume every time I want to generate the sitemap.

Sitemap errors and warnings

Line  Status         Details
3122  Parsing error  We were unable to read your Sitemap. It may contain an entry we are unable to recognize. Please validate your Sitemap before resubmitting. (Found: Mar 11, 2008)
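
To see what Google actually choked on, printing the file around the reported line number helps. A minimal sketch (the path and the size of the context window are assumptions):

<?php
// show_context.php -- print sitemap.xml around the line Google flagged.
$target = 3122;                 // line number from the Google report
$lines  = file('sitemap.xml');  // adjust the path as needed
for ($i = $target - 3; $i <= $target + 3; $i++) {
    if (isset($lines[$i - 1])) {
        echo $i . ': ' . $lines[$i - 1]; // file() keeps each newline
    }
}
?>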