Crawl Stops after 500 pages, level 1
« on: March 10, 2008, 06:52:07 PM »
I followed the setup instructions and copied the /generator folder to the root of my website.  All file permissions are set, and sitemap.xml was created with the correct permissions.  The crawl runs smoothly for the first 400 pages or so, but then there is no further activity in the window.  I left the page open all night, but it always ends up stopping at the following:

Links depth: 2
Current page: 2008/02/10/mobile-world-congress-day-1-sony-ericcson-g700-and-g900/feed
Pages added to sitemap: 406
Pages scanned: 480 (12,999.6 Kb)
Pages left: 100 (+ 1026 queued for the next depth level)
Time passed: 4:57
Time left: 1:01
Memory usage: 1,255.9 Kb

The sitemap is in the root of my website [ External links are visible to forum administrators only ]

These are the settings that I am using:
Starting URL: [ External links are visible to forum administrators only ]
Save Sitemap to: [ External links are visible to forum administrators only ]
Sitemap URL: [ External links are visible to forum administrators only ]

I have set the maximum pages, depth level, and execution time to 0 for unlimited.

Any suggestions?
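
A small test script like the following (check_limits.php is just an example name) would show which limits PHP actually enforces for web requests, since hosting panels sometimes report different values than the web server applies:

Code:
<?php
// check_limits.php - example helper: upload it to the site root and open
// it in a browser to see the limits PHP enforces for web requests.
header('Content-Type: text/plain');
echo 'max_execution_time: ' . ini_get('max_execution_time') . "\n";
echo 'memory_limit:       ' . ini_get('memory_limit') . "\n";
echo 'safe_mode:          ' . (ini_get('safe_mode') ? 'On' : 'Off') . "\n";
?>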
Re: Crawl Stops after 500 pages, level 1
« Reply #1 on: March 10, 2008, 07:26:31 PM »
Here is the PHP info I was able to get from my hosting company:

PHP Version 4.4.7

System  Linux infong 2.4 #1 SMP Thu Jan 13 08:59:31 CET 2005 i686 unknown 
Build Date  May 31 2007 15:14:07 
Configure Command  ../configure --with-pear --with-mysql=/usr --with-zlib --enable-debug=no --enable-safe-mode=no --enable-discard-path=no --with-gd --with-png-dir=/usr/lib --enable-track-vars --with-db --with-gdbm --enable-force-cgi-redirect --with-ttf=/usr/ --enable-ftp --with-mcrypt --enable-dbase --enable-memory-limit --enable-calendar --enable-wddx --with-jpeg-dir=/usr/src/kundenserver/jpeg-6b --enable-bcmath --enable-gd-imgstrttf --enable-shmop --enable-mhash --with-mhash=/usr/src/kundenserver/mhash-0.8.9/ --with-openssl --enable-xslt --with-xslt-sablot --with-dom --with-dom-xslt --with-dom-exslt --with-imap --with-curl --with-iconv=/usr/local --with-freetype-dir=/usr/include/freetype2 --with-bz2 --with-gettext --enable-exif --with-idn --enable-mbstring=all --with-sqlite 
Server API  CGI 
Virtual Directory Support  disabled 
Configuration File (php.ini) Path  /usr/local/lib/php.ini 
PHP API  20020918 
PHP Extension  20020429 
Zend Extension  20050606 
Debug Build  no 
Zend Memory Manager  enabled 
Thread Safety  disabled 
Registered PHP Streams  php, http, ftp, https, ftps, compress.bzip2, compress.zlib 
Re: Crawl Stops after 500 pages, level 1
« Reply #2 on: March 10, 2008, 08:13:02 PM »
I have read that these problems are often caused by limits on execution time and memory. I have the following settings (local / master values from phpinfo):

max_execution_time  50000  50000
memory_limit  40M  40M
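
Since the phpinfo above shows Server API: CGI, hosts of this type sometimes read a php.ini placed in the script's own directory, which would let these values be overridden without touching the global file. Whether this particular host honours a per-directory php.ini is an assumption worth checking:

Code:
; generator/php.ini - per-directory override honoured by some CGI/FastCGI hosts
memory_limit = 512M
max_execution_time = 0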
Re: Crawl Stops after 500 pages, level 1
« Reply #4 on: March 11, 2008, 01:00:49 AM »
I have set mine to:

max_execution_time = 1000     ; Maximum execution time of each script, in seconds
max_input_time = 2000   ; Maximum amount of time each script may spend parsing request data
;max_input_nesting_level = 64 ; Maximum input variable nesting level
memory_limit = 512M      ; Maximum amount of memory a script may consume (512MB)

And it still hangs up. 
What are the best settings to use?  I have over a million pages to crawl.
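
One more thing that might be worth trying is raising the limits from PHP itself at the start of the crawl. This is only a sketch; ini_set() can be silently ignored on shared hosting, and the generator may already do something similar internally:

Code:
<?php
// Sketch only: try to raise the limits at runtime before a long crawl.
// ini_set() can be silently ignored on shared/CGI hosting setups.
@ini_set('memory_limit', '512M');
@ini_set('max_execution_time', '0');  // 0 = no time limit
@set_time_limit(0);                   // redundant with the line above, but harmless
?>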
Re: Crawl Stops after 500 pages, level 1
« Reply #5 on: March 11, 2008, 01:12:27 AM »
I changed the memory setting, and after resuming several times I am now able to complete the sitemap generation. I have the following settings in my php.ini file:

memory_limit = 50M
max_execution_time = 3600

The problem now is that opening the sitemap produces the following error:

The XML page cannot be displayed

Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later.


The following tags were not closed: urlset. Error processing resource '[ External links are visible to forum administrators only ]'.


<url>
  <loc>[ External links are visible to forum administrators only ]</loc>
  <priority>0.32</priority>
  <changefreq>daily</changefreq>
  <lastmod>2008-03-10T20:52:06+00:00</lastmod>
  </url>
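
The "tags were not closed: urlset" message usually means the file was cut off before the closing tag was written, for example when the crawl or a resumed run was interrupted. A quick way to check for truncation (a sketch only; the sitemap.xml path is an assumption and may need adjusting):

Code:
<?php
// Sketch only: report whether the generated sitemap ends with </urlset>.
$file = 'sitemap.xml';             // adjust the path if needed
$size = filesize($file);
$fp   = fopen($file, 'rb');
fseek($fp, max(0, $size - 200));   // read roughly the last 200 bytes
$tail = fread($fp, 200);
fclose($fp);
echo (strpos($tail, '</urlset>') !== false)
    ? "sitemap looks complete\n"
    : "sitemap appears truncated - no closing </urlset> found\n";
?>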
Re: Crawl Stops after 500 pages, level 1
« Reply #6 on: March 11, 2008, 01:16:38 AM »
What did you set your max_input_time to?
Re: Crawl Stops after 500 pages, level 1
« Reply #7 on: March 11, 2008, 06:30:37 PM »
I changed my php.ini file to use the same settings that you were using.

Google reports errors in the sitemap now.

I still have to press Resume every time I want to generate the sitemap.

Sitemap errors and warnings:

Line: 3122
Status: Parsing error
Details: We were unable to read your Sitemap. It may contain an entry we are unable to recognize. Please validate your Sitemap before resubmitting.
Found: Mar 11, 2008
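
Before resubmitting, the file can be run through PHP's own XML parser to get the exact message and line number of the first error (a sketch; it assumes the sitemap is named sitemap.xml and is small enough to load into memory):

Code:
<?php
// Sketch only: parse the sitemap and report the first XML error, if any.
$xml    = file_get_contents('sitemap.xml');  // assumes the file fits in memory
$parser = xml_parser_create();
if (xml_parse($parser, $xml, true)) {
    echo "sitemap parsed without errors\n";
} else {
    echo 'Error: ' . xml_error_string(xml_get_error_code($parser))
       . ' at line ' . xml_get_current_line_number($parser) . "\n";
}
xml_parser_free($parser);
?>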
Re: Crawl Stops after 500 pages, level 1
« Reply #8 on: March 11, 2008, 11:33:14 PM »
Please try to add this to the "Do not parse" and "Exclude URLs" options:

Code:
returnUrl

and then regenerate the sitemap.