Memory Error
« on: March 29, 2009, 09:13:01 AM »
We just tried to run the XML sitemap generator on our site. We currently have about 25,000 pages but will have approximately 5,000,000 shortly. We have two questions: (1) Will the sitemap generator handle millions of pages, and if so, how? (2) We got a memory error on the initial run and would like to know how to fix it:

Links depth: 5
Current page: selectBusinessDetails.php?id=30403
Pages added to sitemap: 12040
Pages scanned: 12040 (61,394.1 KB)
Pages left: 1448 (+ 8421 queued for the next depth level)
Time passed: 36:27
Time left: 4:23
Memory usage: 17,420.4 Kb
Resuming the last session (last updated: 2009-03-29 00:49:59)
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 5715212 bytes) in /home/tinkncom/public_html/generator/pages/ : eval()'d code on line 6
Re: Memory Error
« Reply #1 on: March 29, 2009, 08:10:23 PM »

To resolve the memory issue, increase the memory_limit setting in your PHP configuration (php.ini on your server). The error message shows that the current limit is 33554432 bytes (32 MB).
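
A minimal sketch of the php.ini change follows. The 128M value is only an illustrative assumption, not a figure recommended for the generator; pick whatever your host allows.

```ini
; php.ini
; 128M is an assumed example value, chosen only because it is
; above the 32M (33554432 bytes) limit reported in the error.
memory_limit = 128M
```

The web server or PHP process usually has to be restarted before a php.ini change takes effect. If you cannot edit php.ini, calling ini_set('memory_limit', '128M') at the top of a script is another common way to raise the limit, provided the host permits it.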

With a website of this size, the best option is to create a limited sitemap: set the "Maximum depth" or "Maximum URLs" option so that the crawl gathers about 200,000-300,000 URLs. These would be the main pages, giving search engines a "roadmap" sitemap of the site.

The crawling time depends mainly on how long the website takes to generate each page, since the generator crawls the site much as search engine bots do.
For instance, if it takes 1 second to retrieve every page, then 1,000 pages will be crawled in about 16 to 17 minutes.
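
The arithmetic behind that estimate can be sketched as below. It assumes a constant retrieval time per page and strictly sequential crawling, which is a simplification; the function names are made up for this illustration and are not part of the generator.

```python
def crawl_time_seconds(pages, seconds_per_page=1.0):
    """Estimated crawl time, assuming pages are fetched one after another."""
    return pages * seconds_per_page


def format_duration(seconds):
    """Render a number of seconds as days, hours, minutes and seconds."""
    total = int(round(seconds))
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


# Page counts taken from this thread: the 1,000-page example,
# the current 25,000 pages, and the planned 5,000,000 pages.
for pages in (1_000, 25_000, 5_000_000):
    print(pages, format_duration(crawl_time_seconds(pages)))
```

At 1 second per page, 1,000 pages come to 16 minutes 40 seconds, while 5,000,000 pages would take nearly 58 days, which is why a limited sitemap is suggested for a site of that size.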

Some real-world examples of big database-driven websites:
about 35,000 URLs indexed - 1 hour 40 minutes total generation time
about 200,000 URLs indexed - 38 hours total generation time

With the "Maximum URLs" option set, it would be much faster than that.
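
For a rough sense of scale, the two real-world figures quoted above can be turned into an average time per URL. This is only a back-of-the-envelope sketch: it divides total generation time by the URL count, and the helper name is hypothetical.

```python
def seconds_per_url(urls, total_seconds):
    """Average generation time per indexed URL."""
    return total_seconds / urls


# (URLs indexed, total generation time in seconds), from the examples above.
examples = [
    (35_000, 1 * 3600 + 40 * 60),  # 1 hour 40 minutes
    (200_000, 38 * 3600),          # 38 hours
]

for urls, total in examples:
    print(f"{urls} URLs: {seconds_per_url(urls, total):.2f} s per URL")
```

The averages come to roughly 0.17 and 0.68 seconds per URL, so the per-URL cost differs considerably between the two sites and should not be assumed constant when extrapolating.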