Takes too long
« on: February 19, 2009, 02:15:53 AM »
Hello,

My site has about 1,500,000 pages, and it will take forever to crawl them all.

Links depth: 3
Current page: bo-jackson-baseball-pc-spiel-gm1960-bilder.html
Pages added to sitemap: 5939
Pages scanned: 5940 (297,198.0 KB)
Pages left: 82195 (+ 36280 queued for the next depth level)
Time passed: 49:54
Time left: 690:42
Memory usage: 79,813.7 Kb

Can you explain how to make it faster?
My server is quite powerful (8 cores, 8 GB RAM), but the script runs really, really slowly. :-(

I can't PM the admin for help, so can someone please help me configure it to run faster?
Re: Takes too long
« Reply #1 on: February 19, 2009, 10:56:31 PM »
Hello,

with a website of this size, the best option is to create a limited sitemap: set the "Maximum depth" or "Maximum URLs" option so that the crawler gathers about 200,000-300,000 URLs, covering the main pages and serving as a "roadmap" sitemap for search engines.
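
To see why the limit helps, here is a rough sketch (not the generator's actual code) of what a depth- and URL-capped crawl amounts to. The extract_links() helper and the limit values are placeholders for illustration only:

from collections import deque

# Minimal sketch of a breadth-first crawl capped by depth and URL count.
# extract_links() is a hypothetical helper; the limits mirror the idea of
# "Maximum depth" / "Maximum URLs", not the generator's real internals.
def limited_crawl(start_url, extract_links, max_depth=3, max_urls=300000):
    seen = {start_url}
    queue = deque([(start_url, 0)])
    sitemap = []
    while queue and len(sitemap) < max_urls:
        url, depth = queue.popleft()
        sitemap.append(url)
        if depth < max_depth:
            for link in extract_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return sitemap

Once either cap is reached, no further pages are fetched, which is why a limited sitemap finishes in a fraction of the time.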

The crawling time itself mainly depends on the website's page generation time, since the generator crawls the site in much the same way search engine bots do.
For instance, if it takes 1 second to retrieve each page, then 1,000 pages will be crawled in about 16 minutes.
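
As a quick back-of-the-envelope check (assuming sequential requests and a constant average response time, which is a simplification), the arithmetic looks like this:

# Rough crawl-time estimate: assumes one request at a time and a
# constant average page generation time (both are simplifications).
def estimated_crawl_time(pages, seconds_per_page=1.0):
    total = pages * seconds_per_page
    hours, remainder = divmod(total, 3600)
    return "%dh %02dmin" % (hours, remainder // 60)

print(estimated_crawl_time(1000))     # 0h 16min at 1 s/page
print(estimated_crawl_time(1500000))  # 416h 40min at 1 s/page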

Some real-world examples of big database-driven websites:
about 35,000 URLs indexed - 1 hour 40 minutes total generation time
about 200,000 URLs indexed - 38 hours total generation time

With "Max urls" options defined it would be much faster than that.
Re: Takes too long
« Reply #2 on: May 19, 2017, 02:52:42 AM »
I am getting 130,000 pages in 23 minutes. It used to take days to crawl 130,000 pages.
The trick was narrowing the set of indexed pages: try the exclusion preset. I am now using this software without any hassle.
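
For anyone wondering what an exclusion rule boils down to, here is a hedged sketch (the patterns below are made-up examples, not the generator's real exclusion syntax): URLs matching an excluded pattern never enter the crawl queue, which is where the time savings come from.

import re

# Illustrative URL exclusion filter; the patterns are examples only,
# not the generator's actual exclusion settings.
EXCLUDE_PATTERNS = [r"/tag/", r"\?sort=", r"/print/"]

def should_crawl(url):
    return not any(re.search(p, url) for p in EXCLUDE_PATTERNS)

urls = ["http://example.com/page.html",
        "http://example.com/page.html?sort=price",
        "http://example.com/tag/baseball"]
print([u for u in urls if should_crawl(u)])  # only page.html survives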