Bookmark "Crawling" doesn't open in generator
« on: December 06, 2010, 01:22:56 PM »
I started the generator 2 days ago and the max execution time has passed, but the "Crawling" bookmark still doesn't open. Please help. I have already sent you the FTP details as described in this topic,4587.html . The generator is located here [ External links are visible to forum administrators only ]
Re: Bookmark "Crawling" doesn't open in generator
« Reply #1 on: December 06, 2010, 07:16:55 PM »
What specifications should a dedicated server have to scan 400,000 pages?
Re: Bookmark "Crawling" doesn't open in generator
« Reply #2 on: December 06, 2010, 09:49:24 PM »

If the generator interface is not opening for you, I'd recommend simply closing and reopening the browser (sometimes it keeps a connection to the site open and doesn't open a new one until the previous one is closed).

With a website of this size, the best option is to create a limited sitemap, with the "Maximum depth" or "Maximum URLs" option set so that it gathers about 100,000-150,000 URLs. These would be the main pages, forming a "roadmap" sitemap for search engines.

The crawling time itself depends mainly on the website's page generation time, since the generator crawls the site much like search engine bots do.
For instance, if it takes 1 second to retrieve every page, then 1000 pages will be crawled in about 17 minutes (1000 seconds).
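
As a rough sketch of that arithmetic, assuming pages are fetched one at a time and each takes the same time to retrieve (the function name and parameters are illustrative only, not part of the generator):

```python
def estimate_crawl_minutes(pages, seconds_per_page):
    """Estimate sequential crawl time in minutes (hypothetical helper)."""
    # Assumption: one page at a time, constant retrieval time per page.
    return pages * seconds_per_page / 60


# 1000 pages at 1 second per page
print(round(estimate_crawl_minutes(1000, 1.0), 1))  # 16.7
```

Under the same assumption, 400,000 pages at 1 second each would come to roughly 111 hours, which is why limiting the number of URLs matters for a site of this size.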

Some real-world examples of big database-driven websites:
about 35,000 URLs indexed - 1 h 40 min total generation time
about 200,000 URLs indexed - 38 hours total generation time
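
For comparison, the average time per URL implied by these two examples can be worked out with a small sketch like the one below. The helper name is hypothetical, and the result is only an average over the whole run, not a measurement of any single page:

```python
def seconds_per_url(urls, hours, minutes=0):
    """Average seconds spent per URL over a whole run (hypothetical helper)."""
    total_seconds = hours * 3600 + minutes * 60
    return total_seconds / urls


print(round(seconds_per_url(35_000, 1, 40), 2))  # 0.17
print(round(seconds_per_url(200_000, 38), 2))    # 0.68
```

These are only two data points from different sites, so they should not be read as a general rule for how crawl time scales.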

With the "Max URLs" option set, it would be much faster than that.