After scanning 83,000 pages, the scan stopped and I can't get it to start again. In my browser I get this error:
Quote
The connection to the server was reset while the page was loading.
  The site could be temporarily unavailable or too busy. Try again in a few moments.
  If you are unable to load any pages, check your computer's network connection.
  If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the Web.
I set the memory limit to 512 and added a one-second pause after every request, but I still can't get it past the initial scan.
This website has over 1.5 million URLs.
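To be clear, by "pause" I mean a fixed delay between page requests, roughly like this Python sketch (the URLs are placeholders and this is only an illustration, not the generator's own code):

Code:
import time
import urllib.request

# Placeholder URLs: the real crawl is driven by the generator itself;
# this only illustrates "a one second pause after every request".
urls = ["https://example.com/", "https://example.com/about"]

for url in urls:
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            print(url, len(response.read()), "bytes")
    except OSError as err:  # covers timeouts and HTTP/URL errors
        print(url, "failed:", err)
    time.sleep(1)  # fixed one-second pause between requests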
Re: Very large website, can't get the scan to work past 83,000 pages
« Reply #1 on: April 26, 2012, 05:44:36 PM »
Hello,

With a website of this size, the best option is to create a limited sitemap: set the "Maximum depth" or "Maximum URLs" option so that the crawl gathers about 100,000-200,000 URLs. Those would be the main pages, a "roadmap" sitemap for search engines.
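To illustrate roughly what those limits do, here is a minimal Python sketch of a depth- and count-limited crawl; the fetch_links function and the exact stopping rules are assumptions for illustration, not the generator's actual implementation:

Code:
from collections import deque
from urllib.parse import urljoin

def crawl(start_url, fetch_links, max_depth=3, max_urls=200_000):
    """Breadth-first crawl that stops at max_depth or max_urls.

    fetch_links(url) is assumed to return the links found on a page;
    the limits only approximate the "Maximum depth" / "Maximum URLs" idea.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    collected = []

    while queue and len(collected) < max_urls:
        url, depth = queue.popleft()
        collected.append(url)
        if depth >= max_depth:
            continue  # don't follow links deeper than the limit
        for link in fetch_links(url):
            absolute = urljoin(url, link)
            if absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))

    return collected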

The crawling time itself mainly depends on the website's page generation time, since the generator crawls the site much like search engine bots do.
For instance, if it takes 1 second to retrieve each page, then 1,000 pages will be crawled in about 16 minutes.
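The arithmetic behind that estimate is just the page count times the average response time; a quick illustrative sketch (the helper name is mine, not part of the generator):

Code:
def estimated_crawl_minutes(pages, seconds_per_page=1.0):
    # total crawl time is simply pages * seconds per page
    return pages * seconds_per_page / 60

print(estimated_crawl_minutes(1_000))    # about 16-17 minutes
print(estimated_crawl_minutes(100_000))  # about 1,667 minutes, roughly 28 hours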

Some real-world examples of big database-driven websites:
about 35,000 URLs indexed - 1 hour 40 minutes total generation time
about 200,000 URLs indexed - 38 hours total generation time

With "Max urls" options defined it would be much faster than that.
Re: Very large website, can't get the scan to work past 83,000 pages
« Reply #2 on: May 06, 2012, 12:38:36 AM »
I have changed the settings to limit the number of URLs. When I tried to restart the scan, nothing happened. I tried starting from scratch and even reinstalled the program. Although the crawling page shows that scanning is in progress and pages are being added to the sitemap, the sitemap itself is now completely blank. The only place the pages show up is the crawl dump log.