Very large website, can't get the scan to work past 83,000 pages

sites1

« on: April 26, 2012, 12:53:55 PM »

After scanning 83,000 pages, the scan stopped and I can't get it to resume. My browser shows this error:

Quote

The connection to the server was reset while the page was loading.
The site could be temporarily unavailable or too busy. Try again in a few moments.
If you are unable to load any pages, check your computer's network connection.
If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the Web.

I set the memory limit to 512 and added a one-second pause after every request, but the crawl still does not get past the initial scan.
This website has over 1.5 million URLs.

XML-Sitemaps Support

Re: Very large website, can't get the scan to work past 83,000 pages

« Reply #1 on: April 26, 2012, 05:44:36 PM »

Hello,

With a website of this size, the best option is to create a limited sitemap: set the "Maximum depth" or "Maximum URLs" option so that the crawler gathers about 100,000-200,000 URLs. These would be the main pages, giving search engines a "roadmap" sitemap of the site.

The crawling time depends mainly on how long the website takes to generate each page, since the generator crawls the site much as search engine bots do.
For instance, if it takes 1 second to retrieve each page, then 1,000 pages will be crawled in about 17 minutes (1,000 seconds).
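The same arithmetic can be scaled up to the site in question. The sketch below is only an estimate under stated assumptions: requests are made strictly one at a time, every page costs the same, and the function names (`estimate_crawl_seconds`, `format_duration`) are made up for illustration and are not part of the generator.

```python
def estimate_crawl_seconds(url_count, seconds_per_page, pause_seconds=0.0):
    """Rough sequential crawl time: one request at a time, fixed cost per page."""
    return url_count * (seconds_per_page + pause_seconds)


def format_duration(seconds):
    """Render a duration as days, hours and minutes (rounded to the nearest minute)."""
    minutes = round(seconds / 60)
    days, minutes = divmod(minutes, 24 * 60)
    hours, minutes = divmod(minutes, 60)
    return f"{days}d {hours}h {minutes}m"


# 1,000 pages at 1 second each (the example from the post)
print(format_duration(estimate_crawl_seconds(1_000, 1.0)))

# 1.5 million pages, assuming 1 second per page plus a 1 second pause per request
print(format_duration(estimate_crawl_seconds(1_500_000, 1.0, pause_seconds=1.0)))
```

Under those assumptions a full crawl of 1.5 million URLs would run for over a month, which is why capping the number of URLs is the practical route.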

Some real-world examples from big database-driven websites:
about 35,000 URLs indexed - 1 h 40 min total generation time
about 200,000 URLs indexed - 38 hours total generation time

With the "Maximum URLs" option set, it would be much faster than that.
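To compare the two real-world figures quoted above, it helps to convert each into an average time per URL. This is a minimal sketch using only the numbers from the post; `seconds_per_url` is a hypothetical helper name.

```python
def seconds_per_url(url_count, hours, minutes=0):
    """Average wall-clock seconds spent per indexed URL."""
    total_seconds = hours * 3600 + minutes * 60
    return total_seconds / url_count


small = seconds_per_url(35_000, hours=1, minutes=40)   # 35,000 URLs in 1 h 40 min
large = seconds_per_url(200_000, hours=38)             # 200,000 URLs in 38 hours

print(f"{small:.2f} s/URL, {large:.2f} s/URL")
```

In these two examples the average cost per URL is roughly four times higher for the larger crawl, so total time should not be assumed to grow in a straight line with URL count.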

Oleg Ignatiuk
https://www.xml-sitemaps.com

sites1

Re: Very large website, can't get the scan to work past 83,000 pages

« Reply #2 on: May 06, 2012, 12:38:36 AM »

I have changed the settings to limit the number of URLs. When I tried to restart the scan, nothing happened. I then tried starting from scratch and even reinstalled the program. The crawling page shows that scanning is in progress and that pages are being added to the sitemap, but the sitemap itself is completely blank. The only place the pages show up is the crawl dump log.

XML-Sitemaps Support

Re: Very large website, can't get the scan to work past 83,000 pages

« Reply #3 on: May 07, 2012, 02:08:49 PM »

Hello,

The sitemap remains empty until crawling is finished; it is created only at that point.
