XML Sitemaps Generator

Author Topic: Very large site, taking long to crawl  (Read 9412 times)

michael37

  • Registered Customer
  • Approved member
  • *
  • Posts: 1
Very large site, taking long to crawl
« on: April 15, 2010, 03:33:01 PM »
Hi,

I have purchased the standalone crawler and am running it on a very big site (a few million pages). It has been going for a day now and is only on 98,000 pages. I am concerned this is going to take a few weeks.

It is crawling on a dedicated box (which runs only a few sites, including the one it is crawling) with 8 CPUs and 8 GB RAM. I have not made it store to disk, so it should be running at its full potential.

Any tips on speeding it up?
Also, once it finishes and I need to update the sitemap (via cron), will it have to recrawl and take this long again?

XML-Sitemaps Support

  • Administrator
  • Hero Member
  • *****
  • Posts: 10625
Re: Very large site, taking long to crawl
« Reply #1 on: April 16, 2010, 10:44:23 AM »
Hello,

With a website of this size, the best option is to create a limited sitemap: set the "Maximum depth" or "Maximum URLs" option so that the crawler gathers about 200,000-300,000 URLs covering the main pages, producing a "roadmap" sitemap for search engines.

The crawling time depends mainly on the website's page generation time, since the generator crawls the site much like search engine bots do.
For instance, if it takes 1 second to retrieve each page, then 1,000 pages will be crawled in roughly 17 minutes.
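As a rough sketch (illustrative only, not the generator's actual code), the arithmetic behind that estimate looks like this, assuming pages are fetched one at a time at a fixed speed:

```python
# Rough sequential crawl-time estimator (hypothetical helper,
# not part of the XML-Sitemaps generator).
def crawl_time_minutes(pages, seconds_per_page=1.0):
    """Total minutes to fetch `pages` pages one after another."""
    return pages * seconds_per_page / 60

# 1,000 pages at 1 s/page is about 17 minutes.
print(round(crawl_time_minutes(1000), 1))
```

At that rate, a few million pages would take weeks, which is why limiting the crawl is the practical option.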

Some real-world examples from big database-driven websites:
about 35,000 URLs indexed - 1h 40min total generation time
about 200,000 URLs indexed - 38hours total generation time

With the "Maximum URLs" option defined, it would be much faster than that.
Oleg Ignatiuk
www.xml-sitemaps.com
Send me a Private Message

For maximum exposure and traffic for your web site check out our additional SEO Services.
