Sitemap using Cron
« on: February 26, 2010, 09:57:05 AM »
Hey all,
I have around 100,000 page URLs, and generating the sitemap takes a very long time.
I want to create a cron job that runs every week and updates my sitemap.
The issue is this: when I run the crawl manually and the session expires, the generator offers a checkbox to resume the previous session and continue generating the sitemap from where it stopped, so it does not always start again from the first URL.

So my question is: if I run it through cron, will the script keep running until the last entry? And if it expires, how do I continue from the last expired session?
Will running the cron job again resume generation from the point where it expired, or is something else required?

Any help will be greatly appreciated.
Re: Sitemap using Cron
« Reply #1 on: February 27, 2010, 02:02:25 PM »

The cron task automatically resumes a previously interrupted session.

With a website of this size, the best option is to create a limited sitemap: set the "Maximum depth" or "Maximum URLs" option so that the crawl gathers about 200,000-300,000 URLs. These would be the main pages, forming a "roadmap" sitemap for search engines.

Crawling time depends mainly on the website's page generation time, since the generator crawls the site much like search engine bots do.
For instance, if it takes 1 second to retrieve each page, then 1,000 pages will be crawled in about 17 minutes.
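
The arithmetic above can be sketched in a few lines of Python. This is only a rough estimate: the function name is made up for illustration, and it assumes pages are fetched one at a time with a constant retrieval time, which a real crawl will not match exactly.

```python
def estimate_crawl_minutes(page_count, seconds_per_page):
    """Rough crawl time in minutes, assuming sequential fetches
    and a constant retrieval time per page (an assumption)."""
    return page_count * seconds_per_page / 60


# The example from the post: 1,000 pages at 1 second each.
print(round(estimate_crawl_minutes(1000, 1.0), 1))  # 16.7

# The same rate applied to a 100,000-URL site, converted to hours.
print(round(estimate_crawl_minutes(100_000, 1.0) / 60, 1))  # 27.8
```

At the same 1 second per page, a 100,000-URL site would need more than a day of crawling, which is why limiting the sitemap matters.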

Some real-world examples from big db-driven websites:
about 35,000 URLs indexed - 1 h 40 min total generation time
about 200,000 URLs indexed - 38 hours total generation time
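
To compare these two examples, the average time per URL can be worked out as below. This is just a sketch with a hypothetical helper name; it uses the approximate figures quoted above and says nothing about why the rates differ.

```python
def seconds_per_url(url_count, hours, minutes=0):
    """Average seconds spent per URL for a finished crawl."""
    total_seconds = hours * 3600 + minutes * 60
    return total_seconds / url_count


# Figures taken from the two examples in the post.
small_site = seconds_per_url(35_000, hours=1, minutes=40)
large_site = seconds_per_url(200_000, hours=38)

print(round(small_site, 3))  # 0.171
print(round(large_site, 3))  # 0.684
```

So the larger site averaged roughly four times as long per URL as the smaller one, and total time does not simply scale with the URL count.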

With the "Maximum URLs" option set, it would be much faster than that.