Running sitemap generator for the first time
« on: September 07, 2010, 12:02:41 PM »
...from a web browser. I have over 200k pages, and it's been running for nearly 3 hours and has added 3,300 pages to the sitemap.

1. Do I have to keep the browser open for it to continue? (I'm not running it as a cron job yet.)

2. It's going to take around 200 hours to complete at this rate. Will it always take this long, or just the first time?

Thanks in advance.

JC
Re: Running sitemap generator for the first time
« Reply #1 on: September 07, 2010, 03:27:00 PM »
Hello,

1. If you selected the "run in background" checkbox when starting the generator, it may keep working after you close the browser.

2. The sitemap is fully regenerated every time.
The crawling time depends mainly on how long the website takes to generate each page, since the generator crawls the site much like a search engine bot does.
For instance, if it takes 1 second to retrieve each page, then 1,000 pages will be crawled in about 17 minutes.
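
To put numbers on this, here is a minimal Python sketch of the arithmetic. It assumes pages are fetched one at a time at a steady rate; the function names are made up for illustration and are not part of the generator.

```python
def estimate_crawl_hours(total_pages: int, seconds_per_page: float) -> float:
    """Estimated total crawl time in hours for a sequential crawl."""
    return total_pages * seconds_per_page / 3600


def observed_seconds_per_page(pages_done: int, hours_elapsed: float) -> float:
    """Average seconds spent per page, measured from progress so far."""
    return hours_elapsed * 3600 / pages_done


if __name__ == "__main__":
    # 1,000 pages at 1 second each, shown in minutes.
    print(round(estimate_crawl_hours(1000, 1.0) * 60, 1), "minutes")

    # Rate from the first post: 3,300 pages in about 3 hours.
    rate = observed_seconds_per_page(3300, 3)
    print(round(rate, 2), "seconds per page")
    print(round(estimate_crawl_hours(200_000, rate)), "hours for 200,000 pages")
```

At the rate from the first post (roughly 3.3 seconds per page), 200,000 pages works out to a little over 180 hours, so the page generation time of the site is what dominates.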

Some real-world examples of big database-driven websites:
about 35,000 URLs indexed - 1 h 40 min total generation time
about 200,000 URLs indexed - 38 h total generation time

With the "Max urls" option set, it would be much faster than that.
Re: Running sitemap generator for the first time
« Reply #2 on: September 07, 2010, 06:46:05 PM »
Hi Oleg,

Thanks for your advice. I've restarted with "run in background" selected.

Where is the "Max urls" option? I'm sure I'm being blind, but I couldn't see it.

Is there a way I can speed this up by editing my server settings? Does it make any difference setting it up as a cron job?

Also, I couldn't access my website while the sitemap was generating. Is this a browser or server issue, and do you think others could still access the site?

Thanks
Re: Running sitemap generator for the first time
« Reply #4 on: September 13, 2010, 02:37:34 AM »
I have a related question.  I have a site with only 40K pages but expect it to be 200K+ soon.  The pages are all static once published.  Is there a setting to tell the Generator to only scan new pages?

Are there any tricks on the settings to improve sitemap-generation speed? 

thanks
Re: Running sitemap generator for the first time
« Reply #5 on: September 13, 2010, 05:43:14 AM »
Hello,

The sitemap is created in full every time (otherwise the generator wouldn't be able to tell which pages were added).
You can use the "Do not parse" setting to include URLs in the sitemap without fetching them from the server. For instance, on a forum it is usually sufficient to fetch only the forum list pages (since they contain links to topics), while topic display pages can be added to the sitemap directly by matching them with the "Do not parse URLs" option.

Also, the "Exclude URLs" setting can be used to leave unnecessary pages out of the crawl.
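
As a rough illustration of how the two settings differ, here is a Python sketch of the decision a crawler could make for each URL it discovers. This is not the generator's actual code: the substring matching, the order of the checks, the function name, and the example forum URLs are all assumptions made for the example.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"          # excluded: not fetched, not in the sitemap
    ADD_ONLY = "add_only"  # in the sitemap, but never fetched
    FETCH = "fetch"        # fetched, parsed for links, and in the sitemap


def classify(url: str, exclude: list[str], do_not_parse: list[str]) -> Action:
    """Decide what to do with a discovered URL (hypothetical substring rules)."""
    if any(pattern in url for pattern in exclude):
        return Action.SKIP
    if any(pattern in url for pattern in do_not_parse):
        return Action.ADD_ONLY
    return Action.FETCH


if __name__ == "__main__":
    exclude = ["action=profile"]  # pages that add nothing to the sitemap
    do_not_parse = ["topic="]     # topic pages: list them, do not fetch them
    for url in (
        "http://example.com/index.php?board=1",
        "http://example.com/index.php?topic=123",
        "http://example.com/index.php?action=profile",
    ):
        print(url, "->", classify(url, exclude, do_not_parse).value)
```

With rules like these, only the forum list pages cost a request to the server; every topic page linked from them goes straight into the sitemap, which is where the time saving comes from on a large site.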