Huge sitemap files
« on: July 08, 2010, 01:48:22 AM »
I have just upgraded to V4 of XML-sitemaps after discovering that canonical links and images in the sitemap are supported (thank you for adding these great new features).

I have just run a cron job and decided to crawl fewer pages than normal. The previous sitemap was running up to 250,000 pages; this time I decided 60,000 would be more manageable, especially as it takes canonical links into account. I am listing 30,000 pages per sitemap.

After the crawl had finished I headed over to Webmaster Tools to see the results. To my shock, Google was indicating that the sitemap files were too large. (This was a previous issue on v3 at 50,000 pages per sitemap, hence the reduction to 30,000 per sitemap file.) I had assumed that 30,000 would be acceptable, but when I checked the sizes of the two sitemap files generated, each was over 100 MB!

Could this be because I have chosen to include image information in the sitemap? It is strange that each file is 10x the size of the old sitemap. One thing that has always bothered me with XML-sitemaps is that I cannot split the sitemap by output file size rather than by number of links, as Google seems to only accept files that are under 10 MB and contain no more than 50,000 links.

What can we do to get around this issue?
Re: Huge sitemap files
« Reply #1 on: July 08, 2010, 09:31:24 AM »

Yes, that is related to including image information in the sitemap. I would point out two things:

1. You should exclude images with no search value using the "Exclude URLs" setting (I mean images like navigation buttons, backgrounds, etc., which are of no interest for search).

2. Further decrease the number of URLs per file to get the sitemap size under 10 MB.
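
Going by the figures in the first post (over 100 MB for 30,000 URLs, so roughly 3.5 KB per URL once image entries are included), fewer than about 3,000 URLs per file should land under 10 MB. If you would rather split by file size directly, here is a minimal Python sketch of the idea. It is not part of the XML-sitemaps generator: the function names `url_entry` and `split_by_size` are hypothetical, and the 10 MB and 50,000-link limits are simply the ones quoted in this thread. It would have to run as a separate post-processing step over your own list of URLs.

```python
from xml.sax.saxutils import escape

# Limits as quoted in this thread (assumptions, adjust as needed).
MAX_BYTES = 10 * 1024 * 1024
MAX_URLS = 50_000

HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
    'xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">\n'
)
FOOTER = "</urlset>\n"


def url_entry(loc, images=()):
    """Build one <url> element, with an optional list of image URLs."""
    parts = ["  <url>\n", f"    <loc>{escape(loc)}</loc>\n"]
    for img in images:
        parts.append(
            "    <image:image>"
            f"<image:loc>{escape(img)}</image:loc>"
            "</image:image>\n"
        )
    parts.append("  </url>\n")
    return "".join(parts)


def split_by_size(entries, max_bytes=MAX_BYTES, max_urls=MAX_URLS):
    """Group <url> entries into sitemap documents that respect both limits.

    A new file is started whenever adding the next entry would push the
    encoded size past max_bytes, or the current file already holds max_urls.
    A single entry larger than max_bytes still gets a file of its own.
    """
    overhead = len(HEADER.encode("utf-8")) + len(FOOTER.encode("utf-8"))
    chunks = []
    current, size = [], overhead
    for entry in entries:
        entry_size = len(entry.encode("utf-8"))
        too_big = size + entry_size > max_bytes
        too_many = len(current) >= max_urls
        if current and (too_big or too_many):
            chunks.append(HEADER + "".join(current) + FOOTER)
            current, size = [], overhead
        current.append(entry)
        size += entry_size
    if current:
        chunks.append(HEADER + "".join(current) + FOOTER)
    return chunks
```

Each string returned by `split_by_size` can be written out as its own sitemap file and listed in a sitemap index. Measuring the encoded bytes rather than counting links means pages with many images simply end up with fewer URLs per file.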