d1

I have looked through the forum and have not found this question anywhere.

I ran the sitemap generator tool on my very large site (>150K URLs) and it completed. However, since I didn't know in advance how many sitemap files I would need or what they would be named, only the sitemap.xml file was created. That sitemap.xml file points to 5 other sitemap files (sitemap1.xml, sitemap2.xml, ... sitemap5.xml), none of which exist, because I didn't know that I would have to create empty files with those names beforehand. I have now created those empty files with permissions set to 666.

Here's the sitemap file: [ External links are visible to forum administrators only ]

The urllist.txt file is complete and contains all of the URLs.

Here's the urllist.txt file: [ External links are visible to forum administrators only ]

It took several days to complete the crawl on such a large site, and there is no need to re-run the entire crawl, since the urllist.txt file already contains all of the URLs.

How can I re-generate the sitemap files from the urllist.txt without doing the entire crawl again?

There must be a way to skip the crawl and simply generate the sitemap files from the urllist.txt file. The time-consuming part (the crawl) has already been completed, and sitemap generation is the final step in the process, so it should be relatively quick.

How can this be done?
« Last Edit: October 20, 2008, 02:10:46 AM by d1 »
Hello,

Can you provide me with temporary FTP access via private message so that I can create a small script to generate the sitemap files from the existing urllist.txt?
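
For anyone in the same situation, here is a rough sketch of what such a script might look like. It is not the generator's own code, just a standalone Python illustration. It assumes urllist.txt holds one URL per line, and that the sitemap files should live under a base URL you pass in. The function names and the `per_file` parameter are made up for this example; the 50,000 default is the per-file URL limit from the sitemaps.org protocol.

```python
from pathlib import Path
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # per-file URL limit in the sitemaps.org protocol


def read_urls(urllist_path):
    """Return the non-empty lines of the URL list (assumed: one URL per line)."""
    text = Path(urllist_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_sitemaps(urls, out_dir, base_url, per_file=MAX_URLS_PER_FILE):
    """Write sitemap1.xml, sitemap2.xml, ... plus a sitemap.xml index.

    `base_url` is the public URL of the directory that will hold the files
    (an assumption of this sketch). Returns the list of sitemap file names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = []

    for number, chunk in enumerate(chunked(urls, per_file), start=1):
        name = f"sitemap{number}.xml"
        lines = ['<?xml version="1.0" encoding="UTF-8"?>',
                 f'<urlset xmlns="{SITEMAP_NS}">']
        # escape() turns &, < and > into XML entities
        lines += [f"  <url><loc>{escape(url)}</loc></url>" for url in chunk]
        lines.append("</urlset>")
        (out_dir / name).write_text("\n".join(lines) + "\n", encoding="utf-8")
        names.append(name)

    # The index file lists every numbered sitemap by its full URL.
    prefix = base_url.rstrip("/") + "/"
    index = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<sitemapindex xmlns="{SITEMAP_NS}">']
    index += [f"  <sitemap><loc>{escape(prefix + name)}</loc></sitemap>"
              for name in names]
    index.append("</sitemapindex>")
    (out_dir / "sitemap.xml").write_text("\n".join(index) + "\n", encoding="utf-8")
    return names
```

Something like `write_sitemaps(read_urls("urllist.txt"), ".", "http://www.example.com/")` would then rebuild the files without touching the crawler (example.com is a placeholder). Note that this only emits `<loc>` entries; fields such as last-modified dates or priorities that the generator may normally add are not in urllist.txt and so cannot be recovered this way.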