Buy this script or use Google?
« on: April 27, 2006, 04:43:20 AM »
I'm trying to figure out the best way for us to build our site map... Should I buy this script or would running Google's Python on our server make more sense?

We have a PHP site that builds maybe 500,000 pages dynamically.  Many of these pages have not been hit by users yet, so they might not be in our log files.  However, every page has most likely been linked from a page that has been visited, so maybe there is a link in a log file?  Although most of these pages get updated very frequently, once indexed they only need to be respidered every once in a while, say every month or couple of months.

We probably have about a million pages or more on the site in total...  I'm really only concerned about half of them as I don't think that Google would ever serve up the other half from random searches. 

I don't mind paying a server admin to install the Python script...  And I wouldn't mind buying this product... 

I just want to do what is best for our site.  I'm also not worried about a site map for users... Only to make sure that Google finds all of the pages that we want it to.   

And how long would it take to generate for a site this size? 

Thanks.


« Last Edit: April 27, 2006, 05:36:02 AM by thinkmango »
Re: Buy this script or use Google?
« Reply #1 on: April 29, 2006, 03:37:22 PM »
Hello,

thank you for your question.

Simply put, there are two points to mention in comparison:

1. Google's Python script will add to the sitemap only the pages that were already visited (the logs contain no info about the pages linked from an accessed page), while our Sitemap Generator script crawls the site and so finds all links, creating a complete sitemap.

2. Scanning the logs is a faster way to generate a sitemap, though, since it does not need a network connection to your site for every page the way Sitemap Generator does.
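
To illustrate the log-scanning approach from point 2, here is a minimal Python sketch. It is not Google's script, only an illustration of the idea. It assumes the server writes Common Log Format lines, and the names `urls_from_log`, `build_sitemap` and the example.com host are made up for the example. Note that it can only ever list URLs that appear in the log, which is exactly the limitation described in point 1.

```python
import re
from xml.sax.saxutils import escape

# Request path and status code of a Common Log Format line (assumed log format).
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

def urls_from_log(lines, base_url):
    """Return unique, successfully requested URLs in first-seen order."""
    seen = {}
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group(2) == "200":
            seen.setdefault(base_url + match.group(1), None)
    return list(seen)

def build_sitemap(urls):
    """Render a minimal sitemap document (sitemaps.org 0.9 namespace)."""
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

# Hypothetical log lines, for illustration only.
sample_log = [
    '1.2.3.4 - - [27/Apr/2006:04:43:20 +0000] "GET /page.php?id=1&cat=2 HTTP/1.1" 200 512',
    '1.2.3.4 - - [27/Apr/2006:04:43:25 +0000] "GET /missing.php HTTP/1.1" 404 128',
    '5.6.7.8 - - [27/Apr/2006:04:44:01 +0000] "GET /about.php HTTP/1.1" 200 2048',
    '5.6.7.8 - - [27/Apr/2006:04:45:10 +0000] "GET /page.php?id=1&cat=2 HTTP/1.1" 200 512',
]

if __name__ == "__main__":
    print(build_sitemap(urls_from_log(sample_log, "http://www.example.com")))
```

No page is fetched over the network here, which is why this route is fast, but a page nobody has requested yet never makes it into the output.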


Please check this topic regarding the time required to create a sitemap for a large site:
How long it will take to generate sitemap

Sitemap Generator Frequently asked questions


Let me know if you have further questions.
Re: Buy this script or use Google?
« Reply #2 on: April 29, 2006, 05:40:02 PM »
Thanks for the feedback.  Due to the nature of our site, we do need it to crawl the site so that it generates good search pages. 
We bought the script and hope to get it all tied in this weekend.

Though crawling the 500,000 pages is going to take some time.

Cheers.
« Last Edit: April 29, 2006, 05:42:40 PM by thinkmango »
Re: Buy this script or use Google?
« Reply #3 on: April 30, 2006, 10:19:40 AM »
Ok, great. :)

Please do not forget that you can use "Do not parse URLs" (and "Exclude URLs") to reduce the sitemap generation time significantly (depending on your site's structure).
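
As a rough sketch of why these two options save time, the Python below sorts URLs into three groups. It assumes that "Exclude URLs" drops a matching URL entirely, that "Do not parse URLs" keeps it in the sitemap without fetching it to look for links, and that both match on substrings. That reading of the options, the function name `classify` and the sample patterns are assumptions for illustration, not the script's actual code.

```python
def classify(url, exclude, do_not_parse):
    """Return 'skip', 'list_only' or 'crawl' for a URL using substring matches."""
    if any(pattern in url for pattern in exclude):
        return "skip"        # never fetched, never listed
    if any(pattern in url for pattern in do_not_parse):
        return "list_only"   # listed in the sitemap, not fetched for links
    return "crawl"           # fetched, parsed for links, and listed

# Hypothetical patterns for a dynamic PHP site.
EXCLUDE = ["print=1", "/admin/"]
DO_NOT_PARSE = ["profile.php"]

if __name__ == "__main__":
    for url in ["/search.php?q=a", "/profile.php?id=7", "/page.php?print=1"]:
        print(url, "->", classify(url, EXCLUDE, DO_NOT_PARSE))
```

Every URL that lands in "skip" or "list_only" is one fewer page request during the crawl, so the saving depends on how many of your pages match.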
Re: Buy this script or use Google?
« Reply #4 on: May 02, 2006, 09:02:12 PM »
Ok, so I have it running and it just passed 10,000 pages...  Not sure how you keep the script running when the browser window is closed...  Pretty cool. 

I was expecting it to write the first site map after 10,000 pages.  Does it go through the entire site before I see anything on the view sitemap page?

I think that this will take a few days and I hope to not have to do it twice.

« Last Edit: May 02, 2006, 09:05:11 PM by thinkmango »
Re: Buy this script or use Google?
« Reply #5 on: May 02, 2006, 10:27:16 PM »
Hello,

yes, the sitemap generator collects all info first and then creates all sitemaps and the sitemap index file.
Quote
I think that this will take a few days and I hope to not have to do it twice.
Make sure that you have enabled the "Save state" option so that you don't have to run it from scratch in case it is stopped (set it to every 300 seconds, for instance).
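
For anyone curious what periodic state saving looks like in general, here is a small Python sketch of the idea. It is not how Sitemap Generator stores its state; the `CrawlState` class, the JSON file layout and the file name are assumptions made up for the example. Only the 300-second interval comes from the advice above.

```python
import json
import os
import time

class CrawlState:
    """Keeps the pending and finished URLs and checkpoints them to disk."""

    def __init__(self, path, interval=300.0):
        self.path = path
        self.interval = interval          # seconds between checkpoints
        self.queue = []                   # URLs still to fetch
        self.done = []                    # URLs already processed
        self._last_save = time.monotonic()

    def load(self):
        """Restore a previous checkpoint; return False if none exists."""
        if not os.path.exists(self.path):
            return False
        with open(self.path) as handle:
            data = json.load(handle)
        self.queue, self.done = data["queue"], data["done"]
        return True

    def save(self):
        """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump({"queue": self.queue, "done": self.done}, handle)
        os.replace(tmp_path, self.path)
        self._last_save = time.monotonic()

    def maybe_save(self):
        """Checkpoint only if the interval has elapsed since the last save."""
        if time.monotonic() - self._last_save >= self.interval:
            self.save()
            return True
        return False
```

A crawler loop would call `load()` once at startup and `maybe_save()` after each page, so a stopped run loses at most one interval of work.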
Re: Buy this script or use Google?
« Reply #6 on: June 15, 2006, 07:41:20 PM »
I just ran this script for nearly a week... 
Request date: 15 June 2006, 12:08
Processing time: 495497.29s
Pages indexed: 209491
Sitemap files: 6
Pages size: 9,556.91Mb
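
For scale, the figures above work out to a little over two seconds per page and a bit under six days in total. The short Python calculation below shows the arithmetic and projects the time for the 500,000 pages mentioned earlier in the thread, assuming (and it is only an assumption) that the per-page rate stays the same.

```python
PROCESSING_SECONDS = 495497.29   # "Processing time" from the report
PAGES_INDEXED = 209491           # "Pages indexed" from the report
TARGET_PAGES = 500_000           # site size mentioned earlier in the thread

SECONDS_PER_DAY = 86_400

seconds_per_page = PROCESSING_SECONDS / PAGES_INDEXED
days_elapsed = PROCESSING_SECONDS / SECONDS_PER_DAY
# Assumes the same average rate for the whole site.
projected_days = TARGET_PAGES * seconds_per_page / SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"{seconds_per_page:.2f} s per page")
    print(f"{days_elapsed:.1f} days for this run")
    print(f"{projected_days:.1f} days projected for {TARGET_PAGES} pages")
```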


It finished but didn't write the files:

Sitemap file is not writable: /home/hookup.com/httpdocs/sitemap21.xml.gz
Sitemap file is not writable: /home/hookup.com/httpdocs/sitemap22.xml.gz
Sitemap file is not writable: /home/hookup.com/httpdocs/sitemap23.xml.gz
Sitemap file is not writable: /home/hookup.com/httpdocs/sitemap24.xml.gz
Sitemap file is not writable: /home/hookup.com/httpdocs/sitemap25.xml.gz

Is there anything that can be recovered?  Or do I need to start again? 
Not sure why it didn't write.  We have these permissions completely open.

Jason
Re: Buy this script or use Google?
« Reply #7 on: June 15, 2006, 11:49:02 PM »
Hello Jason,

the files were not written because your /home/hookup.com/httpdocs/ folder is not writable by scripts. You should either:
1. set 0777 permissions for the /home/hookup.com/httpdocs/ folder (which is not recommended)
2. or create all sitemap files (empty) manually first and set 0666 permissions for them. Sitemap Generator will overwrite them after that.

To avoid another long crawl, you can simply find the duplicate files in your generator/data/ folder. Copy these files to your httpdocs/ folder, and do not forget to set 0666 permissions on them for future sitemap generations.
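
If you prefer to script option 2 and the recovery step, a minimal Python sketch could look like the following. It assumes a Unix-like server, that the copies in generator/data/ carry the same file names as the sitemap files, and that you run it as a user who can write to httpdocs/. The function names are made up for the example.

```python
import os
import shutil

def recover_sitemaps(data_dir, web_dir, names, mode=0o666):
    """Copy any sitemap files found in data_dir into web_dir; return the names copied."""
    copied = []
    for name in names:
        source = os.path.join(data_dir, name)
        if os.path.isfile(source):
            target = os.path.join(web_dir, name)
            shutil.copyfile(source, target)
            os.chmod(target, mode)   # writable for later generator runs
            copied.append(name)
    return copied

def prepare_targets(web_dir, names, mode=0o666):
    """Create each sitemap file if it is missing (without truncating existing ones) and set its mode."""
    for name in names:
        path = os.path.join(web_dir, name)
        with open(path, "ab"):       # append mode creates the file but keeps existing content
            pass
        os.chmod(path, mode)

if __name__ == "__main__":
    # File names taken from the error messages earlier in the thread.
    sitemap_names = [f"sitemap{n}.xml.gz" for n in range(21, 26)]
    web_dir = "/home/hookup.com/httpdocs"
    print(recover_sitemaps("generator/data", web_dir, sitemap_names))
    prepare_targets(web_dir, sitemap_names)
```

This keeps the folder itself at its current permissions and only opens up the individual sitemap files, in line with option 2 above.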