best settings for running the generator script for crons
« on: July 30, 2010, 01:46:35 AM »
Hi

I just purchased this and was wondering what the best settings would be for running the generator. I have installed it but have not run it even once yet, because I want to be sure I do it right the first time. There are a lot of settings, and I would like to hear people's experiences of which ones work best for the script.

Also, as this script does not use a database, how does it store the settings for future use? Sorry, maybe a noob question, but I wanted to know.

I have a fairly large site and I just want to be sure that it generates the sitemap properly the first time. I have often tried Link Sleuth for broken URL reports, and that seems to kill the DB with too many connections even when running on a single thread. How can I ensure this script doesn't do the same? Is there a speed setting somewhere?

Also, where are the settings to turn the ping notifications on/off to search engines kept?

As Yahoo, Bing and Ask.com require you to enter your respective API keys to ping them, where do I put this information? How can I set them up to ping only after the sitemap is fully built?

Sorry, I am just too used to the WordPress sitemap generator that does all of this automatically for me.

My website link is [ External links are visible to forum administrators only ]
Re: best settings for running the generator script for crons
« Reply #1 on: July 30, 2010, 11:22:00 AM »
Hello,

The sitemap generator comes with a pre-populated set of options that is designed to work fine for most websites, so you can start generating a sitemap right away.

Settings are stored in the generator/data/generator.conf file.

To make sure the crawler doesn't send too many requests in a short time, you can use the "Make delay for X seconds after every Y request" setting, which slows down the crawling process.

Pings are sent only after the sitemap is created. API keys are not required to send pings.
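
To illustrate the idea behind that setting, here is a minimal Python sketch of "pause for X seconds after every Y requests". It is not the generator's actual code; the function name, the parameter names and the fetch callback are all made up for this example.

```python
import time
from typing import Callable, Iterable, List


def crawl_with_delay(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 1.0,      # hypothetical name for "X"
    every_n_requests: int = 10,      # hypothetical name for "Y"
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, pausing after every batch of requests."""
    results = []
    for count, url in enumerate(urls, start=1):
        results.append(fetch(url))
        # After every Y-th request, wait X seconds before continuing,
        # so the web server and database get a break.
        if every_n_requests > 0 and count % every_n_requests == 0:
            sleep(delay_seconds)
    return results
```

A larger X or a smaller Y makes the crawl gentler on the server at the cost of a longer run, which is usually the right trade-off for a large, database-driven site.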

Re: best settings for running the generator script for crons
« Reply #2 on: July 30, 2010, 11:23:25 AM »
Update: I see you have a forum on your site, so it makes sense to select the "phpBB" URL exclusion preset in the sitemap generator settings.
Re: best settings for running the generator script for crons
« Reply #3 on: July 31, 2010, 02:21:40 AM »
Yes I have a forum and also a blog.

My forum folder is accessible via /forum/ and not /phpbb/ unless you are referring to DB tables.

My blog already has an XML sitemap from a plugin, so I will exclude that as well.

Any other pointers I should be aware of?
Re: best settings for running the generator script for crons
« Reply #4 on: July 31, 2010, 02:25:07 AM »
Also, could you please confirm that the last mod date would never change on the sitemap regardless of how many times it was generated?

If I generate the sitemap tonight, it should come up with the first ever last mod date. After that, this date should not change when I generate the sitemap again, unless there are newer pages on the site.

Which last mod date would the new pages inherit?

Thanks
Re: best settings for running the generator script for crons
« Reply #5 on: July 31, 2010, 10:45:37 AM »
Quote
My forum folder is accessible via /forum/ and not /phpbb/ unless you are referring to DB tables.
In the generator settings there is an option called "Exclusion preset"; select phpBB there.

Quote
Also, could you please confirm that the last mod date would never change on the sitemap regardless of how many times it was generated?
It depends on the generator settings: you can choose a specific date, "date when sitemap is created", etc.
Re: best settings for running the generator script for crons
« Reply #6 on: August 02, 2010, 12:30:40 AM »
Thank you for your helpful answers. What I wanted to ask was whether I should put the word "phpbb" for the exclusion or the actual name of the folder on the site, which is "forum". I believe it will be the latter, as the generator would not have access to the DB tables, which have the prefix PHPBB.

Also, I found a section for individual attributes, which is very interesting. Is this where I can assign priority and frequency for each folder on the site? It would help if you could do a sample for me based on my site.

define specific frequency and priority attributes here in the following format:
"url substring,lastupdate YYYY-mm-dd,frequency,priority".
example:
page.php?product=,2005-11-14,monthly,0.9

Is it safe to ping all major blog directories from here? Is this where you define the pings to Google and Yahoo as well?
Re: best settings for running the generator script for crons
« Reply #7 on: August 02, 2010, 08:44:04 AM »
There is no need to specify the folder name there.

Yes, you can specify it like (for instance):
forum/,2010-07-01,weekly,0.9
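
To show how a line in this "url substring,lastupdate,frequency,priority" format could be interpreted, here is a rough Python sketch. It is only an illustration, not the generator's own parser: the names AttributeRule, parse_rule and find_rule are invented, and "first matching rule wins" is an assumption.

```python
from typing import List, NamedTuple, Optional


class AttributeRule(NamedTuple):
    url_substring: str
    lastmod: str       # YYYY-mm-dd, kept as text
    changefreq: str
    priority: float


def parse_rule(line: str) -> AttributeRule:
    # Split from the right so the three trailing fields are always found,
    # even if the URL substring itself happens to contain a comma.
    substring, lastmod, changefreq, priority = line.strip().rsplit(",", 3)
    return AttributeRule(substring, lastmod, changefreq, float(priority))


def find_rule(url: str, rules: List[AttributeRule]) -> Optional[AttributeRule]:
    # Assumption: rules are checked in order and the first one whose
    # substring occurs anywhere in the URL is used.
    for rule in rules:
        if rule.url_substring in url:
            return rule
    return None
```

With the rule "forum/,2010-07-01,weekly,0.9", any URL containing "forum/" would pick up the weekly frequency and 0.9 priority, while URLs that match no rule would keep the global defaults.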

Yes, you can specify the URLs to ping in "Send "weblogUpdate" type of Ping Notification to:" setting.
Re: best settings for running the generator script for crons
« Reply #8 on: August 03, 2010, 01:03:48 AM »
OK, after all this anticipation, the sitemap was generated after indexing only one URL: the homepage. Wow.

Request date: 3 August 2010, 00:01
Processing time: 0:00:02s
Pages indexed: 1
Sitemap files: 1
Pages size: 0.26Mb

What's going on here?
Re: best settings for running the generator script for crons
« Reply #9 on: August 03, 2010, 01:06:42 AM »
Does this have anything to do with this?

Site folders structure

/                              - 1



Is this a permissions thing?
Re: best settings for running the generator script for crons
« Reply #11 on: August 10, 2010, 10:36:48 AM »
The software seems to be working fine, but what it's doing out of the box is creating unnecessary clicks on some links that have click counting enabled. These are affiliate links, all outbound, that I wish to ignore. Can you please help me set up an exception in the software so that it ignores URLs of a particular level?

In other words, I want the software to completely ignore and not index one full level.

Example:

[ External links are visible to forum administrators only ]  Ignore all of these if there is category1 in the URL.

Accept and index everything else.

Can you please help?


Re: best settings for running the generator script for crons
« Reply #12 on: August 10, 2010, 11:16:47 AM »
Alternatively, I could add an exception to ignore all links that have the "nofollow" attribute attached. How can I achieve that?
Re: best settings for running the generator script for crons
« Reply #13 on: August 10, 2010, 12:32:43 PM »
Hello,

you can add this in Exclude URLs setting to avoid indexing all those links:
Code:
category1/
If you have rel="nofollow" added to the link, it should be skipped automatically too (make sure that it's added in ALL links to that page).
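
As a rough illustration of the two rules above (substring exclusion and skipping rel="nofollow" links), here is a small Python sketch built on the standard html.parser module. It is not the generator's code; LinkCollector and collect_links are made-up names, and plain substring matching against the href is an assumption about how "Exclude URLs" behaves.

```python
from html.parser import HTMLParser
from typing import Iterable, List


class LinkCollector(HTMLParser):
    """Collect hrefs from <a> tags, skipping excluded and nofollow links."""

    def __init__(self, exclude_substrings: Iterable[str]):
        super().__init__()
        self.exclude_substrings = list(exclude_substrings)
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated values, e.g. "nofollow noopener".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            return
        # Assumed behaviour: skip the link if any excluded text occurs in it.
        if any(text in href for text in self.exclude_substrings):
            return
        self.links.append(href)


def collect_links(html: str, exclude_substrings: Iterable[str]) -> List[str]:
    parser = LinkCollector(exclude_substrings)
    parser.feed(html)
    return parser.links
```

Note that the nofollow check here works per link, which is why it matters that every link to the page carries the attribute: a single link without it would still let the page be reached.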
Re: best settings for running the generator script for crons
« Reply #14 on: August 11, 2010, 02:16:23 PM »
This doesn't seem to work as expected. I have clearly excluded certain URLs, but the generator's progress display shows that it is still crawling the exclusions.

Please view attached images