Stupid Newbie Question
« on: February 17, 2006, 12:10:27 AM »
Is there a way to limit the link depth?

I ran the tool on a test directory and it worked great.  I then ran it with a URL limit and it worked great.  Now I am trying to run it on my whole site and it has been crawling for hours.  It is 6 links down and it says there are 20k more to look at.  I guess what I need to do is figure out how to make it be a little more limited.

I am not sure my site has as many links as the crawl is looking for but it does have a lot of dynamic content so who knows.

Also, if I let this crawl finish and then want to update it daily or weekly from cron, can I expect it to take 6+ hours every time, or is it a lot quicker after the first crawl?  I am always adding items to my store that I want to be in the sitemap.

My limited understanding is that I should have all my links in the sitemap.  Is this true?

I ran the Google sitemap generator the other day; it ran fairly quickly and also scanned my log files.  I thought this was great, but I wanted something I could submit to Yahoo as well, so I bought this tool.  It feels like it was worth it and was simple enough to get going, but I am having a hard time working out how to set it up to produce a reasonable sitemap.

Sorry for the rambling.

Thanks in advance
Re: Stupid Newbie Question
« Reply #1 on: February 17, 2006, 01:10:36 AM »

The Sitemap Generator script doesn't have a separate option to limit crawling depth; the only built-in limit is the existing maximum number of URLs. If you want to narrow the page list further, you can use the "Do not parse URLs" and "Exclude URLs" options.
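
To illustrate what a depth limit combined with URL exclusion would do, here is a minimal Python sketch. It is not how the generator script works internally: the `crawl` function, its parameters, and the in-memory `site` link graph (standing in for real page fetches) are all made up for the example.

```python
from collections import deque

def crawl(links, start, max_depth, exclude=()):
    """Breadth-first walk of a link graph that stops following links at max_depth.

    links   -- dict mapping a URL to the URLs it links to (hypothetical stand-in
               for downloading and parsing pages)
    exclude -- substrings; any URL containing one of them is skipped entirely
    """
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        url, depth = queue.popleft()
        found.append(url)
        if depth == max_depth:
            continue  # pages at the depth limit are listed, but their links are not followed
        for nxt in links.get(url, []):
            if nxt in seen or any(s in nxt for s in exclude):
                continue
            seen.add(nxt)
            queue.append((nxt, depth + 1))
    return found

# Tiny made-up site: the sort link is the kind of dynamic URL that inflates a crawl.
site = {
    "/": ["/store", "/about"],
    "/store": ["/store/item1", "/store?sort=price"],
    "/store/item1": ["/store/item1/reviews"],
}
print(crawl(site, "/", max_depth=2, exclude=["?sort="]))
```

With a depth of 2 and the sort parameter excluded, the reviews page (3 links down) and the dynamic sort URL are both left out, which is the same effect the "Exclude URLs" option is meant to give you for dynamic content.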

Every time you run the generator, it crawls from the beginning, because it has no information about which parts of your site have changed; that can only be determined by a full scan. You can set up the crawler to run less often, though: a weekly sitemap refresh is fine, for instance.
Quote: "My limited understanding is that I should have all my links in the sitemap.  Is this true?"
Yes, the idea of the sitemap is to include all your links in it, which makes your site easier for search engine bots to crawl.
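
For reference, a sitemap file is just an XML list of your URLs. The Python sketch below shows the general shape, assuming the sitemaps.org 0.9 namespace; the `build_sitemap` helper and the example.com URLs are invented for illustration and are not part of the generator script.

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = u  # special characters such as & are escaped on output
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical store URLs, including a dynamic one with a query string.
xml_text = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/store?id=1&cat=2",
])
print(xml_text)
```

Every page you want search engines to find gets its own entry, so newly added store items need to be picked up by a fresh crawl before they appear in the file.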