Crawling bulletin board "WBB" failed ...
« on: September 21, 2009, 10:03:25 AM »
Hello all,

First of all: sorry for my bad English, but I hope everyone can understand me ...  ;D

Last week I bought my copy of the Unlimited XML Sitemap Generator and used it first on my online shop website (built with osCommerce), which has about 7,500 articles. After a runtime of 1:26 h it had generated a sitemap.xml with 40,500 listed URLs. My verdict: wow, great work ... congratulations to the author on the best XML sitemap script I have ever found on the internet.

As a second attempt, I wanted to test the script on a larger website than my online shop: a bulletin board running Woltlab Burning Board Rev. 3.01. After a few seconds the script was finished. I was surprised by this short runtime, since the board has over 15,000 topics, and then I saw that the sitemap.xml was tiny, with only one entry for the main page.

On another try without any exclusion settings, the sitemap contained the one entry with the main page URL and around 180 empty entries. So what is the problem, and does anyone have an idea how to solve it?

Best regards, and thank you very much for any helpful answers.
Re: Crawling bulletin board "WBB" failed ...
« Reply #2 on: September 22, 2009, 04:22:40 PM »
PM was sent ...  :)
Re: Crawling bulletin board "WBB" failed ...
« Reply #3 on: September 24, 2009, 07:50:03 AM »
My problem was solved.

For everyone who also uses the Woltlab system and has the "Security System" plugin installed: you have to deactivate it. When the plugin is active, the board blocks and blacklists the crawler completely.

Result of the first crawl of my Woltlab Burning Board with the plugin deactivated:
in 5:48 h it found 89,623 URLs and wrote them into split sitemaps.

When you run it against a MySQL & PHP project, watch the server from the command line while the crawler is working and keep an eye on the resources used by the SQL daemon. If you have your own root server, that's OK, but if you are on hosted webspace, the ISP will not be happy about the load the crawler puts on the SQL daemon.

Is there an option where I can limit the number of pages scanned per second or minute?

Many thanks to the admin of this xml-sitemaps forum for the magnificent support and help in solving my problem.
Re: Crawling bulletin board "WBB" failed ...
« Reply #4 on: September 24, 2009, 06:04:08 PM »
I'm glad we were able to resolve the issue!

> Is there an option where I can limit the number of pages scanned per second or minute?

There is a special setting for that in the sitemap generator configuration:
Make a delay between requests, X seconds after each N requests:
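
To illustrate what an "X seconds after each N requests" delay does to the load on the server, here is a rough Python sketch. It is not the generator's own code; the function name crawl_with_delay and its parameters are made up for this example, and fetch stands in for whatever actually downloads a page.

```python
import time
from typing import Callable, Iterable, List


def crawl_with_delay(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float,
    every_n_requests: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL, pausing delay_seconds after every every_n_requests requests.

    All names here are hypothetical; this only mirrors the idea of the
    "X seconds after each N requests" setting.
    """
    if every_n_requests < 1:
        raise ValueError("every_n_requests must be at least 1")

    results = []
    for count, url in enumerate(urls, start=1):
        results.append(fetch(url))
        # Once a full batch of N requests has been sent, pause for X seconds
        # so the web server and the SQL daemon get some breathing room.
        if count % every_n_requests == 0:
            sleep(delay_seconds)
    return results
```

With delay_seconds=2 and every_n_requests=3, a crawl of seven pages pauses twice, once after the third and once after the sixth request. A longer delay or a smaller N lowers the load on a shared host but makes the crawl take longer.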