How much bandwidth ?
« on: May 02, 2009, 01:37:01 PM »
Seems like the sitemap generator is consuming more than 150 megs of bandwidth on a single pass. How is that possible for a really small forum with about 700 posts? Maybe I don't understand what it's doing, but here are the numbers:

Request date: 2 May 2009, 04:26
Processing time: 417.70 s
Pages indexed: 2529
Sitemap files: 1
Pages size: 146.08 MB

Any insight into this would be greatly appreciated... TIA
Re: How much bandwidth ?
« Reply #1 on: May 02, 2009, 09:04:12 PM »
Hello,

146 MB over 2,529 pages works out to around 57 KB per page, which is typical for an average forum page.
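A quick check of that arithmetic in Python, assuming the report's "Mb" means megabytes and using decimal units (1 MB = 1000 KB):

```python
# Figures reported by the generator in the first post.
pages_indexed = 2529
total_mb = 146.08  # "Pages size"; assumed to be megabytes

# Average transfer per page, in decimal units (1 MB = 1000 KB).
kb_per_page = total_mb * 1000 / pages_indexed
print(f"{kb_per_page:.1f} KB per page")
```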
Re: How much bandwidth ?
« Reply #2 on: May 02, 2010, 09:52:04 PM »
So, forums by definition have new threads daily and need to be constantly updated. Even a very small forum would therefore generate 100 MB of traffic daily from xml-sitemaps, or 3 gigabytes a month, vastly exceeding the bandwidth limits of most hosting plans.

How can I set up this application so that the generator doesn't consume so much bandwidth?

I set mine up yesterday with an hourly cron job to update, and smack, I've now exceeded the bandwidth for the month!

Maybe I don't understand what the XML does or doesn't need on a forum-based system.

Thanks

Re: How much bandwidth ?
« Reply #3 on: May 03, 2010, 09:53:38 PM »
Hello,

You never need to set it up as an hourly cron job. A weekly (daily at most!) cron job is more than enough.
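To see why the schedule matters so much, here is a rough Python estimate of monthly transfer for each cron frequency. It assumes, as a simplification, that every pass re-fetches the whole forum at the 146.08 MB measured in the first post and that a month has 30 days; the schedule names are only labels.

```python
# Approximate transfer for one full generator pass (first post: 146.08 MB).
# Assumption: every pass re-fetches every page, so each run costs the same.
MB_PER_PASS = 146.08

# Runs per month for each schedule, assuming a 30-day month.
RUNS_PER_MONTH = {
    "hourly": 24 * 30,
    "daily": 30,
    "weekly": 30 / 7,
}


def monthly_traffic_gb(runs_per_month: float, mb_per_pass: float = MB_PER_PASS) -> float:
    """Return the estimated monthly transfer in GB (1 GB = 1000 MB)."""
    return runs_per_month * mb_per_pass / 1000


for schedule, runs in RUNS_PER_MONTH.items():
    print(f"{schedule:>6}: {monthly_traffic_gb(runs):7.1f} GB/month")
```

Under those assumptions an hourly job costs about 105 GB a month, a daily one about 4.4 GB, and a weekly one roughly 0.6 GB.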
Re: How much bandwidth ?
« Reply #4 on: May 04, 2010, 05:34:24 AM »
So, if you are suggesting only once a week, I need to understand why, for a forum...

1.  Will new threads not be indexed by spiders if they are not in the XML file?

2.  If a spider hits a forum without an XML file, will it index everything? And does the XML file then limit the usefulness of the sitemap?

Re: How much bandwidth ?
« Reply #5 on: May 04, 2010, 09:52:07 PM »
Hello,

Search engine bots will still crawl your pages. The XML sitemap is an additional resource that search engines use to crawl the site more efficiently.
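For context, a sitemap file is just a list of page URLs with optional hints such as `<lastmod>`, in the format described at sitemaps.org; crawlers read it alongside the links they find on their own. A minimal Python sketch that writes such a file with the standard library's `xml.etree.ElementTree` (the forum URLs and dates are hypothetical):

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap document from (url, lastmod) pairs.

    `pages` is a list of (absolute URL, W3C date string) tuples.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical forum URLs, for illustration only.
pages = [
    ("https://forum.example.com/index.php?topic=1.0", "2010-05-04"),
    ("https://forum.example.com/index.php?topic=2.0", "2010-05-05"),
]
xml_text = build_sitemap(pages)
print(xml_text)
```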
Re: How much bandwidth ?
« Reply #6 on: May 05, 2010, 07:53:55 AM »
Do you realize how illogical that sounds?  We are talking about a computer program, not a heuristic organic processing machine.

Either the spider uses the XML to avoid having to crawl the site itself, or it ignores the XML and crawls the site itself.

At least that's what my simple mind has wrapped around the idea. I've paid for two licenses, so I'm not bitching. I'm just trying to understand how the spiders use a pre-supplied XML file.

Maybe you have an article or two you can link? Unfortunately, my Google-fu is not turning up a good primer on how spiders interact with XML sitemaps.

Thanks

Re: How much bandwidth ?
« Reply #7 on: May 05, 2010, 10:16:23 PM »
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156184
Quote
Sitemaps are a way to tell Google about pages on your site we might not otherwise discover.
..
Creating and submitting a Sitemap helps make sure that Google knows about all the pages on your site, including URLs that may not be discoverable by Google's normal crawling process.
..
In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it.
Re: How much bandwidth ?
« Reply #8 on: May 07, 2010, 02:42:17 PM »
Okay, so if I understand it correctly, when a spider goes through the site, it doesn't look at the first page, collect all the links, then hit all those links, collect them all, then hit those, etc., etc.

It just goes down random link paths?

So an XML sitemap would say to a spider, "Hey, at least find these links and index them"?

Just asking.
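The "look at the first page, collect all the links, then hit those" model described in this post is a breadth-first crawl. A minimal Python sketch over a small, made-up link graph (page names are hypothetical; nothing is actually fetched):

```python
from collections import deque

# A tiny, hypothetical link graph: page -> pages it links to.
LINKS = {
    "/": ["/board1", "/board2"],
    "/board1": ["/topic1", "/topic2"],
    "/board2": ["/topic3"],
    "/topic1": ["/"],
    "/topic2": [],
    "/topic3": ["/topic1"],
}


def exhaustive_crawl(start):
    """Breadth-first crawl: visit a page, queue every unseen link, repeat.

    Visits every page reachable from `start` exactly once.
    """
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)  # a real crawler would download the page here
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


print(exhaustive_crawl("/"))
```

It reaches every page exactly once, but only pages reachable by following links from the start page.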

Re: How much bandwidth ?
« Reply #9 on: May 07, 2010, 09:46:36 PM »
The exact details of search engine spiders' algorithms are not public information, but basically, yes: they decide which pages to visit and in what order, rather than just crawling the whole site.
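As a toy illustration only (real crawler scheduling is not public, as noted above), here is a Python sketch of a crawler that spends a limited budget of fetches in priority order and treats sitemap URLs as extra, higher-priority seeds. The link graph, the URLs, and the scoring are all invented for the example:

```python
import heapq

# Hypothetical link graph (page -> outgoing links). "/old-topic" has no
# inbound links, so link-following alone never reaches it.
LINKS = {
    "/": ["/board1", "/board2"],
    "/board1": ["/topic1", "/topic2"],
    "/board2": ["/topic3"],
    "/topic1": [], "/topic2": [], "/topic3": [], "/old-topic": [],
}

# Hypothetical sitemap entries.
SITEMAP_URLS = ["/topic3", "/old-topic"]


def budgeted_crawl(start, sitemap_urls, budget):
    """Visit at most `budget` pages, always taking the highest-priority URL.

    Made-up scores: sitemap URLs 2, the start page 1, discovered links 0.
    heapq is a min-heap, so scores are stored negated.
    """
    counter = 0  # tie-breaker: equal scores come out in insertion order
    frontier = [(-1, counter, start)]
    queued = {start}
    for url in sitemap_urls:
        if url not in queued:
            counter += 1
            heapq.heappush(frontier, (-2, counter, url))
            queued.add(url)
    visited = []
    while frontier and len(visited) < budget:
        _, _, page = heapq.heappop(frontier)
        visited.append(page)
        for link in LINKS.get(page, []):
            if link not in queued:
                counter += 1
                heapq.heappush(frontier, (0, counter, link))
                queued.add(link)
    return visited


print(budgeted_crawl("/", SITEMAP_URLS, budget=4))
```

With a budget of 4, the sketch fetches `/old-topic`, which no page links to, while `/board2` and the topics under `/board1` wait for a later pass. That is the sense in which a sitemap can surface pages that normal link-following misses.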