Sitemap vs. Google site: search
« on: December 07, 2006, 11:41:59 PM »
I am trying to build a sitemap for a site with about 700+ HTML pages, 6K+ forum pages (vBulletin) and 500+ e-store pages (X-Cart). When I try to build the map, the XML sitemap generator finds 50K-plus pages to crawl and never finishes the map. Last time I looked, it had found 20K pages and thought it had 60K to go. The more pages it crawls, the more it finds to crawl. Is Google having the same problem with our site?

Checking a Google site: search, I find 20K-plus pages, which include a generated bookstore that I stopped using and deleted well over a year ago.

Still, while those pages are cached somewhere, they are not on my website.

What do you think I am doing wrong?

1. Why all those pages to crawl?
2. Why does the list grow and the sitemap never finish?
3. Why does Google find 20K+ pages that were deleted a long time ago?
4. When you click on one of the site: search pages, it is a bad link. How can I fix that?
5. Are these old pages and bad links damaging my SE ranking?

Thanks for any help on this.
FB
Re: Sitemap vs. Google site: search
« Reply #1 on: December 08, 2006, 04:06:50 PM »
Hello,

Although you may have a fixed number of posts in the forum and products in the shopping script, the number of URLs is usually MUCH greater, since the scripts generate many redundant pages: forumdisplay pages with different sorting orders, "go to specific post on the page" links, member lists, "printer friendly" pages and many others. You can use the "Do not parse URLs" and "Excluded URLs" options to avoid crawling those pages. For instance, here is the suggested list for vBulletin:
https://www.xml-sitemaps.com/forum/index.php/topic,241.html
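
Just as an illustration (the exact recommended list is in the thread above), the kind of entries you would add for vBulletin look something like this -- each line is matched as a substring of the URL, so any link containing it is skipped:

printthread.php
memberlist.php
sendmessage.php
calendar.php
search.php
newreply.php
newthread.php
goto=

The same idea applies to any script that generates many different views of the same content.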

Quote
4. When you click on one of the site: search pages, it is a bad link. How can I fix that?
5. Are these old pages and bad links damaging my SE ranking?
If the removed pages return a 404 code, Google will eventually clean them from the index.
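
If you are not sure whether the deleted pages actually return 404, you can check the response headers directly (the URL here is just a placeholder for one of your old bookstore pages):

curl -I http://www.example.com/bookstore/some-old-page.html

The first line of the output should say "404 Not Found"; if it says "200 OK" or redirects somewhere else, Google will keep treating that URL as live.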
Re: Sitemap vs. Google site: search
« Reply #2 on: December 09, 2006, 06:53:50 PM »
Oleg,

Thanks for this. I think I am halfway there. I made those changes, but the list of pages to crawl is still growing faster than pages are crawled. My guess is that it is my X-Cart shopping cart -- the same issue as with vBulletin. Can you give me tips on what URLs to exclude for X-Cart?
FB
Re: Sitemap vs. Google site: search
« Reply #4 on: December 10, 2006, 10:19:03 AM »
Thanks again Oleg,

I added those, and it still would not finish the crawl, so I took your advice and limited the crawl, first to 5,000 pages, and it went right through. Thanks. It looks like X-Cart is one of the sources of all the extra pages. Here is a sample URL:

/store/product.php?productid=16143&cat=24&page=1

I don't have 16,000 products, so I am stuck on this. Also, would it help if I bought the SEO package that converts all the .php pages to HTML pages?

Thanks for all the support. I appreciate your product. Now we can see if my sitemap has some impact with Google.
FB
Re: Sitemap vs. Google site: search
« Reply #6 on: December 11, 2006, 09:17:04 AM »
I limited the crawl to 5,000 pages and have successfully created the first sitemap.

Now I need to figure out where the large number of pages comes from. I am guessing it is X-Cart. When I raised the limit to 9,000 pages, the Generator stopped at 6,000 pages, and claimed there were 35,000 more to go, even though I had limited the crawl to 9,000 pages.

Any ideas on why this is happening?

Also, any other tips on what to exclude from X-cart to make the sitemap accurate?
FB
Re: Sitemap vs. Google site: search
« Reply #7 on: December 11, 2006, 06:18:52 PM »
Hello,

Quote
the Generator stopped at 6,000 pages, and claimed there were 35,000 more to go.
It will stop once 9,000 URLs have been crawled; the "pages to go" figure is only the size of the queue of discovered links, not pages it will necessarily crawl.
Quote
Also, any other tips on what to exclude from X-cart to make the sitemap accurate?
You can exclude the whole shop folder by adding "store/" to both the "Do not parse URLs" and "Excluded URLs" options.
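
For example (assuming the shop really lives under /store/ as in your sample URL), the same entry goes in both fields:

Do not parse URLs:  store/
Excluded URLs:      store/

If you still want the shop pages themselves in the sitemap, the alternative is to exclude only the redundant parameter variants instead of the whole folder -- your sample product.php URL carries cat= and page= parameters, which is usually where the duplicate URLs come from -- but test that with a small crawl limit first.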