XML Sitemaps Generator

Sitemap Generator Forum
December 05, 2008, 11:21:07 AM
Author Topic: Sitemap vs. Google site: search  (Read 9153 times)
fb-guy
Registered Customer
Newbie
Posts: 4
« on: December 07, 2006, 11:41:59 PM »

I am trying to build a sitemap for a site with about 700+ HTML pages, 6K+ forum pages (vBulletin) and 500+ e-store pages (x-cart). When I try to build the map, the XML sitemap generator finds 50K+ pages to crawl and never finishes the map. Last time I looked, it had found 20K pages and thought it had 60K to go. The more pages it crawls, the more it finds to crawl. Is Google having the same problem with us?

A Google site: search finds 20K+ pages, including a generated bookstore that I stopped using and deleted well over a year ago.

Still, while those pages are cached somewhere, they are not on my website.

What do you think I am doing wrong?

1. Why all those pages to crawl?
2. Why does the list grow and the sitemap never finish?
3. Why does Google find 20K+ pages that were deleted a long time ago?
4. When you click on one of the site: search pages, it is a bad link. How can I fix that?
5. Are these old pages and bad links damaging my SE ranking?

Thanks for any help on this.
FB
admin
Administrator
Hero Member
Posts: 3530
« Reply #1 on: December 08, 2006, 04:06:50 PM »

Hello,

Although you may have a fixed number of forum posts and shop products, the number of URLs is usually much greater, because the scripts generate many redundant pages: forumdisplay pages with different sort orders, "go to specific post on the page" links, member lists, printer-friendly pages and many others. You can use the "Do not parse URLs" and "Excluded URLs" options to keep those pages from being crawled. For instance, here is the suggested list for vBulletin:
http://www.xml-sitemaps.com/forum/index.php/topic,241.0.html

Quote
4. When you click on one of the site: search pages, it is a bad link. How can I fix that?
5. Are these old pages and bad links damaging my SE ranking?
If the removed pages return a 404 code, Google will eventually remove them from its index.
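
To check whether the deleted pages really do answer with 404, something like the following could be used. This is only a rough sketch: the function names `http_status` and `split_by_status` are made up for illustration, and the URLs you feed it would be the ones copied from the site: search results.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the server cannot be reached."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # urllib raises 4xx/5xx responses as exceptions; the code is still available.
        return error.code
    except OSError:
        # DNS failures, refused connections and timeouts end up here.
        return None


def split_by_status(urls, get_status=http_status):
    """Split urls into those that report 404 and those that do not."""
    gone, still_served = [], []
    for url in urls:
        if get_status(url) == 404:
            gone.append(url)
        else:
            still_served.append(url)
    return gone, still_served
```

Note that `urlopen` follows redirects, so a deleted page that redirects to the home page would show up as 200 here and land in the second list. Those are the ones worth looking at, since they do not return the 404 code mentioned above.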

fb-guy
« Reply #2 on: December 09, 2006, 06:53:50 PM »

Oleg,

Thanks for this. I think I am halfway there. I made those changes, but the crawl queue is still growing faster than pages are crawled. My guess is that it is my x-cart shopping cart, with the same issues as vBulletin. Can you give me tips on which URLs to exclude for x-cart?
FB
admin
« Reply #3 on: December 09, 2006, 09:43:13 PM »

You can try to add the following for x-cart:

Code:
js=
sort=
printable=
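
To preview what a list like this would filter out, here is a small sketch. It assumes the exclusion entries are matched as plain substrings anywhere in the URL, which is an assumption about the generator's behaviour and not something confirmed in this thread. The helper names are hypothetical.

```python
# Entries from the list above; matching is assumed to be a plain substring test.
EXCLUDE_SUBSTRINGS = ["js=", "sort=", "printable="]


def is_excluded(url, patterns=EXCLUDE_SUBSTRINGS):
    """Return True if any exclusion entry occurs anywhere in the URL."""
    return any(pattern in url for pattern in patterns)


def keep_for_sitemap(urls, patterns=EXCLUDE_SUBSTRINGS):
    """Return only the URLs that no exclusion entry matches, in original order."""
    return [url for url in urls if not is_excluded(url, patterns)]
```

Running `keep_for_sitemap` over a sample of URLs from an unfinished crawl shows how many would survive. If most still survive, the remaining duplicates are coming from parameters that are not on the list yet.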

fb-guy
« Reply #4 on: December 10, 2006, 10:19:03 AM »

Thanks again Oleg,

I added those and the crawl still would not finish, so I took your advice and limited it, first to 5,000 pages, and it went right through. Thanks. It looks like X-cart is one of the sources of the size. Here is a sample URL:

/store/product.php?productid=16143&cat=24&page=1

I don't have 16,000 products, so I am stuck on this. Also, would it help if I bought the SEO package that converted all the .php pages to html pages?
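
One way to see how many distinct products are behind URLs like the one above is to group them by `productid`. This is a sketch only: it assumes that `cat` and `page` merely record how the visitor reached the product and do not change which product is shown, which has not been confirmed here. A high `productid` also may not mean that many products exist, since IDs are not necessarily numbered without gaps. The function name is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs


def group_by_product(urls):
    """Map each productid to the list of product.php URLs that carry it."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        if not parts.path.endswith("product.php"):
            continue  # only product pages are of interest here
        ids = parse_qs(parts.query).get("productid")
        if ids:
            groups[ids[0]].append(url)
    return dict(groups)
```

If the number of keys in the result is close to the real product count while the number of URLs is many times larger, the extra pages are the same products reached through different categories and page numbers.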

Thanks for all the support. I appreciate your product. Now we can see if my sitemap has some impact with Google.
FB
admin
« Reply #5 on: December 10, 2006, 10:44:24 PM »

You are welcome!

fb-guy
« Reply #6 on: December 11, 2006, 09:17:04 AM »

I limited the crawl to 5,000 pages and have successfully created the first sitemap.

Now I need to figure out where the large number of pages comes from. I am guessing it is X-cart. When I raised the limit to 9,000 pages, the Generator stopped at 6,000 pages and claimed there were 35,000 more to go, even though I had limited the crawl to 9,000 pages.

Any ideas on why this is happening?

Also, any other tips on what to exclude from X-cart to make the sitemap accurate?
FB
admin
« Reply #7 on: December 11, 2006, 06:18:52 PM »

Hello,

Quote
the Generator stopped at 6,000 pages, and claimed there were 35,000 more to go.
It will stop once 9,000 URLs have been crawled.
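
The "more to go" figure and the page limit are two different counts. The sketch below illustrates this with a toy breadth-first crawler over an in-memory link graph. It is an assumption about how a crawler of this kind typically behaves, not a description of the generator's internals, and `crawl` and `links` are hypothetical names.

```python
from collections import deque


def crawl(start, links, max_pages):
    """Breadth-first crawl over an in-memory link graph.

    links maps a page to the pages it links to.
    Returns (crawled, pending): pages processed before the limit was hit,
    and pages already discovered but not yet processed.
    """
    queue = deque([start])
    seen = {start}
    crawled = []
    while queue and len(crawled) < max_pages:
        page = queue.popleft()
        crawled.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return crawled, list(queue)


links = {
    "/": ["/a", "/b", "/c"],
    "/a": ["/a1", "/a2", "/a3"],
    "/b": ["/b1", "/b2", "/b3"],
}
crawled, pending = crawl("/", links, max_pages=2)
print(len(crawled), "crawled,", len(pending), "still queued")
```

With a limit of 2, two pages are crawled while five more are already queued. Every crawled page can add several new URLs to the queue, so the pending count can be far larger than the limit while the crawl is still running.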
Quote
Also, any other tips on what to exclude from X-cart to make the sitemap accurate?
You can exclude the whole shop folder by adding "store/" to the "Do not parse URLs" and "Excluded URLs" options.