Only recognizing 5k of 200k dynamically generated links
« on: May 22, 2009, 04:04:26 PM »
I looked through the forum, but the replies are a bit too cryptic for me to piece together a solution to my missing-links problem. We have about 200k URLs in a MySQL table, but the crawler is not catching all of them. The site is set up like this:

Index page with 100 or so categories >
Each category page lists various subcategories with titles showing >
Click on a title to bring up the individual record

Starting at the index page, the crawler catches about 5k of the 200k individual records. I dumped the table, randomly selected records the crawler is not finding, and they resolve correctly to their pages.

I changed the various fine-tuning settings on the crawler, including depth, timeouts, etc., but it still will not catch all the records. Now I'm wondering whether it's my site, and whether search engines may not be crawling all the records either. Yikes. Or is there some configuration setting that may be causing the crawler to pick up only those 5k?

For so much information on a site, my Google rankings are terrible. I'm hardly getting any traffic, not even enough to fill out a top-20 traffic report. Did I mention yikes?

I don't know how much a sitemap will help the traffic, but I'm hoping it will. Thanks in advance for any ideas about why it's not getting all the URLs. Also, is there a way to keep the URLs in question hidden on this forum?
Re: Only recognizing 5k of 200k dynamically generated links
« Reply #1 on: May 22, 2009, 05:20:17 PM »
Hello,

could you please PM me your generator URL, an example URL that is not included in the sitemap, and a note on how that URL can be reached starting from the homepage?
Re: Only recognizing 5k of 200k dynamically generated links
« Reply #2 on: May 29, 2009, 09:50:19 PM »
Quote: "I looked through the forum, but the replies are a bit too cryptic for me to piece together a solution to my missing-links problem. We have about 200k URLs in a MySQL table, but the crawler is not catching all of them."

I have four sites with more than 200,000 database-created URLs. I don't even try to crawl those sites for the links; it's unnecessary. It's so much easier to generate a sitemap directly from your database. It takes a fraction of the time and uses far fewer resources. Besides that, most of your database-generated URLs will not need to be updated very often, if at all.
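For what it's worth, here's a minimal sketch of that approach in Python. The table name (records), the id column, the URL pattern, and the connection details are all assumptions; swap in your own schema. The 50,000-URLs-per-file cap comes from the sitemaps.org protocol.

```python
# Minimal sketch: build sitemap files straight from the database instead of crawling.
# Assumptions (adjust to your schema): a `records` table with an `id` column,
# and record pages reachable at https://example.com/record.php?id=N.
import pymysql

URLS_PER_FILE = 50000  # the sitemaps.org protocol allows at most 50,000 URLs per file

conn = pymysql.connect(host="localhost", user="dbuser",
                       password="dbpass", database="mydb")
cur = conn.cursor()
cur.execute("SELECT id FROM records ORDER BY id")

file_num, count, out = 0, 0, None
for (record_id,) in cur:
    if count % URLS_PER_FILE == 0:  # start a new sitemap file every 50k URLs
        if out:
            out.write("</urlset>\n")
            out.close()
        file_num += 1
        out = open(f"sitemap{file_num}.xml", "w", encoding="utf-8")
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    out.write(f"  <url><loc>https://example.com/record.php?id={record_id}</loc></url>\n")
    count += 1

if out:
    out.write("</urlset>\n")
    out.close()
cur.close()
conn.close()
```

With numeric IDs the URLs need no escaping; if your URLs contain &, <, or quotes, remember to XML-escape them before writing.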

Use the sitemap generator to go after all the other stuff, then stitch the two sitemaps into one. It's really not that hard, and you'll be happier with the results.
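The standard way to stitch them together is a sitemap index: a small XML file, defined by the sitemaps.org protocol, that lists each individual sitemap, and it's the one file you submit to the search engines. The file names below are just placeholders for your crawled sitemap and the database-generated ones.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-crawled.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap2.xml</loc></sitemap>
</sitemapindex>
```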

That's my $.02.