Damned If I Can Get It To Crawl
« on: December 14, 2008, 07:40:43 AM »
I have a site with around 20 directories. Each holds many sub-directories, and between them all there are around 75,000 actual PHP pages.

I installed it just fine and set it up to run, but it won't crawl. My sitemap winds up being just my index.php page (which is actually just a blank page used for the script to create a user-friendly page).

This is what I wind up with

Request date: 13 December 2008, 21:46
Processing time: 2.28s
Pages indexed: 1
Sitemap files: 1
Pages size: 0.00Mb

I don't receive any error messages

Re: Damned If I Can Get It To Crawl
« Reply #1 on: December 14, 2008, 03:19:55 PM »
Hello,

Are your pages crawlable, i.e. can you reach any page by clicking links, starting from your homepage?
You can use our SE bot simulator to check that: https://www.xml-sitemaps.com/se-bot-simulator.html

Re: Damned If I Can Get It To Crawl
« Reply #2 on: December 14, 2008, 09:41:46 PM »
Hmm.... Well, that brings up some issues for me.

If I have to tell it where to go, it's not really crawling my website and looking for the pages on the server. This is more of a link indexer. I've found plenty of free (and actual) sitemap crawlers where I give them a starting directory and they find all the pages on the server in any and all subdirectories, but they don't create the XML files that I need, which is the sole reason I purchased the script.

Going past that issue, I used one of those sitemappers and saved the file as a php page to use as a starting address and uploaded that file to the server. Then, I used your SE bot simulator and gave it the file and yes, it found the page and files, but only a fraction of them. I'm assuming that you have a limit set on the free tool - but at this point, I'm unsure if you do or if it just doesn't find them all.
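For reference, the workaround described above (walk the server's directories, then save the result as a page of links to use as a starting address) can be sketched roughly as below. This is only a minimal sketch, not the tool that was actually used: the function names `list_php_urls` and `build_seed_page`, the `doc_root` and `base_url` parameters, and the assumption that every `.php` file on disk maps one-to-one to a URL are all hypothetical.

```python
import os
from html import escape
from urllib.parse import quote


def list_php_urls(doc_root, base_url):
    """Walk doc_root and return one URL per .php file found on disk.

    Assumption: each file's path under doc_root is also its URL path.
    """
    urls = []
    for dirpath, dirnames, filenames in os.walk(doc_root):
        dirnames.sort()  # makes the traversal order deterministic
        for name in sorted(filenames):
            if name.endswith(".php"):
                rel = os.path.relpath(os.path.join(dirpath, name), doc_root)
                # Turn the OS-specific path into a URL path
                url_path = quote(rel.replace(os.sep, "/"))
                urls.append(base_url.rstrip("/") + "/" + url_path)
    return urls


def build_seed_page(urls):
    """Return a plain HTML page with one link per URL.

    A link-following crawler started on this page can then reach every file.
    """
    links = "\n".join(
        '<a href="{0}">{0}</a><br>'.format(escape(u, quote=True)) for u in urls
    )
    return "<html><body>\n" + links + "\n</body></html>"
```

The output of `build_seed_page` would be saved on the server and given to the generator as the starting URL. With tens of thousands of links on one page, a crawler may still time out or stop early, so this does not by itself explain the hang described below.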

Going past that issue, I then used my purchased copy to try and "crawl" my site using the file I created (as mentioned above). The script just hangs up at this point:

Please wait. Sitemap generation in progress...

It never actually creates any pages or sitemaps.

Given all of this, I'm a bit disappointed in what could have been a very useful tool. The script doesn't crawl a website (language used on your site), it spiders it - meaning you have to tell it where to go or it gets lost. That doesn't work for me. I'll be adding another 300 directories/sub-directories and another 300,000 pages on the server and they are not linked together so spidering software isn't going to be a solution for me. I spend my time creating the pages, I don't have time to give a spider instructions on where to go.
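For a site like this, where the pages exist as files but are not linked together, the XML could in principle be written straight from a list of URLs without any spidering. The sketch below is an illustration under stated assumptions, not part of the purchased script: `build_sitemaps` and the output file names are hypothetical, and the only protocol detail relied on is the sitemaps.org limit of 50,000 URLs per sitemap file, which forces a split plus an index file at these page counts.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_FILE = 50000  # per-file URL limit in the sitemaps.org protocol
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemaps(urls, base_url, chunk_size=MAX_URLS_PER_FILE):
    """Return {filename: xml_text} for a list of page URLs.

    File names (sitemap1.xml, sitemap_index.xml, ...) are hypothetical.
    An index file is added only when more than one sitemap file is needed.
    """
    files = {}
    for start in range(0, len(urls), chunk_size):
        chunk = urls[start:start + chunk_size]
        entries = "".join(
            "  <url><loc>%s</loc></url>\n" % escape(u) for u in chunk
        )
        name = "sitemap%d.xml" % (start // chunk_size + 1)
        files[name] = (
            XML_HEADER + '<urlset xmlns="%s">\n' % NS + entries + "</urlset>\n"
        )

    if len(files) > 1:
        base = base_url.rstrip("/")
        index_entries = "".join(
            "  <sitemap><loc>%s</loc></sitemap>\n" % escape(base + "/" + name)
            for name in list(files)
        )
        files["sitemap_index.xml"] = (
            XML_HEADER
            + '<sitemapindex xmlns="%s">\n' % NS
            + index_entries
            + "</sitemapindex>\n"
        )
    return files
```

With the roughly 75,000 pages mentioned earlier, this would produce two sitemap files and one index file.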

Another question I'd pose is this...

Given that your script requires directions on where to go and what to look for, and given that most webmasters don't have time to write out a "map" so your spider doesn't get lost, why not incorporate a secondary script (freely available on the net) into yours? A map could be generated first and then used as instructions for your spider, for sites such as mine that have hundreds of thousands of pages that are not linked together.

Re: Damned If I Can Get It To Crawl
« Reply #3 on: December 15, 2008, 02:51:54 PM »
Hello,

The sitemap generator script crawls the site the way normal visitors and search engine bots do, finding all pages that can be reached by "clicking" links. Most sites nowadays are dynamic, created with server-side scripts (shopping sites, catalogs, communities, etc.), i.e. they do not have a list of static files stored in folders on the server; all pages are created on the fly.
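To illustrate what "reached by clicking links" means in practice, here is a minimal sketch of a link-following crawl. It is not the generator's actual code: `LinkExtractor`, `crawl`, and the toy `SITE` dictionary (which stands in for real HTTP requests) are assumptions made for the example. It shows why a page that nothing links to is never found, and why a start page with no links yields a single indexed page.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl from start_url, staying on the same host.

    fetch(url) returns the page's HTML, or None if the page does not exist.
    Returns the pages found, in the order they were visited.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        found.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return found


# Toy site: three linked pages and one page that nothing links to.
SITE = {
    "http://example.com/index.php": '<a href="a/one.php">one</a>',
    "http://example.com/a/one.php": '<a href="two.php">two</a>',
    "http://example.com/a/two.php": "",
    "http://example.com/b/orphan.php": "no links point here",
}

reachable = crawl("http://example.com/index.php", SITE.get)
# b/orphan.php exists in SITE but is never visited, because no page links to it.
```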

Quote
Please wait. Sitemap generation in progress...
It never actually creates any pages or sitemaps.

Please PM me your generator URL/login to check that.

Re: Damned If I Can Get It To Crawl
« Reply #4 on: January 03, 2009, 03:04:17 AM »
I've PM'd you the URL and the user/pass.

Please look into it.