Crawler skips a subdirectory
« on: September 02, 2013, 06:54:24 AM »
I am using v6.0 (2012-05-23) of the Standalone generator. After a major overhaul of my site, I ran the generator. It crawled all the pages (except the ones I excluded in the configuration), but it found one bad link. I fixed the link and recrawled the site, and this time it did not crawl all the pages: it skips a subdirectory with 233 files in it but handles the 3 other subdirectories just fine. The depth was set to 3 levels, so I changed it to 4 and then to 0 (any number of levels), and it still skips that one subdirectory.

I had downloaded a copy of the sitemap from the first run, the one with all the pages, and double-checked it to make sure that run was correct. I took a screen capture of the progress when the crawl stopped at level 2: it showed 233 pages left and 1 queued for the next level. It then went to level 3, did nothing, and ended (screenshot attached).
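The thread doesn't say how the generator counts levels internally, so the following is only a generic model: a breadth-first crawl over a hypothetical in-memory link graph, where the homepage is level 0 and a depth of 0 means unlimited. It shows that raising the depth limit only helps when some page the crawler actually parses links to the missing files; a page that no parsed page links to is never queued, whatever the setting.

```python
from collections import deque


def crawl_levels(links, start, max_depth=0):
    """Breadth-first crawl over a hypothetical in-memory link graph.

    links: dict mapping a page path to the list of paths it links to.
    max_depth: levels to follow; 0 means unlimited (an assumption that
    mirrors the "0 = any number of levels" setting in the post).
    Returns a dict mapping each reached page to its level (homepage = 0).
    """
    found = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        level = found[page]
        if max_depth and level >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in found:
                found[target] = level + 1
                queue.append(target)
    return found


# Hypothetical site: nothing links to /prgms_past/orphan.html.
site = {
    "/": ["/programs.html", "/about.html"],
    "/programs.html": ["/prgms_past/2012.html"],
    "/prgms_past/2012.html": ["/prgms_past/2011.html"],
}

print(crawl_levels(site, "/", max_depth=0))
print(crawl_levels(site, "/", max_depth=2))
```

With the limit at 2, /prgms_past/2011.html (level 3) is dropped; with 0 it is found. The unlinked orphan page is missing in both runs, which is why the question of how a skipped file is reached from the homepage matters more than the depth value.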

I am also attaching a screenshot of the change log. The second entry in the log shows the run that included all the pages I wanted. Entries 3-6 show that it skipped 295 pages, 233 of which I want included.

The third attachment shows the Site Structure analysis. It lists 4 included subdirectories but is missing the prgms_past/ subdirectory (233 files). It also does not show the photos/ folder, although those files were included in the sitemap_images.xml file.

During one of the test runs, I configured the prgms_past/ folder to be included but not parsed, but it was still skipped.

Some of the crawled files contain links to files in the skipped subdirectory, so I don't understand why those files were included in the first crawl but not in the later ones. Any ideas?
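One way to confirm that a crawled page really links into the skipped folder is to save its HTML and list the links that resolve into prgms_past/. This is a minimal sketch using only Python's standard library; the page URL and sample HTML are placeholders, not taken from the site in question. It only sees plain `<a href>` links, so a link that appears only via JavaScript would not show up here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (tag/attribute names are lowercased by HTMLParser)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_into(html, page_url, subdir):
    """Return absolute URLs linked from page_url whose path is inside subdir."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    prefix = "/" + subdir.strip("/") + "/"
    matches = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page
        if urlparse(absolute).path.startswith(prefix):
            matches.append(absolute)
    return matches


# Placeholder page and markup for illustration only.
sample_html = (
    '<a href="prgms_past/2012.html">2012</a> '
    '<a href="/photos/a.jpg">photo</a> '
    '<a href="/prgms_past/2011.html">2011</a>'
)
print(links_into(sample_html, "https://www.example.com/programs.html", "prgms_past/"))
```

If this returns an empty list for every crawled page, the crawler has no link to follow into the folder; if it finds links, the skip lies elsewhere.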
« Last Edit: September 02, 2013, 06:56:17 AM by lockley »
Re: Crawler skips a subdirectory
« Reply #1 on: September 04, 2013, 09:09:37 PM »

Could you please PM me your generator URL, an example URL that is not included in the sitemap, and how it can be reached starting from the homepage?