Skipped URLs
« on: January 11, 2013, 04:40:44 PM »
I just ran my generator for the first time, and it is skipping a large number of pages. In the changelog, every one of them is marked "400 Bad Request". Does anyone have any suggestions?
Re: Skipped URLs
« Reply #1 on: January 13, 2013, 01:42:47 PM »
Hello,

Perhaps your site blocks requests when too many of them arrive in a short time period. You can try setting a 1-second delay after each request in the generator configuration to slow down the crawler.
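Pausing after every request is a simple form of rate limiting. The following is a minimal Python sketch of that idea, not the generator's own code or settings. It assumes a caller-supplied `fetch` function that returns an HTTP status code; `crawl_with_delay` and `http_status` are hypothetical names used only for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple


def http_status(url: str) -> int:
    """Hypothetical fetch helper: return the HTTP status code for a URL."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; its .code is the status.
        return err.code


def crawl_with_delay(
    urls: Iterable[str],
    fetch: Callable[[str], int] = http_status,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, int]]:
    """Request each URL in turn and pause delay_s seconds after every request.

    `sleep` is injectable so the pacing can be checked without waiting.
    """
    results = []
    for url in urls:
        status = fetch(url)      # one request...
        results.append((url, status))
        sleep(delay_s)           # ...then a fixed pause before the next one
    return results
```

With a 1-second delay, a crawl of N pages takes at least N seconds longer than before, so slowing the crawler also lengthens the run. The URLs that still fail can be listed with `[u for u, s in results if s == 400]`.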
Re: Skipped URLs
« Reply #2 on: January 14, 2013, 05:31:14 PM »
I just tried that, and it still isn't working. While it was running, it showed that it had scanned more pages than it had added to the sitemap, for example:

Links depth: 2
Current page: products/17_AAG-1128-49.html
Pages added to sitemap: 391
Pages scanned: 836 (24,501.0 KB)
Pages left: 2733 (+ 595 queued for the next depth level)
Time passed: 0:16:08
Time left: 0:52:46
Memory usage: 4,804.5 Kb

Every time I have run it, it has ended up skipping 1000 pages, crawling 4164, and indexing only 431.
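The progress figures above are enough to estimate the total run time. The following is a rough Python sketch that assumes the remaining pages take, on average, as long as the pages already scanned (a linear estimate; the helper names are hypothetical):

```python
def parse_hms(text: str) -> int:
    """Convert an 'H:MM:SS' string such as '0:16:08' to seconds."""
    h, m, s = (int(part) for part in text.split(":"))
    return h * 3600 + m * 60 + s


def fmt_hms(seconds: float) -> str:
    """Format seconds back as 'H:MM:SS'."""
    total = round(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


def estimate_seconds_left(scanned: int, elapsed_s: int, pages_left: int) -> float:
    """Linear estimate: average seconds per scanned page times pages left."""
    return pages_left * (elapsed_s / scanned)


elapsed = parse_hms("0:16:08")                       # 968 s for 836 pages
this_level = estimate_seconds_left(836, elapsed, 2733)
with_queued = estimate_seconds_left(836, elapsed, 2733 + 595)
print(fmt_hms(this_level), fmt_hms(with_queued))
```

This prints `0:52:45 1:04:13`. The first value is close to the displayed "Time left: 0:52:46", which suggests that figure covers only the current depth level. Counting the queued pages as well, the full crawl needs well over an hour. For comparison, PHP's built-in default for max_execution_time is 30 seconds, although the command-line PHP binary has no limit by default.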
Re: Skipped URLs
« Reply #3 on: January 14, 2013, 07:26:53 PM »
Hello,

It looks like your server configuration doesn't allow the script to run long enough to create a full sitemap. Please try increasing the memory_limit and max_execution_time settings in the PHP configuration on your host (the php.ini file), or contact your hosting support about this.
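Before changing these limits, it can help to check what php.ini currently sets. The following is a rough Python sketch, not an official tool. It scans php.ini text for the two settings named above and converts PHP's K/M/G shorthand for memory_limit into bytes. It ignores per-directory overrides (such as .htaccess or .user.ini) and `ini_set()` calls, so the values the script actually runs with may differ. The function names are hypothetical.

```python
from typing import Dict, Iterable, Optional

# PHP shorthand multipliers for memory sizes (case-insensitive).
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def read_php_settings(
    ini_text: str, keys: Iterable[str] = ("memory_limit", "max_execution_time")
) -> Dict[str, str]:
    """Return the last uncommented `key = value` assignment for each key."""
    wanted = set(keys)
    found: Dict[str, str] = {}
    for raw in ini_text.splitlines():
        line = raw.strip()
        # Skip blanks, ';' comments and [section] headers.
        if not line or line.startswith((";", "[")) or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in wanted:
            found[key] = value.strip('"')   # later lines override earlier ones
    return found


def memory_limit_bytes(value: str) -> Optional[int]:
    """Convert e.g. '128M' to bytes; '-1' means no limit and returns None."""
    value = value.strip()
    if value == "-1":
        return None
    suffix = value[-1].lower()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    return int(value)


sample = "[PHP]\nmax_execution_time = 30\n;memory_limit = 512M\nmemory_limit = 128M\n"
settings = read_php_settings(sample)
print(settings, memory_limit_bytes(settings["memory_limit"]))
```

For the sample text, this reports `max_execution_time` as `30` and `memory_limit` as `128M` (134217728 bytes). The commented-out `512M` line is ignored.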