Slow crawling
« on: July 21, 2016, 11:33:33 AM »

I'm using the Unlimited (PHP) sitemap generator. My site is pretty large (more than 50K pages/links). Questions:

1. It took 3 hours to generate a 1.4MB crawl_dump.log file (and it's still not finished). Is this normal? It seems to take forever.

2. How frequently should I cron this crawling process? Every day?

3. I suspect the crawling process slows down my site. Any idea how to minimize this?

4. For this option: "Make a delay between requests, X seconds after each N requests", any idea what numbers I should put for X and N?

Thank you.

Re: Slow crawling
« Reply #1 on: July 22, 2016, 06:21:35 AM »

1. The crawling time mainly depends on your website's page generation time, since the generator crawls the site much like a search engine bot does.
For instance, if it takes 1 second to retrieve each page, then 1,000 pages will be crawled in about 16 minutes.
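That estimate generalizes in a straightforward way. As a rough sketch (not part of the generator itself, just the arithmetic from the example above, extended with the optional "Make a delay" setting):

```python
def crawl_time_minutes(pages, seconds_per_page, delay_seconds=0.0, delay_every=1):
    """Rough crawl-time estimate: total fetch time plus any configured
    pause of delay_seconds after every delay_every requests."""
    fetch_time = pages * seconds_per_page
    pause_time = (pages // delay_every) * delay_seconds
    return (fetch_time + pause_time) / 60.0

# 1,000 pages at 1 second each, no delay: about 16.7 minutes.
print(round(crawl_time_minutes(1000, 1.0), 1))

# 50,000 pages at 1 second each: over 13 hours even before any delay,
# which is why a large site can take a long time to crawl.
print(round(crawl_time_minutes(50000, 1.0) / 60.0, 1))
```

So for a 50K-page site, a multi-hour crawl is expected if each page takes around a second to generate.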

2. It depends on the size of the website and how frequently pages are added to it; a daily update might be appropriate, or a weekly one, for instance.
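For a daily schedule, a crontab entry along these lines could work. The script path below is an assumption for illustration; substitute the actual path of your generator installation, and pick an off-peak hour so the crawl competes less with visitor traffic:

```shell
# Run the sitemap crawl every day at 03:30, when site traffic is typically low.
# /path/to/generator/runcrawl.php is a placeholder -- adjust to your install.
30 3 * * * /usr/bin/php /path/to/generator/runcrawl.php >/dev/null 2>&1
```

Redirecting output to /dev/null keeps cron from emailing you the crawl log on every run.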

3,4. You can try the "Make delay" setting to slow down crawling and avoid overloading the server. The exact values depend on how much slower the crawl needs to be (for instance, 1 second after every 10th request, or 1 second after every request).

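The delay setting behaves like the following sketch (a minimal illustration of the X-seconds-after-every-N-requests pattern, not the generator's actual code; the `fetch` callback is a stand-in for whatever retrieves a page):

```python
import time

def crawl_with_delay(urls, fetch, delay_seconds=1.0, delay_every=10):
    """Fetch each URL in order, pausing delay_seconds after every
    delay_every requests so the crawl does not overload the server."""
    results = []
    for i, url in enumerate(urls, start=1):
        results.append(fetch(url))
        if i % delay_every == 0:
            time.sleep(delay_seconds)
    return results
```

A smaller N (or a larger X) spreads the requests out more and reduces load on the site, at the cost of a longer total crawl.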
Re: Slow crawling
« Reply #2 on: May 19, 2017, 02:56:54 AM »
I am getting 130,000 pages in 23 minutes. It used to take days to crawl 130,000 pages.
The hint was in "Narrow Indexed Pages Set": try the Exclusion preset. I am now using this software without a hassle.