Hi,

I started using the unlimited sitemap generator on my server in a docker container by executing
Code:
php runcrawl.php &
Since I cannot find any documentation on command-line usage, I assume this is the proper way to execute the script? By the way, all configuration was done in the web frontend.
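In case it matters, here is a variant I am considering so that the process survives my shell session and keeps a log of the output (the log path is just a placeholder from my setup):
Code:
# run the crawler detached from the terminal and keep its output in a log file
nohup php runcrawl.php > /tmp/runcrawl.log 2>&1 &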

Now this is my current output:
Code:
Links depth: 10
Current page: product/rzWw0/part
Pages added to sitemap: 1172
Pages scanned: 1192 (140,608.6 KB)
Pages left: 26 (+ 1 queued for the next depth level)
Time passed: 0:10:24
Time left: 0:00:13
Memory usage: 6,781.6 Kb

This is extremely slow. I was expecting a much higher execution speed. We have 2,900,000 pages to index (catalogues are included, etc.), so at this rate I will need about 21 days. Or is there something wrong on my side and the script should run much faster? Is there any way I can speed up execution by a factor of, say, 10?

The last run I made was from the browser. That execution stopped at 450,000 pages after 2 days, which was roughly twice as fast, but it did not cover all 2,900,000 pages... although it reported that it finished successfully.

All in all, is there any documentation for command-line execution that I have missed? Any tips, caveats, or tricks? Should I deactivate the cURL option in my config?
« Last Edit: December 17, 2019, 01:01:38 PM by info2500 »
Re: Speed optimization and using the command line to execute the crawler
« Reply #1 on: December 18, 2019, 05:54:01 AM »
Hello,

that is the correct way to run it from the command line.

The crawling time itself mainly depends on the website's page generation time, since the script crawls the site much like search engine bots do.
For instance, if it takes 1 second to retrieve each page, then 1,000 pages will be crawled in roughly 16 minutes.
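As a rough sketch of what that means for your 2,900,000 pages (assuming the same 1 second per page; the URL below is just a placeholder, substitute one of your own product pages to measure the real response time):
Code:
# 2,900,000 pages * 1 s per page = 2,900,000 s, roughly 33.5 days of pure fetch time,
# so reducing the per-page generation time is what actually speeds up the crawl.
# Measure a typical page's response time with curl:
curl -o /dev/null -s -w 'time_total: %{time_total}s\n' https://www.example.com/product/sample-page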