I think the Apache config also affects this script's performance.
I used to have the Apache Timeout value set to 30 seconds, which is far too low for running this script on big sites. It works fine with a few web pages. After increasing the timeout value to 300, I have been able to crawl over 1,000 pages. I need to crawl around 50,000 pages, so I will keep adjusting the timeout value until I get there.
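For reference, this is roughly what the change looks like in the Apache configuration. A minimal sketch, assuming you have access to httpd.conf or the relevant virtual host file (many shared hosts do not expose this):

```apache
# httpd.conf (or inside the relevant <VirtualHost> block)
# Timeout sets how long Apache waits on certain I/O events
# before aborting the request. Many hosts lower it (mine was 30);
# 300 gave me enough headroom to crawl over 1,000 pages.
Timeout 300
```

Remember to reload or restart Apache after changing it, or the new value will not take effect.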
Nevertheless, I think setting a high timeout value is not great for Apache performance. Moreover, many hosting providers are not willing to change these settings unless you have a dedicated server. Maybe the XML Sitemaps script could avoid these timeout issues by forcing some kind of reload. I know it is possible to force browser reloads... Maybe there is also a solution using the crontab option.
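On the crontab idea: running the script from the command line via cron sidesteps Apache entirely, since CLI PHP is not subject to the web server's Timeout at all (and CLI PHP's own max_execution_time defaults to 0, i.e. unlimited). A sketch, where /path/to/generator.php is a hypothetical path to wherever the script is installed:

```shell
# crontab -e
# Run the sitemap generator nightly at 03:00 through CLI PHP;
# Apache is not involved, so its Timeout directive never applies.
0 3 * * * /usr/bin/php /path/to/generator.php > /dev/null 2>&1
```

This assumes the script supports being invoked from the command line; check its documentation for the exact entry point and any required arguments.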