XML Sitemaps Generator

Author Topic: long running crawl  (Read 13769 times)

lenk

  • Registered Customer
  • Approved member
  • *
  • Posts: 8
long running crawl
« on: October 08, 2008, 12:21:03 PM »
Hello Oleg,
  I have a question. This is our configuration:

Do not parse URLs: 184/
Progress state storage type: var_export
Maximum depth level: 7

We are crawling our big site, and at some point we see the following statistics:

Time is 9:18

Links depth: 7
Current page: 184/europe/deutschland/mecklenburg-vorpommern/fischland-darss-zingst/dierhagen/atr182756.html
Pages added to sitemap: 220441
Pages scanned: 220460 (1,030,267.2 Kb)

crawl_dump.log  77732K

The next time crawl_dump.log was updated was at

09:54

Links depth: 7
Current page: 184/europe/schweiz/wallis/crans-montana/edom-7789.html
Pages added to sitemap: 220481
Pages scanned: 220500 (1,030,267.2 Kb)
Pages left: 11051 (+ 0 queued for the next depth level)
Time passed: 7178:16
Time left: 359:45


During this period the runcrawl program (version 2.8) used 90% of the CPU and did not hit the HTTP server at all. So, was the program parsing the crawl_dump.log file? Is this normal behaviour?

Thank you.

 
« Last Edit: October 08, 2008, 12:27:48 PM by lenk »

XML-Sitemaps Support

  • Administrator
  • Hero Member
  • *****
  • Posts: 10624
Re: long running crawl
« Reply #1 on: October 08, 2008, 10:26:59 PM »
Hello,

Since you have "184/" defined in the "Do not parse" option, the sitemap generator doesn't request those URLs from your server (which improves performance) and just scans all remaining URLs to include them in the sitemap (hence the higher CPU usage).
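To illustrate the behaviour described above, here is a minimal hypothetical sketch (not the generator's actual PHP code): URLs matching a "Do not parse" prefix such as "184/" are still added to the sitemap, but are never fetched or parsed, so the crawler spends CPU on in-memory list bookkeeping rather than HTTP requests.

```python
# Hypothetical sketch of the "Do not parse" option, assuming prefix
# matching against site-relative URLs as in the forum post.
DO_NOT_PARSE_PREFIXES = ["184/"]

def should_fetch(relative_url: str) -> bool:
    """Return False for URLs that are sitemapped but never requested."""
    return not any(relative_url.startswith(p) for p in DO_NOT_PARSE_PREFIXES)

def crawl_step(relative_url: str, sitemap: list) -> bool:
    """Add the URL to the sitemap; report whether it would also be fetched."""
    sitemap.append(relative_url)          # included in the sitemap either way
    return should_fetch(relative_url)     # fetched and parsed only if True
```

In this sketch a matched URL costs no network round trip, which matches the observation that the crawler was CPU-bound without hitting the HTTP server.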
Oleg Ignatiuk
www.xml-sitemaps.com
Send me a Private Message

For maximum exposure and traffic for your web site check out our additional SEO Services.

lenk

  • Registered Customer
  • Approved member
  • *
  • Posts: 8
Re: long running crawl
« Reply #2 on: October 09, 2008, 07:38:24 AM »
"Scans" means from the crawl_dump.log file?

XML-Sitemaps Support

  • Administrator
  • Hero Member
  • *****
  • Posts: 10624
Re: long running crawl
« Reply #3 on: October 10, 2008, 06:21:12 AM »
It scans the URL list in memory; the crawl_dump is only extracted once, when the crawl is started.
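A hypothetical sketch of that resume flow (the actual generator is PHP and uses var_export-style state; the function names and state layout here are assumptions): the dump is parsed exactly once at startup, and every subsequent scan works on the resulting in-memory structures.

```python
import ast

def load_state(dump_text: str) -> dict:
    """Parse a saved crawl state ONCE at startup.

    Assumes a Python-literal-like dump for illustration; the real tool
    stores PHP var_export output. Later steps never re-read the file.
    """
    return ast.literal_eval(dump_text)

def pending_urls(state: dict) -> list:
    """Scan the in-memory lists to find URLs still left to process."""
    done = set(state.get("done", []))
    return [u for u in state.get("queue", []) if u not in done]
```

So the periodic crawl_dump.log writes seen in the log are checkpoints being saved, not the dump being re-parsed on every iteration.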
Oleg Ignatiuk
www.xml-sitemaps.com
Send me a Private Message


 
