Super Slow and Some Questions
« on: March 22, 2006, 05:41:59 PM »

I've read the posts about trying to improve performance - I run a website that has a million or so pages.  Has anyone been able to successfully create a sitemap of this size running this software?

I've been running this thing for a couple of days now. I've changed the configuration and added some info to the Do Not Parse section.  I have two entries; is the syntax below correct?  The first line is because I'd like to not parse any of my forum posts - they all have a different topic ID.  The second line is a directory.  Are those correct?

phpBB2/viewtopic.php?t=
phpBB2/forum/

Next question: adding these to the Do Not Parse section did not improve my performance.  I'm running it in the background, but I had to resume the process after it stopped for some reason.  Is it using my new configuration?  Do I need to start over?  How do I get the crawl to use the updated configuration?

Even though the program is running in one browser (with the background option), when I go to the Crawling section in another browser, the second browser asks if I'd like to continue the interrupted session.  However, I see the numbers still updating in the first browser, so it appears to be running.  Why is this happening?

How do I stop the crawl if it is running in the background?

Thank you for your help - I haven't given up on this quite yet!
Re: Super Slow and Some Questions
« Reply #1 on: March 22, 2006, 06:35:18 PM »
Hello,

Sitemap generation time depends mainly on the page generation time at your site (i.e., the performance of your site).
Further details (including timing examples) at https://www.xml-sitemaps.com/forum/index.php/topic,95.html
Quote
Are those correct?

phpBB2/viewtopic.php?t=
phpBB2/forum/
Yes, you can also include the following to exclude the whole folder:
phpBB2/
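These "Do Not Parse" entries appear to work as plain substring matches against each URL (an assumption based on how the option is used in this thread - check the generator's documentation). A quick way to sanity-check which URLs a pattern would exclude is to filter a sample list with `grep -vF`:

```shell
# Hypothetical sample URLs; any URL containing one of the two
# "Do Not Parse" substrings is dropped by the -v (invert) filter.
printf '%s\n' \
  'http://example.com/phpBB2/viewtopic.php?t=123' \
  'http://example.com/phpBB2/forum/index.php' \
  'http://example.com/about.html' \
| grep -vF -e 'phpBB2/viewtopic.php?t=' -e 'phpBB2/forum/'
# only http://example.com/about.html survives the filter
```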
Quote
Next question, adding these into the Do Not Parse did not improve my performance.  I'm running in the background but had to resume the process after it stopped for some reason.  Is it using my new configuration?  Do I need to start over?  How do I get the crawl to use the updated configuration?
The improvement means that generation will be completed faster, not that it will take less time per page.
Quote
Even though the program is running in one browser (with the background option), when I go to the Crawling section in another browser, the second browser asks if I'd like to continue the interrupted session.  However, I see the numbers still updating in the first browser, so it appears to be running.  Why is this happening?
For larger sites it is recommended to execute the sitemap generator from the command line (via SSH) - in this case execution will not be interrupted at all.
Quote
How do I stop the crawl if it is running in the background?
You can manually create an "interrupt.log" file in the data/ folder to interrupt it.
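Creating that flag file is a one-liner from the shell; the `data/` path is relative to wherever the generator is installed (the install location below is a placeholder - adjust it to your setup):

```shell
# Create the interrupt flag file in the generator's data/ folder.
# GEN_DIR is a hypothetical install path; defaults to the current directory.
GEN_DIR="${GEN_DIR:-.}"
mkdir -p "$GEN_DIR/data"              # data/ normally already exists
touch "$GEN_DIR/data/interrupt.log"   # the crawler sees this file and stops
```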
Re: Super Slow and Some Questions
« Reply #2 on: March 22, 2006, 06:48:28 PM »
Thank you for the response.

Quote
The improvement means that generation will be completed faster, not that it will take less time per page.

I don't think the generation is any faster - do I need to do anything to have the crawl use my updated configuration?  Can I change the configuration on the fly and have those changes be immediately incorporated in the crawl?

Quote
For larger sites it is recommended to execute the sitemap generator from the command line (via SSH) - in this case execution will not be interrupted at all.

Ok, so I need to run this via SSH.  Do I interrupt the current run (interrupt.log) and then launch via SSH, or just launch it now?  Would I still be able to monitor it through the web page if I run it via SSH?

Thank you.
Re: Super Slow and Some Questions
« Reply #3 on: March 22, 2006, 07:49:30 PM »
Hi,

Quote
I don't think the generation is any faster - do I need to do anything to have the crawl use my updated configuration?  Can I change the configuration on the fly and have those changes be immediately incorporated in the crawl?
I will reword my sentence :) : it will NOT run faster; it will be completed faster, because part of your site is not crawled.

Quote
Ok, so I need to run this via SSH.  Do I interrupt the current run (interrupt.log) and then launch via SSH, or just launch it now?  Would I still be able to monitor it through the web page if I run it via SSH?
Yes, you can interrupt it and then execute it again.
You should be able to see the process state in the browser.
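A sketch of launching the generator over SSH so the crawl keeps running after the session closes. The script name `runcrawl.php` and the install path are assumptions - verify both against your own installation:

```shell
# Sketch: start the generator from an SSH shell, detached from the session.
# GEN_DIR and runcrawl.php are assumed names; check your install.
GEN_DIR="${GEN_DIR:-/path/to/generator}"
if [ -f "$GEN_DIR/runcrawl.php" ]; then
    # nohup + & keep the crawl alive after you log out
    nohup php "$GEN_DIR/runcrawl.php" > "$GEN_DIR/crawl_output.log" 2>&1 &
    echo "crawler started, PID: $!"
else
    echo "generator script not found at $GEN_DIR" >&2
fi
```

While it runs this way, the web interface's Crawling page should still show the progress, as noted above.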
Re: Super Slow and Some Questions
« Reply #4 on: March 23, 2006, 06:41:28 PM »
I followed your directions: I put up the interrupt.log and then ran the program via SSH.

It started over!!!  I had 60,000 pages done and 800,000 more to go, but it reset back to 0!!!  Is there any way to have it start from the 60,000-page mark?  We still have a 162 MB crawl_dump_log.

Can this thing even finish a website as large as ours?
Re: Super Slow and Some Questions
« Reply #5 on: March 24, 2006, 10:50:47 PM »
Hello,
Quote
it started over
Answered in Private Message.
Quote
Can this thing even finish a website as large as ours?
Sure, as long as your server configuration allows it.