problem...
« on: December 14, 2006, 05:58:13 PM »
I have a website with over 500,000 pages... The generator starts working very slowly after 2,000 pages, and after 3,000 I can't even refresh the browser window.
What is the problem?
Re: problem...
« Reply #1 on: December 14, 2006, 11:07:18 PM »
Hello,

for larger sites it is suggested to run the sitemap generator from the command line via SSH for better performance.
You can also use the "Do not parse" / "Exclude URLs" options to skip certain URLs from processing.
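For example, a command-line run might look like the following; the exact path and script name depend on your installation, so treat these as placeholders:
Code: [Select]
# path and script name are examples only - adjust to your installation
cd /home/yoursite/public_html/generator/
/usr/bin/php runcrawl.php
Running it this way avoids browser timeouts and usually handles large crawls noticeably faster.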
Re: problem...
« Reply #2 on: December 16, 2006, 01:05:42 AM »
OK, thanks.
By the way, I need to exclude the following URL types:
[ External links are visible to forum administrators only ]
and
[ External links are visible to forum administrators only ]
where X and Y are random numbers.
What is the correct exclusion combination to use in "Configuration"?
Re: problem...
« Reply #3 on: December 17, 2006, 01:12:29 AM »
You can add the following to the "Do not parse" and "Exclude URLs" options:
Code: [Select]
print
mark=
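These entries are matched as parts of the URL, so (hypothetical examples only, since the original links are hidden above) addresses like the following would be skipped because they contain "print" or "mark=":
Code: [Select]
http://www.example.com/page.php?print=1
http://www.example.com/page.php?mark=12345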
Re: problem...
« Reply #4 on: December 17, 2006, 10:52:27 PM »
Oleg, what does this message mean, and what can I do to avoid it?

Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 1048576 bytes) in /home/test/www/sm/pages/class.grab.inc.php(2) : eval()'d code on line 286
Re: problem...
« Reply #5 on: December 17, 2006, 11:48:47 PM »
This message means that your PHP configuration limits the amount of memory available to scripts, and it is not enough to create the full sitemap (when a lot of URLs are found). You should increase the memory_limit setting in your PHP configuration (php.ini) to avoid this.
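For reference, the limit in the error above is 64 MB (67108864 bytes). Raising it in php.ini might look like this; the value is only an example, so pick one that fits your server:
Code: [Select]
; php.ini - allow PHP scripts to use up to 256 MB (example value)
memory_limit = 256M
If you cannot edit php.ini, some hosts allow the same change per directory via .htaccess (php_value memory_limit 256M); check with your hosting provider.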
Re: problem...
« Reply #6 on: December 18, 2006, 10:38:47 PM »

Hi,
Even after I added the exclusion rules, the generator says:
Pages scanned: 12100 (301,823.7 Kb)
Pages left: 88474

But the number of pages to index is at most 15,000.

So the first question is: does the generator follow configuration changes made after crawling has started?
Does the generator check robots.txt and apply any exclusions itself?
Re: problem...
« Reply #7 on: December 19, 2006, 10:57:50 PM »
Hello,

Quote
So the first question is: does the generator follow configuration changes made after crawling has started?
If you resume generation with changed options, they will be applied accordingly. This doesn't happen on the fly, though (while the generator is currently running).
Quote
Does the generator check robots.txt and apply any exclusions itself?
Yes, robots.txt exclusions are applied, and there are options to apply additional exclusions:
"Do not parse extensions"
"Do not parse URLs"
"Exclude from sitemap extensions"