Nite

*
Desire to run on local machine to index internet site;
« on: December 03, 2006, 05:58:17 PM »
My internet site is on shared hosting.

When I attempt to run the runcrawl.php file, the hosting provider eventually stops the script. I assume it's because the script consumes too many resources (CPU time, memory, or whatever). Even if it doesn't, I'm sure my overselling host does not want the server load to increase for any reason and will kill any process that does so.

Therefore, I had an idea: I could install XAMPP on my Windows machine, run the script locally, and point it at my internet site. My computer would essentially do all the processing.

To this end, would you mind detailing what must be edited in your script so that the final .XML sitemap, the periodic saves, and the .HTML sitemaps are all saved on my local machine? Then I can upload the finished files to my host.
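
If it helps, here's roughly what I have in mind: running the crawler with XAMPP's bundled PHP from the command line instead of through the browser. (The paths below are guesses based on a default XAMPP install.)

Code:
REM run the generator with XAMPP's command-line PHP
REM (paths are assumptions for a default install)
C:\XAMPP\php\php.exe C:\XAMPP\htdocs\generator\runcrawl.php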

Thanks!

Nite

*
Re: Desire to run on local machine to index internet site;
« Reply #1 on: December 03, 2006, 06:43:25 PM »
I have successfully installed XAMPP. This allows me to execute PHP scripts on my local machine.

Moreover, I can pull pages from my internet site.

However, there seems to be a memory leak. I am running Windows XP; my pagefile usage begins at 360MB and gradually climbs past 1.8GB while the script runs. Then, in a climactic finish, memory usage suddenly drops back to 360MB and the script dies with the following error message:

Quote
Fatal error: Out of memory (allocated 1770782720) (tried to allocate 5236677 bytes) in C:\XAMPP\htdocs\generator\pages\class.grab.inc.php(2) : eval()'d code on line 286

P.S. Simple Machines Forum version 1.1 Final was released recently.
« Last Edit: December 04, 2006, 12:59:46 AM by Nite »

Nite

*
Re: Desire to run on local machine to index internet site;
« Reply #2 on: December 04, 2006, 06:34:39 PM »
Has anyone else attempted to run this XML Sitemap generator script on a local machine running Windows XP? Is there some component, not included in XAMPP, that needs to be installed for this to work? If you got it working, how did you do it?

This script runs just fine on my hosting provider's machines. However, my host runs Debian -- and I'm wondering if there's a "difference" between a machine running Debian and a machine running Windows XP with XAMPP.

[Edit]

Since I'm at work, I'll post the links to some helpful threads so that I can find them more easily when I get home.

Quote
as discussed in this topic: https://www.xml-sitemaps.com/forum/index.php/topic,124.html,
there are the following options in this case:
- increase the max_execution_time setting in the php.ini file on your server and restart Apache
- execute the sitemap generator from the command line (if you have SSH access to your server)
- use the "Save state" option on the Sitemap Generator configuration page and execute the crawler multiple times with "Resume generation" enabled until the full sitemap is created
- limit the number of pages to index in the Sitemap Generator configuration

And..

Quote
the crawler tries to access EVERY page it finds a link to (unless you specified it in the exclusion options).
So, if your site is larger than can be crawled with your current PHP settings, try increasing the max_execution_time and memory_limit settings in your php.ini (if you have access to it) and restart Apache.

https://www.xml-sitemaps.com/forum/index.php/topic,553.html
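
For my own reference, the php.ini changes those threads suggest would look something like this (example values; the exact php.ini location depends on the XAMPP/server setup):

Code:
; php.ini -- raise the crawler's limits, then restart Apache
max_execution_time = 0     ; 0 = no time limit
memory_limit = 512M        ; example value; raise as needed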
« Last Edit: December 04, 2006, 06:46:24 PM by Nite »
Re: Desire to run on local machine to index internet site;
« Reply #3 on: December 04, 2006, 06:52:07 PM »
Hello,

you can install the script on your local machine, and all sitemap files will be stored on your computer as well.
Re: memory usage
It is required to keep details of all crawled URLs in memory until the full sitemap is created, to avoid duplicate URLs in the sitemap (otherwise the generator would crawl in endless loops). You can reduce memory usage by disabling the HTML sitemap and ROR sitemap options.
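
To illustrate why the list of visited URLs has to stay in memory, here is a simplified sketch (this is just an illustration, not the generator's actual code; the site URL is a placeholder):

Code:
<?php
// Simplified illustration only -- not the generator's code.
// Without the $visited list, pages that link to each other
// would be fetched over and over in an endless loop.
$site    = 'http://www.example.com/';  // placeholder site root
$queue   = array($site);
$visited = array();

while (!empty($queue)) {
    $url = array_shift($queue);
    if (isset($visited[$url])) {
        continue;                 // already crawled -- skip it
    }
    $visited[$url] = true;        // every URL seen so far stays in RAM

    // requires allow_url_fopen enabled in php.ini
    $html = @file_get_contents($url);
    if ($html === false) {
        continue;
    }
    // very simplified link extraction
    if (preg_match_all('/href="([^"#]+)"/i', $html, $matches)) {
        foreach ($matches[1] as $link) {
            // only follow absolute links within the site
            if (strpos($link, $site) === 0) {
                $queue[] = $link;
            }
        }
    }
}
echo count($visited) . " URLs crawled\n";
?>

The $visited list is what grows with the size of the site, which is why a large site needs a large memory_limit.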
Quote
P.S. Simple Machines Forum version 1.1 Final was released recently.
Thank you for the notice! We'll be updating shortly :)

Nite

*
Re: Desire to run on local machine to index internet site;
« Reply #4 on: December 04, 2006, 11:54:19 PM »
I've increased the values in php.ini and I still get the same error.

I was attempting to "resume" a rather large sitemap that was already stored on my host. Now I'm attempting to "start from scratch" to see what happens.

...

I can see that the "crawl_dump" file is slowly increasing in size ... The GUI doesn't update with what's being crawled, though. I suppose it's doing something ...
« Last Edit: December 05, 2006, 12:02:57 AM by Nite »

Nite

*
Re: Desire to run on local machine to index internet site;
« Reply #5 on: December 05, 2006, 04:09:12 AM »
In XAMPP, there is the option to run either PHP4 or PHP5. When I ran PHP5, it borked. When I ran PHP4, everything worked. Still, the GUI page (which details the links being indexed) does not auto-update; however, I can see that the "crawl_dump" file is growing in size. I changed the standard 16MB memory limit in php.ini to 4096MB, and I set my Windows page file (found in the Control Panel) to 4096MB. (I have 1GB of RAM installed.)
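
For reference, this is the line I changed (your php.ini location may differ depending on the XAMPP version):

Code:
; XAMPP php.ini -- raised from the 16M default
memory_limit = 4096M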

The crawl_dump file began at 17MB in size (remember, I downloaded it to my local machine from my host). I left the script running for 4 hours, and its size has increased to 25MB. In the Windows Task Manager, I can see that PF Usage has increased from 316MB to 462MB, so I think there's a relationship between page file usage and the size of the crawl_dump file.

To summarize: run PHP4 if you're going to use XAMPP with XML Sitemaps.
« Last Edit: December 05, 2006, 05:22:08 AM by Nite »