XML Sitemaps Generator

Sitemap Generator Forum
July 05, 2008, 01:38:30 PM
Author Topic: system keeps stopping  (Read 12502 times)
CustomerService
Registered Customer
Newbie
*
Posts: 5


« on: December 07, 2005, 02:41:57 AM »

I'm finding that Standalone XML Sitemap Generator keeps stopping. If I start a job with "only" a few hundred pages, it goes fine. When I start a job with 20,000 pages, it stops by itself after anywhere from 2 to 10 minutes. I don't see any error or other problem; it has just stopped. When I go back to the page, it asks me if I want to resume.

Almost all of the 20,000 pages share the same page name; the content differs only by the database variables passed in the URL. Is it possible that the system is trying to access the same page too frequently and shutting itself down?
admin
Administrator
Hero Member
*****
Posts: 2755


« Reply #1 on: December 07, 2005, 02:55:20 AM »

Hi,

the crawler tries to access EVERY page it finds a link to (unless you have excluded it in the exclusion options).
So, if your site is larger than can be crawled with your current PHP settings, try increasing the max_execution_time and memory_limit settings in your php.ini (if you have access to it) and restart Apache.
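As a rough sketch of how to check the current limits before changing them, the helper below prints the two directives from a php.ini file. The function name show_limits and the path in the commented example are assumptions for illustration; the location of php.ini varies by host.

```shell
# Print the active (uncommented) max_execution_time and memory_limit
# lines from the php.ini file given as the first argument.
show_limits() {
    grep -E '^[[:space:]]*(max_execution_time|memory_limit)[[:space:]]*=' "$1"
}

# Example call; /etc/php.ini is an assumed location, yours may differ:
# show_limits /etc/php.ini
```

Lines that start with ";" are comments in php.ini and are skipped, so only settings that are actually in effect in that file are listed.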

CustomerService
« Reply #2 on: December 07, 2005, 01:56:33 PM »

My php.ini includes the following two lines:
max_execution_time = 30     ; Maximum execution time of each script, in seconds
memory_limit = 8M      ; Maximum amount of memory a script may consume (8MB)

What would you suggest?

Is there any way to overcome this limit without changing "max_execution_time"? Perhaps by placing some refresh line on the Standalone XML Sitemap Generator crawler page?

Before I purchased Standalone XML Sitemap Generator, I ran a 500-page sitemap against my site via your online system and it worked great. Could there be some other setting in Standalone XML Sitemap Generator that's preventing this from running?

My website is [external links are visible to admins only]. Can you try running a larger sitemap on it and see what happens? FYI, I have about 24,000 pages.
admin
« Reply #3 on: December 07, 2005, 04:03:45 PM »

Hello,

obviously it takes more time to crawl a big site. I can't give you exact values, but you can find suitable ones by trial: increase both settings and see if that is enough (then increase them again if it still fails).

Some PHP configurations allow max_execution_time to be increased from within a script, and the Generator script always tries to do this, but it probably doesn't work in your case.

You can also run the generator from the shell if you have SSH access to the host (the runcrawl.php file should be executed) and set up a cron job to do this on a schedule.
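For a shell run, one option worth trying is to pass the two limits on the command line instead of editing php.ini. The sketch below is only an illustration: it assumes the PHP binary on the host accepts the -d option for setting an ini value for a single run, and the names PHP_BIN and run_with_limits, the 64M value and the script path are all placeholders.

```shell
# PHP binary to use; /usr/bin/php is an assumption, adjust for your host.
PHP_BIN=${PHP_BIN:-/usr/bin/php}

# Run a PHP script with per-run limits instead of editing php.ini.
# $1 = memory limit (e.g. 64M), $2 = path to the script (e.g. runcrawl.php).
# max_execution_time=0 asks PHP not to enforce a time limit for this run.
run_with_limits() {
    "$PHP_BIN" -d max_execution_time=0 -d memory_limit="$1" "$2"
}

# Example call; the path and the 64M value are placeholders:
# run_with_limits 64M /path/to/generator/runcrawl.php
```

Whether the host honors these overrides depends on how PHP was built and configured, so check the result rather than assume it.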

CustomerService
« Reply #4 on: December 07, 2005, 05:36:04 PM »

I do have SSH access. We use a piece of software called "PuTTY" to access the site. By the way, I only need to do this once (vs. setting it up on a schedule).

Unfortunately, I'm not very technical. Specifically, what command would I type into PuTTY to execute "runcrawl.php"? Is there any way to stop the process once it starts if for some reason it causes a problem?
admin
« Reply #5 on: December 08, 2005, 02:28:17 AM »

On the "Crawling" page of the sitemap generator you will find simple instructions on how to set up a cron job and how to execute the script over SSH.

For instance:
http://www.xml-sitemaps.com/generator-demo/index.php?op=crawl

Quote
Cron job setup
You can use the following command line to setup the cron job for sitemap generator:
/usr/bin/php /home/xmlsites/public_html/generator-demo/runcrawl.php

And you can stop the crawler process at any time from the shell in the same way as any other process (using the "kill" shell command).
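A minimal sketch of a one-off run that can be stopped later, assuming a POSIX shell with nohup available. The function names start_job and stop_job and the file names crawl.pid and crawl.log are made up for this example; the command in the commented example is the one from the quote above, so substitute your own path.

```shell
# Start a command in the background so it keeps running if the SSH
# session closes. $1 = pid file, $2 = log file, rest = command to run.
start_job() {
    pidfile=$1
    logfile=$2
    shift 2
    nohup "$@" > "$logfile" 2>&1 &
    echo $! > "$pidfile"
}

# Stop the job recorded in the pid file ($1) with the default TERM signal.
stop_job() {
    kill "$(cat "$1")" && rm -f "$1"
}

# Example, using the command line from the quote above:
# start_job crawl.pid crawl.log /usr/bin/php /home/xmlsites/public_html/generator-demo/runcrawl.php
# stop_job crawl.pid
```

Saving the process ID to a file means you do not have to look it up with ps before calling kill, and the log file keeps whatever the crawler printed.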

CustomerService
« Reply #6 on: December 08, 2005, 05:50:34 AM »

You're being very patient with me. Thank you.

I'm now running the program through SSH. When I run it for 500 pages as a test, it runs perfectly. I then ran it with a maximum of 25,000 pages (we have about 22,000). I ran it twice with the same results: both times it ran for 19 minutes and then stopped. Nothing was output to the sitemap.

On the following page, I've placed (a) the permissions for the various files so you can see if that's right and (b) the output from running the program for 19 minutes:
[external links are visible to admins only]

I then set it to save every 300 seconds. When I do that, the job just stops sooner and then restarts from the beginning.

Thanks
admin
« Reply #7 on: December 08, 2005, 11:05:53 PM »

Hi,

first of all, you should disable the "save run state" option if you execute it from the command line (it just adds load that is not necessary).

It's strange that it is interrupted with no error message displayed. What is displayed after the last line in your output dump (4980 | 16636 | 101,817.2 | 18:57 | 63:19 | 4 | - | 4976 | 99 | 0)? Is the command line prompt shown?

Well, generally, I can suggest updating PHP from version 4.1.2 (your current one) to a more recent release. It is possible that there is some bug that appears when processing large data arrays or similar.
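To help narrow down a run that ends with no message, a small wrapper like the one below can record how the process finished. This is only a sketch: the name run_and_report and the log file argument are hypothetical, and it assumes a POSIX shell.

```shell
# Run a command, keep everything it prints in a log file, and report
# how it ended. $1 = log file, rest = command to run.
run_and_report() {
    logfile=$1
    shift
    "$@" > "$logfile" 2>&1
    status=$?
    echo "exit status: $status"
    # By shell convention, a status above 128 usually means the process
    # was ended by a signal (status minus 128), e.g. 137 for KILL.
    if [ "$status" -gt 128 ]; then
        echo "ended by signal $((status - 128))"
    fi
    return "$status"
}

# Example call; substitute the path to your own runcrawl.php:
# run_and_report crawl.log /usr/bin/php /path/to/generator/runcrawl.php
```

A status of 0 with an incomplete sitemap would point toward the script itself, while a signal would suggest the process was stopped from outside, for example by a limit on the host.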