choking, I think
« on: January 15, 2011, 10:58:41 PM »
Installation went fine and things seem to be working (though I'm still hours into the sitemapping process).  To be fair, every one of about 50,000 pages is dynamically generated via PHP and MySQL, and there are also web services/APIs that may be slowing down the process.

When XML-Sitemaps is running there is NO access to the site via a web browser.  I can ping the site and get a great response of under 40 milliseconds.

My PHP memory limit is set to 16 MB by the admin.  Is this the bottleneck?

I tweaked a bunch of settings to be less hoggish on the memory, but it still just crawls.  I'm wondering if I should mirror the website on my local machine, run the software there, then put the sitemap files on the server.  Of course, I'd like to run this all through a cron job and auto-notify Google.
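For the auto-notify part, here is a minimal Python sketch of how a sitemap "ping" URL is formed. Assumptions: the sitemap lives at the hypothetical address http://www.example.com/sitemap.xml, and the endpoint is Google's historical ping URL, which Google has since announced is deprecated, so treat this as an illustration of the mechanism rather than a working notifier.

```python
from urllib.parse import quote

# Assumption: Google's historical sitemap ping endpoint. Google has announced it is
# deprecated, so this is illustrative rather than a working notifier.
GOOGLE_PING = "http://www.google.com/ping?sitemap="


def build_ping_url(sitemap_url: str) -> str:
    """Return the ping URL with the sitemap address fully percent-encoded."""
    # safe="" makes quote() encode ':' and '/' as well, so the whole address
    # travels as a single query value.
    return GOOGLE_PING + quote(sitemap_url, safe="")


if __name__ == "__main__":
    # Hypothetical sitemap address; substitute your own.
    print(build_ping_url("http://www.example.com/sitemap.xml"))
```

A cron job could run something like this after each regeneration. Submitting the sitemap in Google Search Console (formerly Webmaster Tools) or listing it on a `Sitemap:` line in robots.txt are alternatives that do not depend on the ping endpoint.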

Any suggestions?

Thanks,
Tony
Re: choking, I think
« Reply #1 on: January 16, 2011, 03:39:58 AM »
Also, there is no "stop" button on the index.php page under the crawling tab.  Even though 21,000 pages have been added to the sitemap dump file, there is no way to stop the process and create a usable sitemap file.

Any suggestions?  (I do get an occasional "out of memory" error from the PHP side of things, if that makes any difference.)

Tony
Re: choking, I think
« Reply #2 on: January 16, 2011, 10:15:27 AM »
Hello,

when you run the generator script, your browser probably doesn't allow an additional connection to the same domain, so it appears that you can't access the site. Other visitors can access your site with no difference, and you can open your website in another browser as well.

I would increase the memory_limit setting in the PHP configuration, since 16M is usually not enough for larger sites.
You can create an empty file named interrupt.log in the generator/data/ folder to stop the generator.
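If you have shell access on the server, the interrupt flag can also be created from a script. A minimal Python sketch, assuming `generator_dir` (a hypothetical parameter) points at your generator install folder; only the `data/interrupt.log` location comes from the reply above.

```python
from pathlib import Path


def interrupt_flag(generator_dir: str) -> Path:
    """Location of the flag file: <generator_dir>/data/interrupt.log."""
    return Path(generator_dir) / "data" / "interrupt.log"


def request_interrupt(generator_dir: str) -> Path:
    """Create the empty interrupt.log so the running generator stops crawling."""
    flag = interrupt_flag(generator_dir)
    # touch() raises FileNotFoundError if data/ is missing, which flags a wrong
    # path instead of silently creating a new directory.
    flag.touch()
    return flag


def clear_interrupt(generator_dir: str) -> None:
    """Delete the flag before resuming (assumption: a leftover flag would stop the next run too)."""
    interrupt_flag(generator_dir).unlink(missing_ok=True)  # missing_ok needs Python 3.8+
```

Uploading an empty interrupt.log by FTP does the same thing; the script only helps if you want to trigger it from cron or SSH.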
Re: choking, I think
« Reply #3 on: January 16, 2011, 01:30:45 PM »
Thank you, Oleg.  Regarding the log file, I can easily create a blank file and put it on the server.  Is there anything else I need to do, other than add a blank file?

At this point I can't even restart the script anymore.  It appears to work on the front end, but the crawl_dump file is no longer increasing in size.  It's currently at 12.3 Megs.
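Watching the dump file size is a reasonable way to tell whether the crawler is still doing anything. A minimal Python sketch of that check; the path argument is whatever your crawl_dump file is called on your server (not specified here), and the 120-second default wait is an arbitrary choice.

```python
import os
import time


def dump_is_growing(path: str, wait_seconds: float = 120.0) -> bool:
    """Return True if the file at `path` got larger during `wait_seconds`."""
    before = os.path.getsize(path)
    time.sleep(wait_seconds)
    after = os.path.getsize(path)
    return after > before
```

A `False` result only means the file did not grow during that window. A crawler stuck on one slow page would look the same as one that has stopped, so a few consecutive checks are more telling than one.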
Re: choking, I think
« Reply #4 on: January 16, 2011, 06:14:39 PM »
Quote
Is there anything else I need to do, other than add a blank file?
No, the sitemap generator will detect that the file exists and will interrupt the crawling.
Re: choking, I think
« Reply #5 on: January 17, 2011, 12:30:08 AM »
I understand the interrupt file now...you were trying to force the script to stop running. 

It had stopped on its own, but the "stop" button you mention in the documentation doesn't seem to appear anywhere.  Now that it has stopped, I still don't have a usable sitemap file, just the dump file.  Is there a way to convert that to a sitemap, since the script never finished?
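The generator's dump file uses its own internal format, which this thread doesn't describe, so the sketch below does not try to read it. It assumes instead that you already have the crawled URLs in a plain text file, one per line (a hypothetical input). It writes them out as sitemaps.org `<urlset>` XML, splitting at the protocol's 50,000-URLs-per-file limit (relevant for a site of 50,000+ pages), plus a sitemap index listing the pieces. The protocol also caps uncompressed file size, which this sketch does not check.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit in the sitemaps.org protocol


def read_url_list(path):
    """Read one URL per line, skipping blanks and duplicates, keeping order."""
    seen, urls = set(), []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            u = line.strip()
            if u and u not in seen:
                seen.add(u)
                urls.append(u)
    return urls


def build_sitemaps(urls):
    """Split `urls` into <urlset> documents of at most MAX_URLS_PER_FILE entries."""
    docs = []
    for start in range(0, len(urls), MAX_URLS_PER_FILE):
        chunk = urls[start:start + MAX_URLS_PER_FILE]
        lines = ['<?xml version="1.0" encoding="UTF-8"?>',
                 f'<urlset xmlns="{SITEMAP_NS}">']
        # escape() turns '&' in query strings into '&amp;', as XML requires
        lines += [f"  <url><loc>{escape(u)}</loc></url>" for u in chunk]
        lines.append("</urlset>")
        docs.append("\n".join(lines) + "\n")
    return docs


def build_index(sitemap_urls):
    """Sitemap index pointing at the public URL of each generated sitemap file."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<sitemapindex xmlns="{SITEMAP_NS}">']
    lines += [f"  <sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls]
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"
```

Each string from `build_sitemaps` would be saved as its own file (for example sitemap1.xml, sitemap2.xml, which are hypothetical names), and `build_index` would be given those files' public URLs.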

For now, I'll try deleting the interrupt file and see if I can get the script to finish.

I would also like to know how to avoid the memory problems.

Thanks =)
Re: choking, I think
« Reply #6 on: January 17, 2011, 10:15:05 AM »
Hello,

the "stop" link only appears if you started the generator with the "run in background" checkbox enabled.
You can set the maximum URLs setting to a small number and resume the sitemap generator; it then won't crawl any more pages.
Re: choking, I think
« Reply #7 on: January 17, 2011, 01:56:38 PM »
Hi Oleg,

Thanks for the info.  Here's what I'd like to do:
  • Crawl the entire site successfully
  • Publish a working sitemap
  • Auto-notify major search engines

If the script can't do that, I'd like to:
  • Publish a sitemap for the 21,000 pages that have been logged
  • Have my money refunded

I've spent an entire weekend trying to figure this out.  I've read the documentation, and I've read through the forums.  It appears that whenever there's a problem like this it turns into a private conversation and the solution doesn't get shared, if there is one.

I love the script...if we can make this work I would be very, very grateful!  =)
Re: choking, I think
« Reply #8 on: January 17, 2011, 09:02:45 PM »
Hello,

please send me your generator URL/login in a private message so I can check this.
Re: choking, I think
« Reply #9 on: January 27, 2011, 05:52:01 AM »
Hi Oleg,

I know you stopped by to check on the XML-Sitemaps generator and successfully generated a small sitemap.

Are we going to be able to make this thing work?  It keeps choking on my large site.
Re: choking, I think
« Reply #10 on: January 27, 2011, 09:04:44 AM »
Hello,

could you please PM me an example URL that is not included in the sitemap and how it can be reached starting from the homepage?
Re: choking, I think
« Reply #11 on: January 28, 2011, 03:10:14 AM »
Hi Oleg,

I have over 18,000 unique homes displayed on my site, pulled from a MySQL database that is updated daily.  Each home can appear on several pages, so there are over 50,000 pages in total.

I'm running a new attempt at the script right now after installing the update.  I will let you know how it goes.

Here is a list of known problems:
- The index.php page does not update these stats, which makes it appear that the script has stopped running.  I verified the script is still running by watching the dump file get larger.

Links depth: 4
Current page: details.php?listnum=1006108
Pages added to sitemap: 3235
Pages scanned: 3280 (122,898.0 KB)
Pages left: 4461 (+ 5834 queued for the next depth level)
Time passed: 0:46:40
Time left: 1:03:28
Memory usage: 8,389.1 Kb
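The "Time left" figure above appears consistent with a simple linear extrapolation over the pages left at the current depth only: 46:40 elapsed for 3,280 pages scanned, scaled to 4,461 pages left, gives exactly 1:03:28. That is an inference from these numbers, not documented generator behavior. If it holds, the estimate ignores the 5,834 URLs queued for the next depth level, which would explain runs lasting far longer than "Time left" suggests. A minimal Python sketch of the arithmetic:

```python
def parse_hms(text: str) -> int:
    """Convert 'H:MM:SS' (as shown in the stats) to seconds."""
    h, m, s = (int(part) for part in text.split(":"))
    return h * 3600 + m * 60 + s


def fmt(seconds: int) -> str:
    """Format seconds back to 'H:MM:SS'."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


def estimate_time_left(elapsed_s: int, pages_scanned: int, pages_left: int, queued: int = 0) -> int:
    """Linear extrapolation: remaining pages at the average rate observed so far."""
    # Integer arithmetic avoids float rounding: remaining * (seconds per page).
    return (pages_left + queued) * elapsed_s // pages_scanned


elapsed = parse_hms("0:46:40")
print(fmt(estimate_time_left(elapsed, 3280, 4461)))               # 1:03:28
print(fmt(estimate_time_left(elapsed, 3280, 4461, queued=5834)))  # 2:26:28
```

Even the second figure is optimistic: each deeper level can queue more pages, and on a site with 50,000+ dynamic pages the per-page rate is unlikely to stay constant.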


- The bottom of the same index.php page shows the following.  I don't know exactly what this text means.


Thu Jan 27 2011 20:04:22 GMT-0700 (MST): resuming generator (120 seconds with no response)
Thu Jan 27 2011 20:02:22 GMT-0700 (MST): resuming generator (120 seconds with no response)
Thu Jan 27 2011 20:00:22 GMT-0700 (MST): resuming generator (120 seconds with no response)
Thu Jan 27 2011 19:58:22 GMT-0700 (MST): resuming generator (121 seconds with no response)
Thu Jan 27 2011 19:56:21 GMT-0700 (MST): resuming generator (121 seconds with no response)
Thu Jan 27 2011 19:54:20 GMT-0700 (MST): resuming generator (121 seconds with no response)
Thu Jan 27 2011 19:52:20 GMT-0700 (MST): resuming generator (121 seconds with no response)
Thu Jan 27 2011 19:50:19 GMT-0700 (MST): resuming generator (120 seconds with no response)
Thu Jan 27 2011 19:48:18 GMT-0700 (MST): resuming generator (120 seconds with no response)
Thu Jan 27 2011 19:46:18 GMT-0700 (MST): resuming generator (121 seconds with no response)
Thu Jan 27 2011 19:44:17 GMT-0700 (MST): resuming generator (121 seconds with no response)

I'm also going to play with the site structure a bit to see whether it is leading the script down an unproductive path.  The biggest problems are that I cannot tell what the script is doing, when, or for how long; the stop button doesn't exist; and I can't turn an incomplete crawl (one that runs all night) into a usable XML file.

OK, I hope that helps.  I'm going to try a few more things from my end.  I'm just not able to see what's going on from a UX standpoint.

Thanks,
Tony
Re: choking, I think
« Reply #12 on: January 28, 2011, 10:45:04 AM »
Hello,

it looks like your server configuration doesn't allow the script to run long enough to create the full sitemap. Please try increasing the memory_limit and max_execution_time settings in the PHP configuration at your host (php.ini file), or contact your hosting support about this.
Re: choking, I think
« Reply #13 on: January 28, 2011, 07:15:52 PM »
Thanks for the suggestions.  I can adjust both of those settings, but I have absolutely no idea what values they should be changed to.

My current settings are max_execution_time 9000 (seconds) and memory_limit 512M.

Re: choking, I think
« Reply #14 on: January 28, 2011, 11:09:50 PM »
That should be OK in most cases; just make sure the values are defined in the PHP configuration (not the generator configuration).
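Because the values have to be in PHP's own configuration, it can help to check what a php.ini actually says. A minimal Python sketch, assuming you pass the file's text in: it reads the last uncommented memory_limit and max_execution_time lines and converts PHP's size shorthand (K, M, G) to bytes, so values like 16M and 512M can be compared directly. It handles only plain `name = value` lines and ignores sections and any runtime overrides.

```python
import re

_SUFFIX = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def php_bytes(value: str) -> int:
    """Convert a php.ini size such as '16M' or '512M' to bytes.

    Returns -1 for '-1', which PHP treats as 'no memory limit'.
    """
    v = value.strip()
    if v == "-1":
        return -1
    m = re.fullmatch(r"(\d+)\s*([KMGkmg]?)", v)
    if not m:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(m.group(1)) * _SUFFIX[m.group(2).upper()]


def read_ini_settings(text: str, names=("memory_limit", "max_execution_time")) -> dict:
    """Return the last uncommented 'name = value' for each setting in php.ini text."""
    found = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # ';' starts a php.ini comment
            continue
        key, sep, val = line.partition("=")
        key = key.strip()
        if sep and key in names:
            # Drop any trailing inline comment and optional quotes.
            found[key] = val.split(";")[0].strip().strip('"')
    return found
```

`php --ini` lists the ini files the command-line PHP loads, and a phpinfo() page shows what the web server's PHP uses; the two can differ, so the file you edit needs to be the one the web server reads.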