XML Sitemaps Generator

Sitemap Generator Forum
Topic: Solutions to Memory Problem
Kursplat
« on: March 17, 2008, 02:11:28 AM »

I am stuck in the same bind as many people trying to use this script.  My site is hosted on a shared server at a hosting company, and we are not able to increase the amount of memory available to the process.  Based on the other threads I've seen in these forums, it appears 32M is the standard amount we all get allocated.

I'm looking to do two things: (1) figure out a way I can index my site given the constraints I have and (2) offer suggestions for improving the script so it will work under lower memory constraints.

(1) I bought this script because I want to index more than just 100 pages of my site.  Given the memory constraints, that's about all I get.  If I go another level deeper, I run out of memory.  If all I get is 100 pages, this script isn't much use to me.  Is it possible to run this script on one of my personal machines, where I have control and can set the memory to whatever I want, and have it index a web site that is on a different machine (the site on the hosted server)?  I assume I can.  However, the resulting files will be saved on my local machine, correct?  So, does this mean I will have to write a batch job to call the sitemap script, then FTP the files when it's done, and then ping Google?
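
For illustration only, here is a rough Python sketch of the three-step batch job described in (1): run the crawl locally, upload the result by FTP, then ping the search engine. Everything concrete in it is an assumption and not taken from the script's documentation: the crawl command, the FTP details, the remote file name, and the ping endpoint are all hypothetical parameters that would have to be filled in from the generator's and the search engine's own instructions.

```python
import ftplib
import subprocess
import urllib.parse
import urllib.request


def build_ping_url(ping_endpoint, sitemap_url):
    """Attach the public sitemap URL to the ping endpoint as a query parameter."""
    return ping_endpoint + "?" + urllib.parse.urlencode({"sitemap": sitemap_url})


def publish_sitemap(crawl_command, local_file, ftp_host, ftp_user, ftp_password,
                    remote_name, ping_endpoint, sitemap_url):
    """Run the crawl locally, upload the sitemap, then ping the search engine.

    All arguments are placeholders (assumptions): crawl_command is whatever
    command starts the generator on the local machine, given as a list.
    """
    # Step 1: run the crawl on the local machine; check=True raises on failure.
    subprocess.run(crawl_command, check=True)

    # Step 2: upload the generated file to the hosted server over FTP.
    with ftplib.FTP(ftp_host, ftp_user, ftp_password) as ftp:
        with open(local_file, "rb") as fh:
            ftp.storbinary("STOR " + remote_name, fh)

    # Step 3: notify the search engine that the sitemap has changed.
    with urllib.request.urlopen(build_ping_url(ping_endpoint, sitemap_url)) as resp:
        return resp.status
```

Scheduling this from cron or the Windows Task Scheduler would give the "batch job" behaviour; each step only starts once the previous one has finished without raising an error.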

(2) I haven't looked at the code of the script yet, but I will assume you are not already doing these things.
(2a) Have you considered some sort of compression algorithm to make the URL list take up less space in memory?
(2b) Have you considered adding a config value indicating the max memory the script should use, so that if the process requires more than that, it starts reading/writing the URLs to disk?  It would take longer, but at least it would work.  Rather than having one huge, growing file of URLs, which would take far too long to scan every time a URL is checked, look at using a bunch of smaller files.  Take the URL, add up the ASCII values of its characters, then take the result MOD 1000.  Create/open the file whose name includes that result, say tmp743.txt, and read the URLs from the file one at a time to see if any match.  If no match is found, append this URL to the end of the file.  If you combine this with 2a, it will go even faster. :)
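
To make the idea in (2b) concrete, here is a minimal Python sketch of the bucket-file check. It is not how the script works today (I haven't read its code); the function names and the directory argument are made up for the example, and only the "sum of character values MOD 1000" rule and the tmpNNN.txt naming come from the description above.

```python
import os

NUM_BUCKETS = 1000  # the "MOD 1000" from the description


def bucket_path(url, directory):
    """Pick the bucket file for a URL: sum of character codes, mod NUM_BUCKETS.

    ord() gives the ASCII value for plain ASCII URLs.
    """
    bucket = sum(ord(ch) for ch in url) % NUM_BUCKETS
    return os.path.join(directory, f"tmp{bucket}.txt")


def add_if_new(url, directory):
    """Return True and record the URL if it has not been seen, else False."""
    path = bucket_path(url, directory)
    if os.path.exists(path):
        # Scan only this one small bucket file, one URL per line.
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                if line.rstrip("\n") == url:
                    return False
    # No match found: append the URL to the end of its bucket file.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(url + "\n")
    return True
```

Only one bucket is ever open at a time, so memory use stays flat no matter how many URLs have been seen. The character sum is a weak hash (URLs made of the same characters in a different order land in the same bucket), so a stronger hash would probably spread the URLs more evenly, but the full string comparison keeps the result correct either way.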

admin
« Reply #1 on: March 17, 2008, 10:42:37 PM »

Hello,

If you only get 100 pages crawled and the memory limit of 32M is exceeded at that point, it's possible that the sitemap generator is trying to download a large file. Please PM me your generator URL so that I can check that.

We have plans for further improvements of memory usage for Sitemap Generator targeting low memory server packages.

Kursplat
« Reply #2 on: March 18, 2008, 05:10:44 PM »

With a max depth of 4, the script runs out of memory.  With a max depth of 3, the script completes but only includes approximately 175 pages.

admin
« Reply #3 on: March 18, 2008, 11:40:53 PM »

Hello,

Please PM me your generator URL so that I can check that.

Kursplat
« Reply #4 on: March 19, 2008, 08:02:18 PM »

Well, I don't know exactly what it was failing on, but I played with the settings, which included adding some URL exclusions, and reran it.  I was able to get it to crawl 5000+ pages with a max depth of 8 without erroring out.  Again, I don't know what was wrong or which of my changes fixed it.

danialcollyer
« Reply #5 on: April 02, 2008, 01:21:34 AM »

The answer to your problem is simple: run the script on its own on a home/office server. I had been running the script for months on SiteGround shared hosting with no luck. It was only when I read a reply from admin in the forum that I realized the script does not have to be on the same server as the website being crawled.

I used XAMPP on a Windows machine. I had to change the php.ini file in about 5 locations to increase max_execution_time and the memory limit. I then just reset the file locations and names in the configuration to suit my site.

Apart from taking up a massive amount of memory and time, the script runs sweet! I assume it's taking some time to crawl because of my shared host's limit on the number of HTTP requests allowed!

I hope this has been helpful to anyone using shared hosting!

Dan

Simple
« Reply #6 on: April 09, 2008, 01:28:07 AM »

I fixed this by opening .htaccess in my root (public_html) folder, adding "php_value memory_limit 16M" (WITHOUT the " quotes) to the bottom, and saving it.

Hope that helps. :)