A couple of Issues
« on: February 11, 2012, 06:49:10 AM »
1) I'm getting the "Sitemap file is not writable" error on about 100 pages of the sitemap. This is despite the fact that I have set permissions to 666 and the sitemap pages appear to be written just fine.

2.) I have Max Execution Time set to 7200 seconds, or two hours. However, I often find the generator still running more than 10 hours after it started.
Re: A couple of Issues
« Reply #1 on: February 12, 2012, 09:48:54 AM »
1. What exactly is the error message you get?
2. You need to set the maximum time in the generator configuration as well. Also, the generator automatically resumes the process (it starts again) if you keep the crawling window open.
Re: A couple of Issues
« Reply #2 on: February 12, 2012, 03:26:33 PM »
1.) The error is displayed under the Configuration tab:
Code:
An error occured

Sitemap file is not writable: /home/user/public_html/sitemap463.html
Sitemap file is not writable: /home/user/public_html/sitemap464.html
(etc, etc., all the way to:)
Sitemap file is not writable: /home/user/public_html/sitemap559.html

As you can see from my sitemap (link not shown), those pages seem to have been written OK.

2.) That is what I'm talking about; I set 7200 seconds as Max Execution Time under Configuration -> Crawler Limitations. I start the crawl via cron at 0100, and when I check on it at 1100 I see that it's still running. I don't want it to run that long, because it bogs the site down the whole time it's running, making it barely navigable to the user, not to mention search engine bots.
« Last Edit: February 12, 2012, 03:28:10 PM by MadDogMike »
Re: A couple of Issues
« Reply #3 on: February 13, 2012, 07:25:36 AM »
Hello,

1. If you have changed the path for the HTML sitemap to the domain root, you need to manually create empty files with those names (sitemap1.html, etc.) in the domain root and set 666 permissions on them; otherwise the generator will not be able to write to them. The default configuration stores them in the generator/data/ folder, since that folder is writable.

2. What are the cron settings that you use?
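
For point 1, creating several hundred empty files by hand is tedious, so here is a minimal sketch of how it could be scripted. The function name make_sitemap_placeholders is hypothetical and not part of the generator; the directory and the page count are assumptions you would replace with your own values (559 is simply the highest page number in the error list above).

```shell
#!/bin/sh
# Hypothetical helper (not part of the generator): pre-create empty
# sitemapN.html files and make them writable for the generator.
make_sitemap_placeholders() {
    dir="$1"        # target folder, e.g. the domain root (assumption)
    last_page="$2"  # highest page number to create (assumption)
    i=1
    while [ "$i" -le "$last_page" ]; do
        f="$dir/sitemap$i.html"
        # touch creates the file if it is missing and leaves existing content alone
        touch "$f"
        # 666 = read/write for owner, group and others
        chmod 666 "$f"
        i=$((i + 1))
    done
}

# Example call (path and count are placeholders):
# make_sitemap_placeholders /home/user/public_html 559
```

Creating a few more pages than you currently need leaves room for the sitemap to grow before the error comes back.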
Re: A couple of Issues
« Reply #4 on: February 13, 2012, 02:21:39 PM »
1.) Oh, I thought I just had to create the first page and set 666 to it, then subsequent pages would be written OK.  OK, I got that set.

2.) Here are my cron settings:

Minute  Hour  Day  Month  Weekday  Command
0       1     *    *      *        /usr/bin/php /home/user/public_html/generator/runcrawl.php
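
As a possible stopgap while the Max Execution Time setting is being investigated, the cron command could be wrapped so that the operating system enforces a hard limit itself. This is only a sketch: run_with_limit is a hypothetical wrapper, it assumes GNU coreutils `timeout` is installed on the server, and nothing in this thread confirms how the generator behaves when it is stopped mid-crawl.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the generator): run a command
# with a hard time limit enforced by GNU coreutils `timeout`.
run_with_limit() {
    limit_seconds="$1"
    shift
    # timeout sends SIGTERM to the command once the limit is reached
    # and then exits with status 124.
    timeout "$limit_seconds" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "stopped after ${limit_seconds}s limit" >&2
    fi
    return "$status"
}

# Example call using the paths from the cron line above:
# run_with_limit 7200 /usr/bin/php /home/user/public_html/generator/runcrawl.php
```

A crawl cut off this way may be left incomplete, so this limits the load on the site rather than fixing the underlying problem.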
Re: A couple of Issues
« Reply #5 on: February 13, 2012, 04:24:15 PM »
Can you try to set it to a shorter time for testing, like 60 seconds?
Re: A couple of Issues
« Reply #6 on: February 14, 2012, 03:24:24 AM »
OK, just did that. Set it to 99 seconds, set the cron to start at 2200 local. It's now 2024 and the crawl is still running.

It wouldn't be such an issue if the crawl didn't take so long to complete. I have another site with about 25k pages and the generator completes it in just over an hour. This site has about 50k pages, so you would think the generator would complete in just over 2 hours. Instead it takes 12+ hours, and I can't figure out why, since both sites are on the same VPS, same CMS (WordPress) and the same generator settings.
« Last Edit: February 14, 2012, 03:25:59 AM by MadDogMike »
Re: A couple of Issues
« Reply #8 on: February 15, 2012, 04:57:28 AM »
Same thing, just keeps running.
Re: A couple of Issues
« Reply #9 on: February 15, 2012, 01:45:27 PM »
Hello,

Please send me your generator URL and login in a private message so I can check this.
Re: A couple of Issues
« Reply #10 on: February 16, 2012, 05:07:17 AM »
PM sent