Won't write sitemap files when running from cron
« on: February 03, 2012, 04:07:16 AM »
Our install of Generator had been working just fine for quite a while, then a few weeks ago it mysteriously stopped working. It would still run every day and generate normal-looking output, and would even write sitemap files to the data folder, but it wouldn't copy them to the production location or update the broken links list. It worked fine when launched manually from the UI. I tried installing a clean copy and copying over all the settings (via the UI again), but the behaviour is unchanged.

The only possibly odd things I see:
1. Every cron-launched crawl starts with "Resuming the last session (last updated: 1970-01-01 00:00:00)".
2. The progress output looks fine and the page count looks good, but it ends with:
  <div id="percprog"></div>
   |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
   |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
   |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
   |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
3. The sess_* files stick around after the run is complete (this may be normal, I don't know).

Permissions have been checked and all are good, and again, it works fine when launched from the UI. Both UI-launched crawls and cron job crawls run as the same user.

Am I missing something obvious?
Re: Won't write sitemap files when running from cron
« Reply #1 on: February 03, 2012, 11:23:53 AM »
Hello,

The command line script usually runs with a different user ID, so its permissions differ from those of a run started in the browser.
Try setting all files in the generator/data/ folder and all sitemap files to 666 permissions.
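A minimal sketch of that change for a POSIX shell. The function name `set_generator_perms`, both directory arguments, and the assumption that the published sitemap files match `sitemap*` are illustrative placeholders, not part of the generator:

```shell
#!/bin/sh
# Hypothetical helper: give the generator's data files and the sitemap files
# mode 666 (read/write for everyone), as suggested above.
#   $1 = the generator/data/ folder
#   $2 = the folder holding the published sitemap files (assumed to match 'sitemap*')
set_generator_perms() {
  data_dir="$1"
  sitemap_dir="$2"
  # Regular files only: directories need their execute bit, so leave them alone.
  find "$data_dir" -type f -exec chmod 666 {} +
  # Top level of the sitemap folder only, and only files named sitemap*.
  find "$sitemap_dir" -maxdepth 1 -type f -name 'sitemap*' -exec chmod 666 {} +
}

# Usage (paths are placeholders):
#   set_generator_perms /path/to/generator/data /path/to/webroot
```

Mode 666 makes the files writable by every account on the machine, so it is a blunt diagnostic: if it fixes the cron run, the underlying cause is an ownership or user mismatch.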
Re: Won't write sitemap files when running from cron
« Reply #2 on: February 03, 2012, 02:53:06 PM »
Not this time. Permissions and ownership are all good; I've triple-checked everything. Also, remember that this had been running perfectly for years and only stopped recently. Nothing has been changed regarding permissions or users. About the only thing that may have changed (without my knowledge) is the PHP version or something similar. Note that I'm not seeing an error message anywhere about being unable to write, etc. Does the generator write an error log anywhere (other than stdout, which is already emailed to me)?
Re: Won't write sitemap files when running from cron
« Reply #3 on: February 05, 2012, 09:41:16 AM »
Hello,

The generator creates a debug log. You can also try running the generator manually from the command line and see how it behaves.
Re: Won't write sitemap files when running from cron
« Reply #4 on: February 05, 2012, 03:36:41 PM »
Where is the debug log? Do you mean stderr? If so, I should already be getting that along with stdout, but I'll try explicitly redirecting both output streams to a log file. If the debug log is written to a file somewhere, I can't find it, nor any mention of it in the documentation.
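In a POSIX shell, `command >>file 2>&1` appends both stdout and stderr to the same file. A minimal sketch of a wrapper that does this and also logs the command's exit status, which would show whether the process exits abnormally even when it prints no error. The function name `run_logged` is hypothetical, and `runcrawl.php` in the usage line is an assumption about the generator's command-line entry point; substitute whatever command the cron job actually runs:

```shell
#!/bin/sh
# Hypothetical wrapper: append a command's stdout AND stderr to one log file,
# bracketed by start/end lines that record the time, the user, and the exit status.
#   $1 = log file; remaining args = the command to run
run_logged() {
  log="$1"; shift
  {
    echo "=== start $(date '+%Y-%m-%d %H:%M:%S') as $(id -un): $*"
    "$@"
    cmd_rc=$?
    echo "=== end $(date '+%Y-%m-%d %H:%M:%S') exit status $cmd_rc"
  } >>"$log" 2>&1
  # Return the command's own status so cron or a calling script can act on it.
  return "$cmd_rc"
}

# Usage (script name and paths are assumptions):
#   run_logged /tmp/generator-cron.log php /path/to/generator/runcrawl.php
```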
Re: Won't write sitemap files when running from cron
« Reply #6 on: February 06, 2012, 07:52:12 PM »
No dice. I tried running it from the command line with all output redirected to a file, and here is what I get:

Resuming the last session (last updated: 1970-01-01 00:00:00)
1 | 106 | 34.0 | 0:00:02 | 0:03:45 | 1 | 1,318.3 Kb | 1 | 0 | 1318
20 | 87 | 639.1 | 0:00:04 | 0:00:20 | 1 | 1,819.4 Kb | 20 | 570 | 1819
40 | 67 | 1,229.3 | 0:00:07 | 0:00:13 | 1 | 2,046.9 Kb | 38 | 956 | 2046
.
.
.
42220 | 2 | 806,270.9 | 2:14:16 | 0:00:00 | 215 | 44,267.1 Kb | 35568 | 0 | 44267
42240 | 2 | 807,420.5 | 2:14:24 | 0:00:00 | 225 | 44,280.5 Kb | 35578 | 0 | 44280
42244 | 0 | 807,644.7 | 2:14:26 | 0:00:00 | 228 | 44,291.8 Kb | 35581 | 0 | 44291
<h4>Completed</h4>Total pages indexed: 35581
<br>Creating sitemaps...
<div id="percprog"></div>
 |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
 |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
 |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
 |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
<br />Done, redirecting to sitemap view page.
<script>
top.location = 'index.php?op=view'
</script>

There are no error messages. As usual, no sitemap was written.
Re: Won't write sitemap files when running from cron
« Reply #8 on: February 09, 2012, 03:33:06 PM »
I tried setting a new sitemap output filename (making sure ownership was set up correctly), and the result is the same. The only signs I see of it running are the updated generator.conf, the sess_ file, and the output (with no error message). But the UI shows the Request Date as the last time I ran it manually (i.e. from the UI).
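To see exactly which files one run writes, rather than inferring it from the UI's Request Date, you can compare modification times against a marker file created just before the run. A minimal sketch using `find -newer`; the function name `files_touched_by` and the usage paths are hypothetical:

```shell
#!/bin/sh
# Hypothetical diagnostic: run a command, then list the regular files under a
# directory whose modification time is newer than a marker file created just
# before the command started, i.e. the files that run actually wrote.
#   $1 = directory to inspect; remaining args = the command to run
files_touched_by() {
  dir="$1"; shift
  marker=$(mktemp)
  # Wait a second so anything the command writes is strictly newer than the
  # marker, even on filesystems with one-second timestamp resolution.
  sleep 1
  # Send the command's own output to stderr so stdout carries only the file list.
  "$@" 1>&2
  cmd_rc=$?
  find "$dir" -type f -newer "$marker" -print
  rm -f "$marker"
  return "$cmd_rc"
}

# Usage (paths and command are placeholders):
#   files_touched_by /path/to/generator php /path/to/generator/runcrawl.php
```

If generator.conf and the sess_ file show up in the list but the sitemap files never do, that suggests the run stops before the sitemap-writing step rather than failing to overwrite existing files.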
Re: Won't write sitemap files when running from cron
« Reply #9 on: February 21, 2012, 07:17:42 AM »
Not working yet. Any other ideas?
Re: Won't write sitemap files when running from cron
« Reply #11 on: February 21, 2012, 04:51:12 PM »
Sorry, it is a production server with confidential client data, so I cannot. But I'm happy to run any tests and send you the output.
Re: Won't write sitemap files when running from cron
« Reply #12 on: February 22, 2012, 08:14:56 AM »
Try running the generator from the command line with the same user ID that the web server processes run under.
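A minimal sketch of doing that without logging in as the web server account: `sudo -u` runs a single command as another user, by default asks for your own password rather than the target's, and does not go through the target account's login shell, so it also works for system accounts that cannot log in. The helper name `run_as`, the `www-data` account, and the script path are assumptions; check which user your web server actually runs as:

```shell
#!/bin/sh
# Hypothetical helper: run a command as a given account. If we already are that
# account, run it directly; otherwise go through 'sudo -u', which executes the
# command without using the target account's login shell (the invoking user
# needs matching sudo rights instead).
#   $1 = target user name; remaining args = the command to run
run_as() {
  target="$1"; shift
  if [ "$(id -un 2>/dev/null)" = "$target" ]; then
    "$@"
  else
    sudo -u "$target" -- "$@"
  fi
}

# Usage (www-data is the Debian/Ubuntu default web server account; other systems
# use apache, nginx, www, ...; the script path is a placeholder):
#   run_as www-data php /path/to/generator/runcrawl.php
```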
Re: Won't write sitemap files when running from cron
« Reply #13 on: March 01, 2012, 07:31:50 PM »
The cron job does run as the web server user. However, I can't run it manually as that user, because it is a system user that I can't log in as. I'm at a loss here. It runs fine and does write some files in the data folder, but nothing actually happens when it finishes. It almost seems like it dies right at the end, after it has finished crawling.
Re: Won't write sitemap files when running from cron
« Reply #14 on: March 01, 2012, 08:04:54 PM »
You can try to run it via a web request from the command line:
Code:
wget "http://www.yourdomain.com/generator/index.php?op=crawlproc&resume=1"
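A minimal sketch of the same request in a form that is easier to schedule from cron. Because the crawl is then started by the web server, it presumably runs under the same user as a UI-launched crawl. The function names `crawl_url` and `trigger_crawl` are hypothetical and `www.yourdomain.com` remains a placeholder; `--tries=1` keeps wget from re-sending the request (which could start the crawl again) if the connection drops, and `-O /dev/null` stops it from saving a copy of the returned page on every run:

```shell
#!/bin/sh
# Hypothetical cron-friendly version of the wget call above.
# crawl_url builds the request URL from the generator's base URL.
crawl_url() {
  base="${1%/}"  # drop one trailing slash so we don't produce '//index.php'
  printf '%s/index.php?op=crawlproc&resume=1\n' "$base"
}

# trigger_crawl fetches that URL quietly, once, discarding the returned HTML.
trigger_crawl() {
  # -q: no progress output; --tries=1: no automatic retries;
  # -O /dev/null: don't save index.php?op=... files into the working directory.
  wget -q --tries=1 -O /dev/null "$(crawl_url "$1")"
}

# Usage:
#   trigger_crawl http://www.yourdomain.com/generator
# Example crontab entry (daily at 03:00; schedule and URL are placeholders):
#   0 3 * * * wget -q --tries=1 -O /dev/null "http://www.yourdomain.com/generator/index.php?op=crawlproc&resume=1"
```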