
Won't write sitemap files when running from cron

Started by jknopp, February 03, 2012, 04:07:16 AM

jknopp

Our install of Generator has been working just fine for quite a while... then a few weeks ago it mysteriously stopped working. It would still run every day and generate normal-looking output, and even write sitemap files in the data folder, but it wouldn't copy them to the production location or update the broken links list. It worked fine when launched manually from the UI. I tried installing a clean copy and copying over all the settings (via the UI again), but the behaviour is unchanged.

The only possibly odd things I see:
1. Every cron-launched crawl starts with "Resuming the last session (last updated: 1970-01-01 00:00:00)"
2. Progressive output looks fine, page count looks good, but the output ends in:
  <div id="percprog"></div>
   |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
   |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
   |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
   |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
3. The sess_* files stick around after the run is complete (this may be normal, I don't know)

Permissions have been checked and all are good, and again it works fine when launched from the UI. Both UI-launched crawls and cron job crawls are run by the same user.

Am I missing something obvious?

XML-Sitemaps Support

Hello,

The command-line script usually runs under a different user ID, so its permissions differ from those in effect when running in the browser.
Try setting all files in the generator/data/ folder and all sitemap files to 666 permissions.
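
One way to check this from the cron side is a small script that lists the mode, owner, and effective writability of each file, run as the same user the cron job uses. The following is only a minimal sketch, assuming Python is available on the server; `report_permissions` is a hypothetical helper name, and the demo works on a throwaway folder rather than the real generator/data/ folder.

```python
import os
import stat
import tempfile


def report_permissions(folder):
    """Return (name, octal mode, owner uid, writable by current user) per entry."""
    rows = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        info = os.stat(path)
        mode = stat.S_IMODE(info.st_mode)  # permission bits only, e.g. 0o666
        rows.append((name, format(mode, "03o"), info.st_uid, os.access(path, os.W_OK)))
    return rows


# Demo on a throwaway folder. On the server, pass the real path instead,
# e.g. report_permissions("generator/data"), while running as the cron user.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_file = os.path.join(demo_dir, "sitemap.xml")
    with open(demo_file, "w"):
        pass
    os.chmod(demo_file, 0o666)
    for row in report_permissions(demo_dir):
        print(*row)
```

The last column reflects what the current process can actually do, which is the part that differs between a browser-launched run and a cron-launched run if the two use different users.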

jknopp

Not this time. Permissions and ownership are all good. Triple checked it all. Also, remember this was running perfectly for years, and just stopped recently. Nothing has been changed regarding permissions or users. About the only thing that may have changed (without my knowledge) is the PHP version or similar. Note that I'm not seeing an error message anywhere about not being able to write, etc. Does the generator write an error log anywhere (other than the stdout which is emailed to me already)?

XML-Sitemaps Support

Hello,

There is a debug log created by the generator. You can try running the generator manually from the command line to see how it behaves.

jknopp

Where is the debug log? Do you mean stderr? If so, I should already be getting that along with stdout, but I'll try explicitly directing both output streams to a log file. If the debug log is written to a file somewhere, I can't find it, nor any mention of it in the documentation.
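
Capturing both streams in one file, together with the process exit status, can be sketched as follows. This is an illustration under assumptions, not the generator's own tooling: `run_and_log` is a hypothetical helper, and the demo runs a tiny stand-in child process instead of the real crawl command.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log(command, log_path):
    """Run command with stdout and stderr both written to log_path; return exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Stand-in child process that writes to both streams and exits with status 3.
# On the server, replace `child` with the same command line the crontab entry uses.
child = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr); sys.exit(3)",
]
demo_log = os.path.join(tempfile.mkdtemp(), "crawl_demo.log")
exit_code = run_and_log(child, demo_log)
print("exit code:", exit_code)
```

A non-zero exit code with no message in the log would suggest the process is being terminated rather than finishing normally, which the emailed cron output alone does not show.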


jknopp

No dice. I tried running from the command line with all output piped to a file, and here is what I get:

Resuming the last session (last updated: 1970-01-01 00:00:00)
1 | 106 | 34.0 | 0:00:02 | 0:03:45 | 1 | 1,318.3 Kb | 1 | 0 | 1318
20 | 87 | 639.1 | 0:00:04 | 0:00:20 | 1 | 1,819.4 Kb | 20 | 570 | 1819
40 | 67 | 1,229.3 | 0:00:07 | 0:00:13 | 1 | 2,046.9 Kb | 38 | 956 | 2046
.
.
.
42220 | 2 | 806,270.9 | 2:14:16 | 0:00:00 | 215 | 44,267.1 Kb | 35568 | 0 | 44267
42240 | 2 | 807,420.5 | 2:14:24 | 0:00:00 | 225 | 44,280.5 Kb | 35578 | 0 | 44280
42244 | 0 | 807,644.7 | 2:14:26 | 0:00:00 | 228 | 44,291.8 Kb | 35581 | 0 | 44291
<h4>Completed</h4>Total pages indexed: 35581
<br>Creating sitemaps...
<div id="percprog"></div>
|  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
|  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
|  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
|  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
<br />Done, redirecting to sitemap view page.
<script>
top.location = 'index.php?op=view'
</script>

There are no error messages. As usual, no sitemap was written.
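
For a log of this size, the blank progress rows can be picked out mechanically. The sketch below is only an illustration: `split_progress_rows` is a hypothetical helper, and because the meaning of the individual columns is not stated in this thread, it only checks whether the first field of each pipe-delimited row is empty.

```python
def split_progress_rows(lines):
    """Split pipe-delimited progress rows into (populated, blank) by first field."""
    populated, blank = [], []
    for line in lines:
        if "|" not in line:
            continue  # skip HTML fragments and status messages
        fields = [field.strip() for field in line.split("|")]
        (populated if fields[0] else blank).append(fields)
    return populated, blank


# A few lines copied from the captured output above.
sample = """42244 | 0 | 807,644.7 | 2:14:26 | 0:00:00 | 228 | 44,291.8 Kb | 35581 | 0 | 44291
<h4>Completed</h4>Total pages indexed: 35581
<br>Creating sitemaps...
<div id="percprog"></div>
|  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
|  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
""".splitlines()

populated, blank = split_progress_rows(sample)
print(len(populated), "populated rows,", len(blank), "blank rows")
```

Comparing the count of blank rows between a cron run and a UI run would show whether they appear only in the failing case.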


jknopp

I tried setting a new sitemap output filename (making sure ownership was set up correctly), and the result is the same. The only signs I see of it running are the updated generator.conf, the sess_ file, and the output (with no error message). But the UI shows the Request Date as the last time I ran it manually (i.e. from the UI).

jknopp

Sorry, it is a production server with confidential client data, so I cannot. But I'm happy to run any tests and send you the output.


jknopp

The cron job does run as the server user. I can't manually run it as that user, however, as it is a system user I can't log into. I'm at a loss here. It runs fine and does write some files in the data folder, but nothing actually happens when it is done. It almost seems like it dies right at the end, after it has finished crawling.

XML-Sitemaps Support

You can try running it via a web request from the command line:
wget "http://www.yourdomain.com/generator/index.php?op=crawlproc&resume=1"
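
If wget is not available to the cron user, the same request can be sketched in Python. This is an assumption-laden illustration: `crawl_url` and `trigger_crawl` are hypothetical helper names, the base URL is the placeholder from the command above, and only the `op=crawlproc` and `resume=1` parameters come from this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def crawl_url(base):
    """Build the crawl URL with the same query string as the wget command."""
    query = urlencode({"op": "crawlproc", "resume": 1})
    return f"{base.rstrip('/')}/index.php?{query}"


def trigger_crawl(base, timeout=None):
    """Request the crawl URL; return (HTTP status, response body as text).

    timeout=None waits indefinitely, since a full crawl can run for hours.
    """
    with urlopen(crawl_url(base), timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


# Only builds and prints the URL; call trigger_crawl(...) to send the request.
print(crawl_url("http://www.yourdomain.com/generator"))
```

Running the crawl through the web server this way means it executes as the web server user, the same as a crawl launched from the UI.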