Errors After Crawl Finishes
« on: February 08, 2011, 08:33:02 PM »
I get these errors after crawl finishes.

I switched hosts to a VPS and the crawl got through, but then it spat out the following:

Error writing to these files:
/generator/data/sitemap.html
/generator/data/sitemap2.html
/generator/data/sitemap3.html
/generator/data/sitemap4.html
/generator/data/sitemap5.html
/generator/data/sitemap6.html
/generator/data/sitemap7.html
/generator/data/sitemap8.html
/generator/data/sitemap9.html
/generator/data/sitemap10.html
/generator/data/sitemap11.html
/generator/data/sitemap12.html
/generator/data/sitemap13.h


All the files that the setup guide points out are set to 777 or 666. Also, nothing is being written to my Sitemap.xml file either.
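For reference, a quick shell sanity check along these lines can confirm whether the data directory is actually writable (DATA_DIR below is a placeholder, not necessarily your install path):

```shell
#!/bin/sh
# Sketch: confirm the generator's data/ directory is really writable.
# DATA_DIR is a placeholder -- point it at your actual install.
DATA_DIR="${DATA_DIR:-./generator/data}"

mkdir -p "$DATA_DIR"
chmod 777 "$DATA_DIR"                 # the directory must allow creating new files
touch "$DATA_DIR/sitemap.html"
chmod 666 "$DATA_DIR/sitemap.html"    # the file must allow overwriting

# Actually attempt a write, the same way PHP would:
if printf '' >> "$DATA_DIR/sitemap.html" 2>/dev/null; then
  echo "writable"
else
  echo "NOT writable"
fi
```

If this prints "NOT writable" when run as the same user PHP runs as, the generator's error message is accurate; if it prints "writable" as root but PHP still fails, check which user the web server actually runs as (apache, www-data, etc.).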
Re: Errors After Crawl Finishes
« Reply #1 on: February 09, 2011, 12:20:43 AM »
It is also not letting me change the "Save Sitemap to:" entry. It always reverts back to the original.
Re: Errors After Crawl Finishes
« Reply #2 on: February 09, 2011, 01:28:54 PM »
Hello,

> Error writing to these files:

Are the files actually created? (i.e. can you see them via ftp)
Re: Errors After Crawl Finishes
« Reply #3 on: February 09, 2011, 06:19:25 PM »
No, they weren't. I created them, though, and made them writable, hoping it would help. I don't know enough about servers to be on the virtual dedicated server I had, so I'm back on the shared host. I bumped max_execution_time to 120 and it still won't run through. The program is basically useless to me right now because I can't get it to complete even at a max depth of 3. Ideally I'd like a max depth of 4. Suggestions?
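For what it's worth, 120 seconds may simply be too low for a deep crawl on shared hosting. The usual knob is max_execution_time in php.ini (or a per-directory override, if the host allows one); the value below is illustrative, not a recommendation:

```ini
; php.ini (or a .user.ini override, if your host permits it)
; Value is illustrative -- a deep crawl may need more.
max_execution_time = 300
```

Running the crawler from the PHP CLI instead sidesteps the limit entirely, since CLI PHP defaults to no execution time limit.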
Re: Errors After Crawl Finishes
« Reply #4 on: February 09, 2011, 08:33:45 PM »
Can you provide temporary FTP access for troubleshooting? The "Error writing to these files:" issue can be resolved easily, I think.
Re: Errors After Crawl Finishes
« Reply #5 on: February 10, 2011, 05:04:47 PM »
I am having a similar problem with our installation as well. The generator "finishes", puts the sitemap files in the /data directory, but then gives an error about permission problems.

Is there a way to fix this? I have no idea what it's trying to do right at the end, but something is amiss. The big problem is that I can't tell it to start a new run from the command line; it just keeps trying to "finish" the previous one.

Here's the snip from my SSH window (with some paths changed to protect the innocent):


Code:
[root@server data]# /usr/bin/php /var/www/www.mysite.com/utils/xml-sitemap-generator/runcrawl.php
Resuming the last session (last updated: 2011-02-07 16:15:10)
<h4>Completed</h4>Total pages indexed: 2415
<br>Creating sitemaps...
<div id="percprog"></div>
<h4>An error occured: </h4>
<script>
top.location = 'index.php?op=config&errmsg=Error+writing+to+these+files%3A%3Cbr+%2F%3E%0A%3Cb%3E%2Fvar%2Fwww%2Fwww.mysite.com%2Futils%2Fxml-sitemap-generator%2Fdata%2Fsitemap.xml%3Cbr+%2F%3E%0A%2Fvar%2Fwww%2Fwww.mysite.com%2Futils%2Fxml-sitemap-generator%2Fdata%2Fsitemap_images.xml%3C%2Fb%3E%3Cbr+%2F%3E%0APlease+correct+files+permissions+and+resume+sitemap+creation.'
</script>
[root@server data]# ls -la
total 2524
drwxrwxrwx 2 root   root    4096 Feb 10 09:41 .
drwxr-xr-x 4 root   root    4096 Feb  4 13:33 ..
-rwxrwxrwx 1 apache apache   1581 Feb  4 13:58 2011-02-04 20-58-41.log
-rwxrwxrwx 1 apache apache   1580 Feb  4 14:03 2011-02-04 21-03-07.log
-rwxrwxrwx 1 apache apache  49424 Feb  4 15:12 2011-02-04 22-12-19.log
-rwxrwxrwx 1 apache apache 223335 Feb  4 15:16 2011-02-04 22-12-19.proc
-rwxrwxrwx 1 apache apache  49752 Feb  4 15:25 2011-02-04 22-25-00.log
-rwxrwxrwx 1 apache apache  49751 Feb  4 15:58 2011-02-04 22-58-17.log
-rwxrwxrwx 1 apache apache  49629 Feb  7 09:15 2011-02-07 16-15-15.log
-rwxrwxrwx 1 root   root    49621 Feb 10 09:35 2011-02-10 16-35-02.log
-rwxrwxrwx 1 root   root    49621 Feb 10 09:37 2011-02-10 16-37-28.log
-rw-rw-rw- 1 root   root    49621 Feb 10 09:41 2011-02-10 16-41-09.log
-rwxrwxrwx 1 apache apache 876952 Feb  7 09:15 crawl_dump.log
-rwxrwxrwx 1 apache apache   3929 Feb 10 09:33 generator.conf
-rw------- 1 root   root       13 Feb 10 09:41 sess_1enlpiaeu4fllvhup4kupv6lt4
-rwxrwxrwx 1 apache apache     13 Feb  7 09:26 sess_2fs8rtgj2tnsak00aa0qfr4va0
-rwxrwxrwx 1 apache apache     13 Feb  4 16:12 sess_3uj0s6jr4vr6g2g0qufvo1oj46
-rwxrwxrwx 1 root   root       13 Feb 10 09:37 sess_6kdl9jsi03toe6ah2fg85870g4
-rwxrwxrwx 1 apache apache     13 Feb  5 19:21 sess_g21835fq7n4glr5gl6pa7k0fm7
-rwxrwxrwx 1 apache apache     13 Feb 10 09:33 sess_gumejmu03umif7tvcnv2l7d4q5
-rwxrwxrwx 1 root   root       13 Feb 10 09:35 sess_ilelmrujcv1cgm9lqj0flcgp55
-rw-rw-rw- 1 root   root   366164 Feb 10 09:41 sitemap_images.xml
-rw-rw-rw- 1 root   root   444612 Feb 10 09:41 sitemap.xml
-rw-rw-rw- 1 root   root   190485 Feb 10 09:41 urllist.txt
[root@server data]#
Re: Errors After Crawl Finishes
« Reply #6 on: February 10, 2011, 05:28:03 PM »
For additional information, I even tried clearing all the process-generated files out of the data directory and starting over. I also tried changing the owner of the directory to "apache". I was able to run the generator and could see it spitting out its debug info as it went along.

But when it got to the end, I got the same message about bad permissions.  Yet the sitemaps are there, sitting in the directory.

There must be some post-processing permissions update that runs at the end and causes it to error out.

Any ideas?

Thanks!
Re: Errors After Crawl Finishes
« Reply #7 on: February 10, 2011, 10:11:40 PM »
Make sure that you have created an empty sitemap.xml file in the domain root and set 0666 permissions on it.
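For anyone following along, that step from the install guide looks roughly like this from a shell (DOCROOT is a placeholder for your actual domain root):

```shell
#!/bin/sh
# Sketch: pre-create an empty, world-writable sitemap.xml in the domain
# root, as the install guide requires. DOCROOT is a placeholder.
DOCROOT="${DOCROOT:-./docroot}"

mkdir -p "$DOCROOT"
touch "$DOCROOT/sitemap.xml"         # empty file; the generator overwrites it
chmod 0666 "$DOCROOT/sitemap.xml"    # readable/writable by any user, incl. PHP's
```

The point of 0666 is that the web-server user (which may differ from the user who created the file) must be able to overwrite it on each run.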
Re: Errors After Crawl Finishes
« Reply #8 on: February 11, 2011, 04:40:03 PM »
Why would I need to have an empty sitemap.xml file in the domain root? From what I can tell, the program never asks where my domain root is.

For reference, I do not have the generator putting the sitemap in my domain root automatically. I am choosing to generate it somewhere else (in the /data folder) and then move it into place with another script. I *do* have a sitemap.xml in the domain root; it has 666 permissions, and the owner is "apache" (the same user PHP runs as). But I've moved that into place on my own.

It seems so weird that the process actually FINISHES and creates the sitemaps, but then still reports an error.

Is there anything else that could be causing this kind of permission problem?

Thanks!


Re: Errors After Crawl Finishes
« Reply #9 on: February 11, 2011, 08:39:40 PM »
Creating the file is *required* according to the installation instructions: https://www.xml-sitemaps.com/howto-install.html
Also, you should not specify the data/ folder as the target for sitemap.xml, since it is the sitemap generator's internal storage folder.
Re: Errors After Crawl Finishes
« Reply #10 on: February 11, 2011, 09:12:27 PM »
Ah ha!  That was it!  Once I specified a different directory for the output, it worked perfectly.

THANK YOU!