Last-modified date wrong for PDF files
« on: November 17, 2009, 08:44:45 PM »
Hi there,

In my setup, the sitemap generator always reports the current time as the last-modified timestamp for PDF files. Some details:

(1) The sitemap generator is configured to include PDFs in the sitemap, but not to parse them. 
(2) It is set to use the server's response for the last-modified tag.
(3) The server sends the correct HTTP last-modified header, corresponding to the file upload timestamp.

Looks like a bug. Could it be that the sitemap generator doesn't do a HEAD request for files which are not parsed, and then defaults to the current time for the modification tag?

Cheers,

Michael
Re: Last-modified date wrong for PDF files
« Reply #1 on: November 18, 2009, 10:10:29 PM »
Hello,

Since the generator doesn't parse these files, it doesn't request them from the server and so cannot get the last-modification date. You can use the "Individual attributes" setting to define the lastmod date for them instead.
Re: Last-modified date wrong for PDF files
« Reply #2 on: November 18, 2009, 10:24:49 PM »
Setting the last-modified dates by hand is a good idea for a limited number of files that don't change often - thanks for the suggestion. But it is also a little tedious, so I'd like to turn this into a feature request ;)

The server headers can be fetched without retrieving the file by sending a HEAD request, and that is perhaps the most efficient way of dealing with files that are not parsed.
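
To illustrate the idea, here is a minimal Python sketch, not the generator's actual code. It sends a HEAD request and converts the Last-Modified header into the W3C datetime format used by the sitemap lastmod tag. The function names to_sitemap_lastmod and head_lastmod are made up for this example, and returning None for a missing or malformed header is just one possible choice.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Optional
from urllib.request import Request, urlopen


def to_sitemap_lastmod(http_date: Optional[str]) -> Optional[str]:
    """Convert an HTTP Last-Modified value to a W3C datetime string (UTC)."""
    if not http_date:
        return None  # header missing: the caller decides on a fallback
    try:
        parsed = parsedate_to_datetime(http_date)
    except (TypeError, ValueError):
        return None  # malformed header value
    if parsed.tzinfo is None:
        # Assumption: treat a value without a usable offset as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+00:00")


def head_lastmod(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request (headers only, no body) and return the lastmod string."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return to_sitemap_lastmod(response.headers.get("Last-Modified"))
```

For a header value of "Tue, 17 Nov 2009 20:44:45 GMT", to_sitemap_lastmod returns "2009-11-17T20:44:45+00:00". Calling head_lastmod with the URL of a PDF would give the same kind of string without downloading the file itself.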

Cheers,

Michael