Tag search

How to correctly create ODF documents using zip

by Sander Marechal

One of the great advantages of the OpenDocument format is that it is simply a zip file. You can unzip it with any archiver and take a look at the contents, which is a set of XML documents and associated data. Many people are using this feature do create some nifty toolchains. Unzip, make some changes, zip it again and you have a new ODF document. Well… almost.

The OpenDocument Format specification has one little extra restriction when it comes to zip containers: The file called “mimetype” must be at the beginning of the zip file, it must be uncompressed and it must be stored without any additional file attributes. Unfortunately many developers seem to forget this. It is the number one cause of failed documents at Officeshots.org. If the mimetype file is not correctly zipped then it is not possible to programmatically detect the mimetype of the ODF file. And if the mimetype check fails, Officeshots (and possibly other applications) will refuse the document. This problem is compounded because virtually no ODF validator checks the zip container. They only check the contents. In this article I will show you how you can properly create ODF files using zip.

Fixing OpenDocument MIME magic on Linux

by Sander Marechal

When working on the beta of Officeshots.org I ran into an interesting problem with file type and MIME type detection of OpenDocument files. When a user uploads an ODF file to Officeshots I want to determine the MIME type myself using the PHP Fileinfo extension. Windows user who do not have any ODF supporting applications installed will report ODF files as application/zip which is of no use to me. In addition, a malicious user could attempt to upload an executable file and report the MIME type as ODF file.

On Linux, the PHP Fileinfo extension relies on the magic file that is provided by the file package. The magic file contains a series of tests that can determine the file type and MIME type of a file by its contents. I found out that the magic file is incomplete for OpenDocument files. Below I will show you what is wrong with the magic file and how you can fix it.

Update 2009-06-29: I have now also created a patch against the original upstream file-5.0.3.