Tag search

Officeshots at the 2010 Document Freedom Day

Just a quick update: I will be giving a presentation about all the new features in Officeshots at the 2010 Document Freedom Day in Baarn, the Netherlands. I will be updating the audience about the progress made since my presentation at last year's DFD. I am not sure exactly what time I will speak, but the talk will be taped for those who cannot be there.

How to correctly create ODF documents using zip

by Sander Marechal

One of the great advantages of the OpenDocument format is that it is simply a zip file. You can unzip it with any archiver and take a look at the contents, which are a set of XML documents and associated data. Many people are using this feature to create some nifty toolchains. Unzip, make some changes, zip it again and you have a new ODF document. Well… almost.

The OpenDocument Format specification has one little extra restriction when it comes to zip containers: The file called “mimetype” must be at the beginning of the zip file, it must be uncompressed and it must be stored without any additional file attributes. Unfortunately many developers seem to forget this. It is the number one cause of failed documents at Officeshots.org. If the mimetype file is not correctly zipped then it is not possible to programmatically detect the mimetype of the ODF file. And if the mimetype check fails, Officeshots (and possibly other applications) will refuse the document. This problem is compounded because virtually no ODF validator checks the zip container. They only check the contents. In this article I will show you how you can properly create ODF files using zip.
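As a minimal sketch of the three rules above (first entry, uncompressed, no extra fields), here is one way to build such a container with Python's standard zipfile module. The function name write_odf and the placeholder content.xml are assumptions made for illustration; a real document also needs a manifest and valid XML, which this sketch does not produce.

```python
import io
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def write_odf(target, mimetype, members):
    """Write a zip container whose "mimetype" entry follows the ODF rules.

    `target` is a path or a writable binary file object.
    `members` maps archive paths (for example "content.xml") to bytes.
    write_odf is a hypothetical helper, not part of any ODF library.
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as odf:
        # Rule 1: "mimetype" is written before anything else.
        info = zipfile.ZipInfo("mimetype")
        # Rule 2: it is stored, not deflated.
        info.compress_type = zipfile.ZIP_STORED
        # Rule 3: a bare ZipInfo carries no extra field or file attributes.
        odf.writestr(info, mimetype)
        # Everything else may be compressed as usual.
        for name, data in members.items():
            if name != "mimetype":
                odf.writestr(name, data)


# Usage: build a (deliberately incomplete) package in memory.
package = io.BytesIO()
write_odf(package, ODT_MIMETYPE, {"content.xml": b"<placeholder/>"})
```

Because the entry is stored and comes first, the MIME type string sits in plain text near the start of the file, which is what makes programmatic detection possible.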

New Officeshots feature: ODF Anonymiser

by Sander Marechal

I have just released a new feature for Officeshots: the ODF Anonymiser. The ODF Anonymiser tries to make your document completely anonymous while maintaining its overall structure. All metadata is removed or cleaned. All text in the document is replaced with gibberish text that has approximately the same word length and word distribution. All images are replaced with placeholder images. All unknown content is removed.

The result of the anonymiser is a document that has the same general structure but with made-up contents. If your original document does not work in a certain application, the anonymised version of the document should fail in the same manner. By using the anonymiser you can test your private documents without exposing the contents to our rendering clients.
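To give a rough idea of what "gibberish with the same word lengths" can look like, here is a small sketch. It is not the anonymiser's actual algorithm, which is not described here; scramble_text is a hypothetical function that simply swaps every letter and digit for a random one and leaves whitespace and punctuation alone.

```python
import random
import string


def scramble_text(text, seed=None):
    """Replace letters and digits with random ones, keeping the layout.

    Illustration only: word lengths, spacing and punctuation survive,
    but the words themselves become meaningless.
    """
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch.isalpha():
            # Keep the case so capitalised words still look capitalised.
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        elif ch.isdigit():
            out.append(rng.choice(string.digits))
        else:
            # Whitespace and punctuation are structural, so keep them.
            out.append(ch)
    return "".join(out)


print(scramble_text("Confidential report, 2010 edition."))
```

The output has the same number of words, each of the same length as the original, which is the property that keeps the layout of the document roughly intact.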

New Officeshots feature: ODF validators

by Sander Marechal

I am happy to announce an exciting new feature for Officeshots: Integrated ODF validators.

Every ODF document that is uploaded is run through several different ODF validators. If the converted documents are also ODF documents (when you are testing ODF round trips) then those results are also passed through these ODF validators.

The results of the validators are made available on the request overview, the individual result pages and inside the galleries. Galleries now not only show all attached documents but also all results and a summary of the validator results. This way it becomes really easy to see which documents failed.

Help translate Officeshots into your language

by Sander Marechal

I have finished setting up the internationalisation and localisation frameworks for Officeshots. If you want, you can now help to translate Officeshots into your own language. Translating Officeshots can be done through our Pootle installation.

At the moment almost no languages are configured in Pootle. The reason is that the CakePHP framework on which Officeshots runs has a different locale structure than what Pootle expects. This means I need to add every language by hand. If you want to start working on a new language, please post to the Officeshots mailing list and I will add the language to Pootle and to Officeshots.

Scanning files with ClamAV from CakePHP

by Sander Marechal

One of the requirements for the upcoming public release of Officeshots.org is that all uploaded files are run through a virus scanner before they are made available. Picking a virus scanner for this job was easy. ClamAV is open source, well supported, actively maintained and comes pre-packaged for Debian Lenny, which we use for the Officeshots servers. Finding a PHP library to interact with ClamAV proved harder though. The third-party library page for ClamAV points to two different libraries that provide PHP bindings for ClamAV, but both appear to be dead and expunged from the internet. So I created my own using the clamd TCP API, and because Officeshots is built using CakePHP I implemented it as a Cake plugin.

You can download the clamd-0.1.tar.gz plugin or check out the source from my Subversion repository with the following command:

~$ svn checkout https://svn.jejik.com/cakephp/plugins/clamd/trunk clamd

Or you can browse the repository online. In the rest of this article I will show you how you can use this plugin.
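Independently of the plugin, the clamd TCP API mentioned above is simple enough to sketch in a few lines. The example below uses clamd's INSTREAM command: the command itself, then chunks that are each prefixed with a 4-byte length in network byte order, then a zero-length chunk to finish. This is not necessarily how the plugin talks to clamd. The function names are hypothetical, and the host and port (127.0.0.1:3310) are assumptions that must match the TCPSocket setting of your own clamd.

```python
import socket
import struct


def instream_frames(data, chunk_size=4096):
    """Yield the byte frames that make up one INSTREAM request."""
    # The "z" prefix tells clamd that commands and replies end in NUL.
    yield b"zINSTREAM\0"
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Each chunk: 4-byte big-endian length followed by the data.
        yield struct.pack("!I", len(chunk)) + chunk
    # A zero-length chunk ends the stream.
    yield struct.pack("!I", 0)


def scan_bytes(data, host="127.0.0.1", port=3310, timeout=10.0):
    """Send `data` to a running clamd and return its textual reply.

    Assumes clamd is listening on host:port; adjust for your setup.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for frame in instream_frames(data):
            sock.sendall(frame)
        reply = b""
        while not reply.endswith(b"\0"):
            part = sock.recv(4096)
            if not part:
                break
            reply += part
    return reply.rstrip(b"\0").decode("utf-8", "replace")
```

With a clamd daemon running, scan_bytes(open(path, "rb").read()) returns a line that ends in OK for a clean file or in FOUND, preceded by the signature name, for an infected one.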

Fixing OpenDocument MIME magic on Linux

by Sander Marechal

When working on the beta of Officeshots.org I ran into an interesting problem with file type and MIME type detection of OpenDocument files. When a user uploads an ODF file to Officeshots I want to determine the MIME type myself using the PHP Fileinfo extension. Windows users who do not have any ODF-supporting applications installed will report ODF files as application/zip, which is of no use to me. In addition, a malicious user could attempt to upload an executable file and report its MIME type as that of an ODF file.

On Linux, the PHP Fileinfo extension relies on the magic file that is provided by the file package. The magic file contains a series of tests that can determine the file type and MIME type of a file by its contents. I found out that the magic file is incomplete for OpenDocument files. Below I will show you what is wrong with the magic file and how you can fix it.

Update 2009-06-29: I have now also created a patch against the original upstream file-5.0.3.
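To illustrate the kind of content test a magic file performs, here is a small Python sketch that does the same check by hand. It relies on the zip local file header layout: the signature at offset 0, the stored size at offset 18, the name at offset 30 and, for a correctly stored "mimetype" entry, the MIME type string at offset 38. The function name sniff_odf_mimetype is hypothetical, and the sketch assumes the size is recorded in the local header, which is not the case for archives written with a trailing data descriptor.

```python
import io
import struct
import zipfile


def sniff_odf_mimetype(header):
    """Return the MIME type found in the leading bytes of a file, or None."""
    if len(header) < 38 or header[:4] != b"PK\x03\x04":
        return None  # not a zip local file header
    (method,) = struct.unpack_from("<H", header, 8)
    (size,) = struct.unpack_from("<I", header, 18)
    name_len, extra_len = struct.unpack_from("<HH", header, 26)
    # The first entry must be named "mimetype", stored, without extra field.
    if method != 0 or name_len != 8 or extra_len != 0:
        return None
    if header[30:38] != b"mimetype":
        return None
    data = header[38:38 + size]
    if size == 0 or len(data) < size:
        return None  # size unknown or not enough bytes were read
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return None


# Usage: build a tiny container in memory and sniff its first 256 bytes.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as z:
    z.writestr(zipfile.ZipInfo("mimetype"), "application/vnd.oasis.opendocument.text")
print(sniff_odf_mimetype(sample.getvalue()[:256]))
```

A file that is merely renamed to .odt, or whose client-reported MIME type is forged, fails this check because the bytes themselves do not match.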

Officeshots.org available in closed beta

by Sander Marechal

Officeshots.org has finally gone into beta this week. It took a lot more work (and time) than expected but we made it nonetheless. At the moment the beta is a closed beta, available to current contributors and members of the OpenDoc Society. But we hope to start with public, free availability within a month. Joining the OpenDoc Society is free for FOSS projects, so if you are interested in the beta, please join them.

Read more for the full press release.

Officeshots.org announcement

by Sander Marechal

Yesterday the OpenDoc Society, the NoiV (Netherlands in Open Connection) and the NLNet Foundation announced Officeshots.org, a new web service where you can upload ODF documents and compare their rendering and output in different office suite applications. We here at Lone Wolves are happy to announce that we are the lead architects of this new web service.

Over the coming days I will announce a couple of things regarding Officeshots.org on this website, like how it works, where to get the code and how to contribute. The plan is to start a closed beta by the end of February and go public by the end of March, but if we want to make this deadline then we need contributors. In the upcoming days I will explain exactly what we need, but if you want to help then you can already join the officeshots.org mailing list.