A few months ago I wrote about digitizing the world’s collection of data. In particular, I commented on how publishers and content providers are providing less-than-pristine copies of their work, usually by scanning the final printed version and creating a PDF document that also carries the visual artifacts of the paper. While I advocate Adobe’s PDF standard, I also realize that it is not the only format available. I have seen the TIFF image standard used in some cases and HTML in others. My reasons for choosing Adobe’s standard were threefold.
First is Security. Once a PDF document is created, it cannot be edited or mangled by the average user. In fact, without third-party applications it is quite difficult to edit an existing PDF. This gives peace of mind to the millions of content developers who wish to release their documents.
Second is Readability. The PDF file standard is supported natively by Mac OS X and, I believe, will be by future versions of Windows. For operating systems that do not have built-in support for PDF, free “readers” are available from Adobe. So anyone can view a document without incurring any cost, although it may require installing additional software.
Third is Functionality. Within a PDF document you can link to external documents such as web pages. You can also create internal links for chapters, allowing a reader to skip large portions of a document and jump to a particular (typically predefined) point. An elaborate document structure is also available for more advanced document management.
This is just a quick reference to what features PDFs support and why I currently support Adobe’s technology over other formats for permanent document storage.
I would also like to advocate the use of HTML or XML for documents. Unlike Adobe’s PDF system, where you have to purchase software to create a PDF, the HTML and XML standards are openly available to use. HTML is also the default standard for online documents, but it fails to address the security issues that Adobe does. Conversely, PDFs do not allow users to work with the text of a document with the ease that HTML does. Finally, HTML supports animations and dynamic content, and it enjoys universal support.
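To give a sense of what "working with the text" of an HTML document can look like, here is a minimal sketch using only Python's standard library. The names `TextExtractor`, `extract_text`, and the `sample` markup are hypothetical and purely illustrative; real-world pages are messier and would need more careful handling.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML document (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a script or style block.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the text content of an HTML string, joined with single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample document, made up for this example.
sample = (
    "<html><body><h1>Chapter 1</h1>"
    "<p>Digital <em>archives</em> last.</p></body></html>"
)
print(extract_text(sample))
```

Doing the same with a PDF generally means reaching for a third-party library, which is the trade-off I am pointing at here.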
So which standard do you use? That depends on your requirements as a content producer, and many angles have to be considered before deciding on a format. Ideally, content would be provided in both formats in parallel, giving users control over how it is consumed.
Tomorrow I hope to post a piece on how to manage long-term storage of a digital work.