18.329 value of PDF

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Wed, 3 Nov 2004 06:36:30 +0000

               Humanist Discussion Group, Vol. 18, No. 329.
       Centre for Computing in the Humanities, King's College London
                     Submit to: humanist_at_princeton.edu

   [1] From: Martin Holmes <mholmes_at_uvic.ca> (31)
         Subject: Re: 18.321 value of PDF?

   [2] From: "Malcolm Hayward" <mhayward_at_auxmail.iup.edu> (8)
         Subject: Re: 18.326 value of PDF

   [3] From: Alexandre Enkerli <aenkerli_at_indiana.edu> (78)
         Subject: Generated PDF

         Date: Tue, 02 Nov 2004 07:25:47 +0000
         From: Martin Holmes <mholmes_at_uvic.ca>
         Subject: Re: 18.321 value of PDF?

Hi there,

At 12:48 AM 30/10/2004, you wrote:

>In a strikingly unrelated context, someone mentioned their aversion for PDF
>files. Given that PDF files are rather prominent in academic computing,
>what are your thoughts on the subject?

I share this deep dislike of PDF. Mainly it's because this type of document
by its nature mixes data and display in such a way that it's very difficult
to disentangle them; this seems an awful step backwards, making the data
less accessible, and imposing one particular view or layout on the reader
(as opposed to XHTML, for example, which I can choose to display any way I
like in my browser, using custom stylesheets, text size settings, etc., and
which I can reorganize and manipulate very usefully with XSLT). PDF is
really a set of printing instructions, and should only exist between your
computer and your printer.
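
To make the point about separable data concrete, here is a minimal sketch of pulling content out of an XHTML page with Python's standard library rather than XSLT. The sample page and the function name `paragraphs` are invented for illustration; only the XHTML namespace URI is standard.

```python
import xml.etree.ElementTree as ET

# The standard XHTML namespace; the prefix "x" is an arbitrary local choice.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical page, made up for this example.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h1>Value of PDF</h1>
    <p>First point.</p>
    <h2>Viewers</h2>
    <p>Second point.</p>
  </body>
</html>"""


def paragraphs(xhtml_text):
    """Return the text of every <p> element, ignoring all layout."""
    root = ET.fromstring(xhtml_text)
    return [p.text for p in root.iterfind(".//x:p", XHTML_NS)]


if __name__ == "__main__":
    for text in paragraphs(PAGE):
        print(text)
```

Because the markup names what each piece of text is, the same few lines could just as easily collect headings or reorder sections; nothing comparable is this simple with a page description.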

Oh, and I hate the Adobe viewer even more than I hate the file format. It
takes ages to start up, it's bloated, it's prone to crashing (and taking
the browser down with it, if it's acting as a viewer in the browser
context), and it keeps annoying me with offers of updates to programs I
want even less than the viewer itself.


Martin Holmes
University of Victoria Humanities Computing and Media Centre

         Date: Wed, 03 Nov 2004 06:24:15 +0000
         From: "Malcolm Hayward" <mhayward_at_auxmail.iup.edu>
         Subject: Re: 18.326 value of PDF

A word on the usefulness of PDF. As a journal editor (Studies in the
Humanities), I set type in postscript, converted to PDF, and uploaded those
PDF files to my printer (the "real" printer, that is, the firm that printed
the journal). I then sent the PDF files of individual essays to the authors
so that they could run off-prints of their articles (if they so desired),
reproducing exactly the format in the printed journal. I think this mirrors
what others have said: if you want/need a copy that exactly replicates a
print copy, then PDF has a use.

Malcolm Hayward

         Date: Wed, 03 Nov 2004 06:25:04 +0000
         From: Alexandre Enkerli <aenkerli_at_indiana.edu>
         Subject: Generated PDF

Insightful replies on the value of PDF files. Interestingly, several
comments seem to relate more to the way PDF files are produced as opposed
to the file format itself. The reason they can be distinguished is that
there *are* different ways to produce PDF files and while many document
creators may not use some of the most useful features of PDF files, these
features still exist in the format itself...
As Lisa Spangenberg notes, PDF files are often generated incorrectly and
this might explain many complaints about PDF files.

Not to defend PDF files too much, but it might be interesting to think
about the issue in terms of the file format's potential.
It does seem possible to leave aside issues related to the Acrobat family
of products because the PDF file format can be adopted without any need for
Acrobat software.
Granted, Adobe may not like the way things are done and has kept the PDF
format fairly proprietary. Yet PDF files, like RTF files, can now be used
to do things their original designers may not have dreamed of.

Stewart Arnell says:
>In our office we tend to create documents in XML and then apply XSLT
>transformations to generate PDF when needed. The XML files capture
>features that PDF is blind to, and PDF is given a sensible place as an
>output medium only.
Great example! Do you happen to have files generated this way that you
would be allowed to distribute? It might give people good ideas on how to
generate PDF files.
It seems clear that, in terms of "workflow," PDF files are only meant as an
output format, whether the output is a typical screen, a handheld screen,
or a printer. Some creators seem to confuse those different output options
and try to generate a single file that will be interesting on-screen and
will print well. These two sets of requirements are rarely compatible.
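
As a rough illustration of the "XML as source, PDF as output" workflow, the sketch below turns a small XML document into XSL-FO, the page-formatting vocabulary that a formatter such as Apache FOP can render to PDF. It is not the setup described above: plain Python stands in for XSLT because the standard library has no XSLT processor, and the source element names (`article`, `title`, `para`) and the function `article_to_fo` are invented for the example.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize with the usual "fo:" prefix

# Hypothetical source document; the element names are assumptions.
SOURCE_XML = """<article>
  <title>Value of PDF</title>
  <para>XML holds the structure.</para>
  <para>PDF is only one output.</para>
</article>"""


def fo(tag):
    """Qualify a tag name with the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def article_to_fo(xml_text):
    """Build a one-column XSL-FO document from the source XML."""
    article = ET.fromstring(xml_text)

    root = ET.Element(fo("root"))
    layout = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(layout, fo("simple-page-master"), {"master-name": "page"})
    ET.SubElement(page, fo("region-body"))

    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})

    # Presentation decisions live here, not in the source document.
    title = ET.SubElement(flow, fo("block"), {"font-size": "18pt", "font-weight": "bold"})
    title.text = article.findtext("title")
    for para in article.findall("para"):
        block = ET.SubElement(flow, fo("block"), {"space-before": "6pt"})
        block.text = para.text

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(article_to_fo(SOURCE_XML))
```

A second function with different page dimensions and font sizes could produce a screen-oriented version from the same source, which is the separation of screen and print outputs argued for here.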

One PDF-producing system which clearly distinguishes screen and printer
outputs is ConTeXt, based on Donald Knuth's well-known TeX typesetting system.
These files can be generated directly from XML in both screen and printer
formats and there is still a large amount of flexibility in how the files
can be processed.

In fact, the TeX environment is often used to generate interesting PDF files.
Obviously, many of the advantages of PDF files generated through TeX have to
do with TeX itself. In a way, PDF files simply let the beauty of TeX shine
through. While some people are probably "nostalgic" for the time when TeX
was mostly used to generate DVI and PS files, PDF files seem to have become
very popular among TeX users.
LaTeX is a well-known set of macros to produce structured documents in TeX.
A simple but very convenient feature of the way LaTeX is used to generate
PDF files is that sectioning ("Chapter," "Section," "Subsection"...) can be
reproduced as a set of bookmarks in the PDF output. Obviously, links and
anchors are generated directly from cross-references and such.
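
The idea that sectioning commands carry enough structure to produce bookmarks can be sketched in a few lines of Python. This is only a conceptual illustration, not how any LaTeX package works internally; the sample source and the function name `outline` are made up, and starred or optional-argument forms of the commands are deliberately ignored.

```python
import re

# Nesting depth for each sectioning command handled by this sketch.
LEVELS = {"chapter": 0, "section": 1, "subsection": 2}
HEADING = re.compile(r"\\(chapter|section|subsection)\{([^}]*)\}")

# Hypothetical LaTeX fragment, invented for the example.
SAMPLE = r"""
\chapter{Formats}
\section{PDF}
\subsection{Bookmarks}
\section{XHTML}
"""


def outline(latex_source):
    """Return (depth, title) pairs in document order."""
    return [(LEVELS[m.group(1)], m.group(2)) for m in HEADING.finditer(latex_source)]


if __name__ == "__main__":
    # Print the outline as an indented bookmark-style tree.
    for depth, title in outline(SAMPLE):
        print("  " * depth + title)
```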

To respond briefly to some of the criticisms: file size is often an issue
because of font use. Simple documents typeset in Times may be between 6k
and 25k for a three-page printable document or a nine-page PowerPoint-like
presentation. A 13-page document generated from Word can be 27k. Even with
IPA characters and/or graphics, PDF files can be made to be relatively small.

The issue of output quality (screen or print) often has more to do with the
way PDF files have been made but it's true that the format itself has some
imperfections in these respects. In comparison to other formats, however,
PDF files are quite a step further in the direction of producing consistent
output in different conditions (operating systems, screens, printers...).

Hugh Cayless mentions PDF's lack of validation as a potential issue. While
this is a very interesting argument, the practical implications are
relatively limited. True, some PDF files contain invalid portions and this
might cause an error while reading the file. These problems typically occur
with PDF files that have been created through non-validating tools. When
PDF files are generated through a markup or typesetting language, the
source's validity is being checked and these problems are rather unlikely
to happen. At least, they seem uncommon. It's still an important point and
perhaps a validation mechanism can be added to some PDF-generating tools.
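
To give a sense of what even a very light check could look like, here is a minimal sketch in Python. It is nowhere near real validation: it only looks for three markers that a well-formed PDF file carries (the `%PDF-` header, the `startxref` keyword, and the `%%EOF` trailer line). The function name `pdf_problems` and the 1024-byte window for the end of the file are choices made for this example.

```python
def pdf_problems(data):
    """Return a list of structural problems found in raw PDF bytes.

    An empty list means the three basic markers are present; it does not
    mean the file is valid.
    """
    problems = []
    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")

    tail = data[-1024:]  # assumption: trailer markers sit near the end
    if b"startxref" not in tail:
        problems.append("missing startxref keyword near end of file")
    if b"%%EOF" not in tail:
        problems.append("missing %%EOF marker near end of file")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            found = pdf_problems(handle.read())
        print(path, "looks plausible" if not found else "; ".join(found))
```

A check like this would catch truncated downloads and files that are not PDF at all, but not the subtler internal errors that a proper validator would need to parse the object structure to find.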

Having said all of this, the PDF file format is not an "end-all solution"
at the present time and it can clearly be improved. But in this academic
world, in which many articles are distributed as PDF files, it might be
important to think about the implications of the file format.

Thank you for all the interesting replies!

Alex Enkerli, Teaching Fellow, Visiting Lecturer
Department of Sociology and Anthropology, Indiana University South Bend, DW
1700 Mishawaka Ave., South Bend, IN 46634-7111
Office: (574)520-4102
Fax: (574)520-5031 (to: Enkerli, Anthropology)
Received on Wed Nov 03 2004 - 01:44:24 EST
