Humanist Discussion Group, Vol. 15, No. 193.
Centre for Computing in the Humanities, King's College London
Date: Wed, 22 Aug 2001 09:26:49 +0100
From: Wendell Piez <firstname.lastname@example.org>
Subject: Re: 15.183 Leyton's book? publishers and XML?
At 03:29 AM 8/17/01, you wrote:
>I'm trying to get an idea of the extent to which properly-digitized
>documents -- XML documents, really, using DTDs based on TEI standards --
>are acceptable to academic publishers.
I can't say anything about academic publishers, but I can say that XML is
being increasingly used (as SGML has actually long been used, at least in
some places) by larger publishers for their own editorial and
editorial-to-production processes. Generally they are not at the point
where they expect, or even have made provision for, authoring directly in
markup. (The exception to this is such things as reference books and other
kinds of publication where authoring is very much subordinate to an
editorial process. But these folks are *not* accepting arbitrary markup:
they'll mandate the DTD themselves.)
In the high-tech publishing market, at publishers such as O'Reilly for example
(I can name names where I don't actually have specific knowledge covered by
an NDA :-), support for authors who want to use markup is definitely on the
rise, but it is a slow process.
Note that publishers are wary of exposing the technologies of their
internal processes to outsiders, since this means competitors can get a
look. Thus, for example, mandating a DTD (even a public DTD such as DocBook
or TEI) might be seen as "exposing" a little more of their business
processes than they like. Oddly, the semantic opacity of a format such as
Word is actually a feature to them from this point of view. They expect to
change the encoding in any case. (While this may be less true in the
academic publishing sector, there is also less money there for the
necessary engineering -- both technical and social -- to support markup
from authoring through editorial stages.)
If a publisher does use XML internally, chances are it's not TEI, which is
not sufficiently constraining to be worth a whole lot to them. If their
markup is anything like TEI, it'll be a highly constrained subset, probably
not validating to P3 but to their own derived version. There are good
reasons for this. If I were an editor or production manager for a press, I
would be skeptical of any author who wanted me to process TEI -- since I
know how much engineering that requires. I would say "hey, markup, great!",
but then would want to see the most constrained TEI subset to which their
files conform. Given that "TEI" might almost as well mean "kitchen sink" in this
context, doing the necessary analysis to understand their TEI (out of the
universe of possible TEI), then writing post-DTD validators, stylesheets etc.
to process it into something useful to me, would almost certainly be more
expensive than stripping their tagging and starting fresh. (Especially
given who I'd have to pay to do these respective jobs. If the volume were
high enough, it could be worth it, since economies of scale could kick in.
But for one book?)
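As a rough illustration of the analysis step described above, here is a minimal Python sketch that inventories the elements a submitted file actually uses and reports any that fall outside a house subset. The sample document, the `HOUSE_SUBSET` set and both function names are assumptions made up for this example. It only looks at element names, so it is far short of a real post-DTD validator, which would also check attributes and content models.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical P3-style sample; a real submission would be read from a file.
SAMPLE = """<TEI.2>
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <div type="chapter"><head>One</head>
      <p>Some <hi rend="italic">text</hi>.</p>
      <p>More.</p>
    </div>
  </body></text>
</TEI.2>"""

# Hypothetical house subset: the only element names the press has agreed to process.
HOUSE_SUBSET = {
    "TEI.2", "teiHeader", "fileDesc", "titleStmt", "title",
    "text", "body", "div", "head", "p",
}


def element_inventory(xml_text):
    """Count how often each element name occurs in the document."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter())


def outside_subset(inventory, allowed):
    """Return the element names in use that the house subset does not cover."""
    return sorted(tag for tag in inventory if tag not in allowed)


if __name__ == "__main__":
    inventory = element_inventory(SAMPLE)
    for tag, count in sorted(inventory.items()):
        print(f"{tag}: {count}")
    print("outside house subset:", outside_subset(inventory, HOUSE_SUBSET))
```

Every name reported as outside the subset is something the press would have to write handling for, or strip, which is where the cost for a single book comes from.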
I'd feel better (I'd be celebrating!) if the author said "it's TEI, but
tell me what markup to target and I'll write the stylesheets myself" --
which some authors are now able to do. But then they're not giving me TEI.
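To make the "tell me what markup to target" scenario concrete, here is a small sketch of an author-side conversion. In practice this would normally be an XSLT stylesheet; plain Python renaming stands in for it here. The target vocabulary in `TAG_MAP` and the function name `retarget` are hypothetical, standing in for whatever the publisher would actually specify.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the author's TEI element names to the
# publisher's target element names.
TAG_MAP = {"div": "chapter", "head": "title", "p": "para", "hi": "emphasis"}


def retarget(xml_text, tag_map):
    """Rename mapped elements; return the new XML and any unmapped names."""
    root = ET.fromstring(xml_text)
    unmapped = set()
    for el in root.iter():
        if el.tag in tag_map:
            el.tag = tag_map[el.tag]
        else:
            # Left unchanged, but reported so the author can deal with it.
            unmapped.add(el.tag)
    return ET.tostring(root, encoding="unicode"), sorted(unmapped)


if __name__ == "__main__":
    source = "<div><head>One</head><p>Some <hi>text</hi>.</p></div>"
    converted, leftover = retarget(source, TAG_MAP)
    print(converted)
    print("unmapped:", leftover)
```

What reaches the publisher is then a file in the publisher's own vocabulary, with the list of unmapped names showing what the author still has to resolve.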
Markup pays for itself very quickly as it scales up. But TEI, in itself, is
not sufficiently constrained to scale very well. (DocBook is somewhat
better, and as I said I can see some niche publishers like O'Reilly working
towards DocBook support.) TEI is excellent for supporting a wide range of
scholarly research purposes. But there is a direct tradeoff between the
breadth of this range, and the requirements of a production line.
> Are there many publishers yet who
>would accept (for example) the text of a book for publication in XML
>format? How many are still insisting on camera-ready copy, MSWord
>documents, PDFs etc? How many academic publishers are doing e-publishing,
>and what document formats are they using?
My guess is that you'll find things all over the map. Academic publishers
continue to experiment with e-publishing, but it will almost always be in
"bespoke" formats (i.e. custom-engineered markup systems), among them
varieties of XML (including Open E-book) and even HTML.
>All insights and relevant experiences much appreciated -- please name names
>if you can. I'll be happy to summarize responses to the list.
Can't name names 'cause of those NDAs ... but as to academic publishers
specifically, I don't speak from firsthand knowledge (haven't worked with
any), but rather from my assessment of the current state of the
technologies in the context of editorial and production work.
I hope the perspective sheds some light, in any case.
Wendell Piez mailto:email@example.com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
Mulberry Technologies: A Consultancy Specializing in SGML and XML