3.1231 scanning and e-texts (86)

Willard McCarty (MCCARTY@vm.epas.utoronto.ca)
Wed, 28 Mar 90 19:07:29 EST

Humanist Discussion Group, Vol. 3, No. 1231. Wednesday, 28 Mar 1990.

(1) DATE: 28 MAR 90 12:04 CET (45 lines)
FROM: A400101@DM0LRZ01
SUBJECT: Scanning, e-texts, etc.

(2) Date: 28 March 1990, 10:40:58 EDT (21 lines)
Subject: Grass-roots text-entry and scanning

(1) --------------------------------------------------------------------
DATE: 28 MAR 90 12:04 CET
FROM: A400101@DM0LRZ01
SUBJECT: Scanning, e-texts, etc.

A few thoughts on scanning and e-texts:
1. I think recopying is more of a problem than it's been made out to be.
Even if you assume simple forms of storage like sequential files with
markup, you're still going to have to shift enormous numbers of bytes around
every few years, and verify that they have been shifted properly. And that's
assuming that people in the 2020s are still going to be happy with (and be
able to use!) the retrieval mechanisms of the 1990s. A case in point:
apparently the geophysical satellite data of the 1960s and 1970s has not only
gone largely unevaluated but is also now mainly stored on mag. tapes in formats
only partly known and possibly no longer readable, even assuming that
mag. tape (or any other storage medium) remains readable indefinitely without
being freshened up in some way. So far, only books are known to have this
property (and even here, with a few reservations).
2. Preserving publishers' e-texts is also non-trivial (I've been doing it
for our editions here for about a year now). Even if your typesetter uses
an internal 8-bit coding with markup, you've still got to translate back
to ASCII and decide what to strip out and what to leave as markup. And this
is the easy end; try translating output files from TROFF mark I back to
ASCII, for example! Moreover, many typesetters can only cope with small,
bite-sized pieces, so that a 400-page book may consist of perhaps 200 files
of 8-10K each, which have to be stuck together again. All this means
manpower and checking.
And you can't necessarily get round this by preserving the text as sent
to the typesetter (marked up with TROFF/TeX/SGML or whatever), because
what the typesetter makes of it (including page and line breaks) is part
of the final information as used by readers of the printed book! I don't
mean to argue that we shouldn't be preserving; of course we should, but
it doesn't come for free either, if it's to be useful.
3. The correspondent (the HUMANIST correspondence is being printed out at
the moment and I don't have his name) who said that an e-text must include
_everything_ significant about the original is, I think, mistaken in theory
as well as in practice. An e-text is a kind of edition, and an edition is
not and cannot be reality. It's a selective representation of reality,
emphasizing what is important (as determined by the collective subjectivity
of the academics working at the time). Sure, that changes, but there's no
point in trying to cater for everything. Leave it to the users of the edition
or of the e-text to put in the extras, and concentrate on getting the text
pretty well right (though I agree that plates, diagrams, and all the rest
present hideous problems that I'm damn glad I don't even have to think
about coping with myself).
Timothy Reuter, Monumenta Germaniae Historica

(2) --------------------------------------------------------------24----
Date: 28 March 1990, 10:40:58 EDT
Subject: Grass-roots text-entry and scanning

I agree fully with the point that original entry and then stages of
correction of electronic texts should be carefully documented. The
electronic equivalent of a fill-in-the-blanks form should be sent to
anyone entering text, say, for the Oxford Text Archive, on which the
text-entry person or encoder should be asked to give the edition, the state,
and even the individual shelf copy he or she is using as copy-text. The
reviser should be asked to name his or her copy-text as well and the date
the corrections are made, and the reviser should probably be required to
list each correction in turn, in the order it was made, as well as making
it silently within the text. Any suggestions for this list of
instructions? I would join Michael Hart in believing in the grass-roots
entry of text (sometimes the best labor is free labor). People who love
a text, as in *Fahrenheit 451*, would make it their own by copying it for
posterity. Just using the labor force we have on Humanist to produce
the favorite text of each subscriber might give us a good start on the
Library of Congress, or at least a Great Books series! Cheers, Roy