9.365 encoding

Humanist (mccarty@phoenix.Princeton.EDU)
Sat, 9 Dec 1995 01:34:31 -0500 (EST)

Humanist Discussion Group, Vol. 9, No. 365.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)

[1] From: "Steven J. DeRose" <sjd@ebt.com> (94)
Subject: Re: 9.362 encoding &c.

[2] From: Robin Cover <robin@utafll.uta.edu> (31)
Subject: Lancashire on SGML and TEI

Date: Thu, 7 Dec 1995 11:44:52 -0500
From: "Steven J. DeRose" <sjd@ebt.com>
Subject: Re: 9.362 encoding &c.

Martin Mueller <martinmueller@nwu.edu> wrote:
>Some deep belief in 'transcription without loss' underlies SGML/TEI. Texts

I've been to a lot of TEI meetings, and I don't recall anyone suggesting
such a belief. Quite the contrary.

Markup merely helps one to say something explicit about what one's
interpretive decisions are, and what one intends to keep and to lose (a
particular editor may achieve their intent more or less well, but that is
hardly unique to this endeavor).

As Morris Zapp reminds us, "every decoding is another encoding." Even such
a seemingly theory-neutral act as making a photo or a photocopy is
interpretive, and is far from being a 'transcription without loss'. Those
who work with manuscripts often find even the best photos inadequate. They
use high-resolution pictures, changes in lighting and camera angle,
infra-red and ultraviolet photography, and so on to lose somewhat less in
the 'transcription', but ultimately they may be forced to fly out and
examine the original.

The act of typing in text from a manuscript is a huge interpretive act, and
results in tremendous transcription loss. But this is inevitable. "You pays
your money and you takes your choice." If you're a paleographer and need to
keep track of the precise slant of every character, then you choose
different information than would a literary critic, and use very different

The TEI is largely concerned with providing terminology for expressing
interpretive claims -- it leaves the choice of what claims to make to the
scholar. It does not provide exhaustive terminology sufficient for all
interpretive claims one might ever wish to make -- this seems unsurprising.

Let us say scholar A is working on a document by author X. A makes an
interpretation that the manuscript or printed form of the document contains
a scribal or editorial error. How can A clearly express this interpretive
opinion to a reader B? One way is to write it out in prose:

"I think that on page 27 of the 14th edition, X intended 'affect' but the
manuscript says 'effect' for the 5th word on the 27th line"

This is rather formal English -- more formal and more precise than I would
expect to see very often. However, it takes a lot of space to say something
fundamentally simple, and it lies outside the text, making it mechanically
difficult for B to associate it with the passage. An apparatus criticus addresses these
problems somewhat, by introducing a more formal symbolic notation for such
claims and by putting the claims right in the text, or at least on the same

One can say the same thing "in TEI":

<SIC corr=affect>effect</SIC>

To say this is not to force reader B to accept A's interpretation -- it is
merely to express it more clearly. And, as always, there are levels of
interpretation that still remain implicit. Both the prose and the TEI
expressions leave it implied that the editor has negligible doubt about
identifying the letter 'e' in the physical manuscript (when there is such
doubt, other prose or TEI methods are available to say so).
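
To make the point concrete, here is a minimal Python sketch of how a reader B
might mechanically choose between the two readings that the <SIC> element
records. It is an illustration only, not part of the TEI or of any TEI
software: the render() helper, its prefer_correction parameter, the regular
expression, and the sample sentence are all assumptions invented for this
example, and a real SGML parser would be needed for anything beyond this one
simple inline form.

```python
import re

# Hypothetical pattern: matches only the simple inline form shown above,
# with the corr attribute value either unquoted or in double quotes.
SIC_PATTERN = re.compile(
    r'<SIC corr="?([^">\s]+)"?>(.*?)</SIC>',
    re.IGNORECASE | re.DOTALL,
)


def render(text, prefer_correction=False):
    """Resolve each <SIC> element to one of its two recorded readings."""
    def pick(match):
        correction, as_written = match.group(1), match.group(2)
        return correction if prefer_correction else as_written
    return SIC_PATTERN.sub(pick, text)


# Invented sample sentence; only the <SIC> element comes from the message.
encoded = "It had no <SIC corr=affect>effect</SIC> on him."

diplomatic = render(encoded)                         # what the source says
corrected = render(encoded, prefer_correction=True)  # what editor A proposes

print(diplomatic)
print(corrected)
```

Because both readings are kept in the encoded text, B can produce either view
without A's judgement being forced on the result.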

As with paper and English, there are ways in TEI to express the claim
outside of the text pointing in, instead of inline; and (if desired) to
attach claims about level of certainty, alternative views, responsibility
for the opinion, commentary on it, and so on.

Ultimately no statement is completely unambiguous. There may be subtle ways
in which A's notion of "correction" differs from C's. This, too, is nothing
new. Before applying a <SIC> tag A ought to know what they intend, and be
willing to articulate it to the best of their ability.

If A is unwilling to state what their interpretation is using some standard
means, they may instead:

* Decide to make "no" interpretations, thus making only unconscious ones

* Make a choice and not say so (e.g., just change 'effect' to 'affect')

* Say generally "I changed some things where I thought it good"

* Identify choices informally in prose

* Identify choices formally in a non-standard way.

Each of these alternatives may be better than the last, and adequate for
more purposes -- but it seems to me they all compare poorly to stating up
front that one is interpreting, and then (as far as consciousness and
practicality allow), saying what one's interpretations are in a way that
adds as little as possible to the ambiguity already forced upon us by a
non-Platonic world.

>are, in Nelson Goodman's terms, infinitely "allographic." and Michael
>Sperberg McQueen is an "allographer," who would unflinchingly accept a
>spelling of his name as a string of entity references if it had to come to

Yes -- is that not what we are all doing to read this note? We turn an
abstract orthographic notion of 'letter' into the press of a key, then into
a bunch of bits, then into sound waves to go through the phone line, then
back to bits, then to pixels on a screen, then to photons that strike the
retina, then to patterns of neural activity that the reader classifies as
the letter again. There is a legitimate (essentially allographic) sense in
which the information is "the same" throughout, and a legitimate sense in
which it is not. As Pike has often said, no matter what level of analysis
you start at, there are more etic and emic levels above and below it that
you cannot simultaneously analyze. Nothing new here.

Does any of us know what signal our keyboard literally sends when we type
Michael's name? It *could* be sending that "string of entity references":
&cap.m;&i;... and we'd never know it. If some change in technology made
that the most efficient way for computer-makers to do it, would we care?
Mailers don't display 'Michael' as 01001101 01101001 01100011 01101000
01100001 01100101 01101100 -- this is interpretive too: We have a social
convention against expressing information in a way that would be lost by
the transformations/interpretations just described (sometimes the
convention is troublesome, as for those needing non-Latin-based writing
systems, or concrete poets). Similar but not identical conventions have
always been with us, perhaps due to the nature of human cognitive and
pattern-extraction capabilities.
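
The allographic point can be sketched in a few lines of Python: the same name
is carried as letters, as the bit patterns quoted above, and as a string of
entity references, and comes back unchanged. The entity naming scheme below
(&cap.m; for a capital, &i; for a lower-case letter) is only an assumption
extrapolated from the "&cap.m;&i;..." fragment in this message, and the
helper names are hypothetical.

```python
import re

name = "Michael"

# Letters to the 8-bit patterns quoted in the message (ASCII code points).
bits = " ".join(format(ord(ch), "08b") for ch in name)

# And back from bits to letters.
from_bits = "".join(chr(int(chunk, 2)) for chunk in bits.split())


def to_entities(text):
    """Encode letters with a hypothetical '&cap.x;' / '&x;' scheme."""
    return "".join(
        f"&cap.{ch.lower()};" if ch.isupper() else f"&{ch};" for ch in text
    )


def from_entities(encoded):
    """Decode the hypothetical entity scheme back into letters."""
    pairs = re.findall(r"&(cap\.)?(\w);", encoded)
    return "".join(ch.upper() if cap else ch for cap, ch in pairs)


entities = to_entities(name)

print(bits)
print(entities)
print(from_bits == name and from_entities(entities) == name)
```

Each representation looks nothing like the others, yet each round trip
recovers the same seven letters, which is the sense in which the information
is "the same" throughout.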

Steve DeRose

Date: Thu, 7 Dec 95 12:32:38 CST
From: Robin Cover <robin@utafll.uta.edu>
Subject: Lancashire on SGML and TEI

HUMANIST readers who wish to know more about Ian Lancashire's objections
to SGML and TEI may read a paper submitted by him to the Electric Scriptorium
conference. The title of the paper is: "Early Books, RET Encoding Guidelines,
and the Trouble with SGML." The document is dated November 11, 1995.
The URL for the online document is:


The document's initial paragraph supplies something of an abstract
(which has not yet been posted to the UCalgary WWW server):

"Standard Generalized Markup Language (SGML) encodes medieval and
Renaissance manuscripts and printed books with difficulty. This
computer language is an ISO standard, but one acknowledged more in the
breach than in the observance. Here I argue that the
humanities should follow the originators of the World Wide Web, who made
HTML (Hypertext Markup Language), an encoding standard using SGML syntax
but serving purposes alien to the intentions of SGML's creators. The
Text Encoding Initiative (TEI) SGML document-type definition is unusable
for my kind of scholarly editing, and for the editing of early
texts generally. However, the <italic>TEI Guidelines</italic> is an
excellent discussion of tagging, principles and practice, and its
system of over 400 tags is the starting point for anyone interested in
text encoding."

[apologies for making no attempt to imitate the HTML print effects
in 7-bit ASCII for this notice]

Robin Cover
6634 Sarah Drive
Dallas, TX 75236 USA
Tel: (1 214) 296-1783 (h)
Tel: (1 214) 709-3346 (w)
FAX: (1 214) 709-3380
Email: robin@utafll.uta.edu ("uta-ef-el-el")
In case of link failure, use: robin@acadcomp.sil.org
SGML Page: http://www.sil.org/sgml/sgml.html