Humanist Discussion Group

Humanist Archives: Feb. 16, 2019, 6:16 a.m. Humanist 32.463 - the McGann-Renear debate

                  Humanist Discussion Group, Vol. 32, No. 463.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                Submit to: humanist@dhhumanist.org

    [1]    From: C. M. Sperberg-McQueen 
           Subject: XML and web services (36)

    [2]    From: Hugh Cayless 
           Subject: Re: [Humanist] 32.452: the McGann-Renear debate (120)

        Date: 2019-02-16 06:08:49+00:00
        From: C. M. Sperberg-McQueen 
        Subject: XML and web services

Desmond Schmidt says that when he said

     XML was invented by IBM and Microsoft, through the organ of the
     W3C, to serve the needs of web services. Document processing was
     very much a sideline.

what he meant was that "the overwhelming use of XML in its heyday was
for Web Services, not as a document format".

I wish him better luck, in future, in the difficult task of
constructing sentences that say what he means instead of
saying very different things.  In the meantime, the claims he
made about the creation of XML (as opposed to those he
may or may not have meant to make) remain false.

   - XML was not invented by IBM and Microsoft.

   - XML was not invented to serve the needs of web services (even if
     some of those who created it thought it would be useful for web
     services and said so in public, in what proved a fairly successful
     attempt to interest other people in supporting XML).

   - Document processing was not a sideline in the creation of the XML
     specification but the main focus of those who made the spec.

DS has every right to believe, and to argue in public, that the
suitability of a technology for document representation for digital
humanities work depends on its popularity with web services
implementors; I don't see the connection myself, but then I've never
been much interested in fashion.

C. M. Sperberg-McQueen
Black Mesa Technologies LLC

        Date: 2019-02-15 14:36:46+00:00
        From: Hugh Cayless 
        Subject: Re: [Humanist] 32.452: the McGann-Renear debate

Re Desmond's post in 32.452:

> It is quite possible for a TEI encoding of holograph manuscripts to be
> so complex that it is practically, although not literally, impossible
> to edit. That is, it is just as likely to be damaged as improved by
> any attempt to edit it. If it is shared by a group of editors this
> level of complexity is reached much sooner. The problem then becomes:
> how do you communicate your understanding of the "howling wind-storm"
> of tags that results to your colleagues so they may share your
> interpretation of the textual phenomena being described?

I'd say two things about this: first, that this class of manuscripts
became easier to encode with TEI after the addition of the  and
associated elements to the Guidelines. I don't know the relative
chronologies, and it's a bit hard to investigate right now, with the TEI
Vault being unavailable due to a server outage at ADHO. Git tells me that
the genetic encoding elements were added in late 2011, but I'm not sure
precisely when they were first released. Second, that the workflow you
describe sounds complex indeed, and it might not be practical to do it with
TEI. That doesn't invalidate TEI, it just means it might not work well for
your circumstances. That also doesn't rule out that it might be possible to
adjust it to work for your circumstances.

> Here is a moderately difficult example. A succession of hired
> transcribers simply refused to encode this for us. I wonder how
> hierarchies help us here?
> http://charles-harpur.org/corpix/english/harpur/A87-2/00000131a.jpg

Undoubtedly that's a tough one. I feel quite certain it can be done in
TEI, but of course that's dependent on time, funding, and access to local
expertise. I can't and won't fault you for making different decisions than
I might have.

> Breaking it down into separate layers as we have done is close to the
> method Michael describes, and renders the editorial task perfectly
> manageable.
> http://charles-
> final

I have a couple of concerns about this. Firstly, I'd strongly recommend
using a visual indicator of change that goes beyond color, as readers with
impaired color perception will have trouble with it. Secondly, I worry that
this method might give a false impression of what's going on. If we look at
line 2, comparing "layer 1" and "layer 2", we see something like:

layer 1:
 Keeps munching still the corn of the tall

layer 2:
Will cease not to devour the tall-eared

which is what running a `diff` operation on the two lines might give you.
But this is not at all what has happened in the text, as the image shows
us. First, the whole line was canceled, and then the line of layer 2 was
written above it. I also wonder about the "layer" concept. Do we know that
each layer represents a single editorial stage? That "Even as the mighty
son of Telamon" was canceled at the same time as the original line 2?
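
As a rough illustration of the `diff` point, here is a minimal Python
sketch that compares the two layers word by word with the standard
library's `difflib`. The function name `word_diff` and the choice of
whitespace-separated words as the unit of comparison are assumptions made
for this example; it is not meant to describe how the Harpur edition
actually derives its layers.

```python
import difflib

# The two readings of line 2, as quoted above.
layer1 = "Keeps munching still the corn of the tall"
layer2 = "Will cease not to devour the tall-eared"


def word_diff(old, new):
    """Return (op, old_words, new_words) triples for a word-level diff.

    op is one of 'equal', 'replace', 'delete', 'insert', as reported by
    difflib.SequenceMatcher.get_opcodes().
    """
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


if __name__ == "__main__":
    for op, old_words, new_words in word_diff(layer1, layer2):
        print(op, old_words, new_words)
```

The opcodes describe local replacements around whatever words the two
lines happen to share. That is a statement about the two strings, not
about the page, where a whole line was canceled and a new one written
above it.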

Of course, I'm wholly ignorant here and this might be a perfectly
representative model of the poet's editing process. It's exceedingly hard
(dare I say impossible) to generalize across temporal, genre, and
disciplinary differences.


(Apology for my accidental Petrification accepted. I thought it was pretty
funny :-)

> (4) So far I agree with Peter and simply wish to refine my argument. But
> there is one point where we disagree. I don't think it is true that
> representations are equivalent if they can be transformed into each other
> without loss of information. Perhaps I misunderstand Peter's point, but
> this seems to overlook information entropy. It must be obvious that some
> representations are more efficient than others, and encode the same
> information in fewer bits. Otherwise it would be "paragraph" not "p",
> and "line-group", not "lg." But there is also the more important point in
> practice, that some representations are more laborious to make than others.
> In many common cases of textual editing, XML is both more laborious than
> the alternatives, and it would not surprise me if it also required many
> more bits for the same information (though I could well be wrong there). In
> other cases, like preparing a well-structured reading text to be rendered
> on a variety of devices in different ways, it is surely the ideal
> technology. There is also the simple matter of elegance, which matters
> because it goes to the interpretability of a representation by a human.

Information entropy is the right way to think about this, but I'd have
thought that the entropy here is zero by definition. If we assume a finite
set of document representations A and transformations t1 and t2, then if
for each instance of A, t1(A[n]) -> B[n] and t2(B[n]) -> A[n], A[n] and
B[n] are equivalent (if we exclude the case where the transformations
contain most or all of the information of the result—no cheating). That's
not to say that form A may not be easier to work with than form B, or
potentially more expressive (so that I might be able to make an A[n] that
couldn't be roundtripped). But this gives us something we can write tests
for, which is good enough for me.
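
To make the "something we can write tests for" concrete, here is a minimal
Python sketch of such a roundtrip test. Everything in it is an assumption
made for illustration: form A is a list of labelled text segments, form B
is plain text plus standoff ranges, and the labels "del" and "add" are
hypothetical. Neither form is TEI, and `t1` and `t2` simply borrow the
names used above.

```python
from typing import List, Optional, Tuple

# Form A: (text, label) segments; a label of None means unmarked text.
Segment = Tuple[str, Optional[str]]
# Form B: plain text plus (start, end, label) ranges over it.
Standoff = Tuple[str, List[Tuple[int, int, str]]]


def t1(segments: List[Segment]) -> Standoff:
    """Form A -> form B: flatten segments into text plus ranges."""
    text, ranges, pos = "", [], 0
    for chunk, label in segments:
        if label is not None:
            ranges.append((pos, pos + len(chunk), label))
        text += chunk
        pos += len(chunk)
    return text, ranges


def t2(standoff: Standoff) -> List[Segment]:
    """Form B -> form A: rebuild segments from non-overlapping ranges."""
    text, ranges = standoff
    segments: List[Segment] = []
    pos = 0
    for start, end, label in sorted(ranges):
        if start > pos:  # unmarked text before this range
            segments.append((text[pos:start], None))
        segments.append((text[start:end], label))
        pos = end
    if pos < len(text):  # unmarked text after the last range
        segments.append((text[pos:], None))
    return segments


def roundtrips(segments: List[Segment]) -> bool:
    """True if t2(t1(A[n])) gives back exactly A[n]."""
    return t2(t1(segments)) == segments


sample: List[Segment] = [
    ("Keeps munching still the corn of the tall", "del"),
    (" ", None),
    ("Will cease not to devour the tall-eared", "add"),
]
# An instance of A that form B cannot distinguish from its merged version.
split_unmarked: List[Segment] = [("tall", None), ("-eared", None)]

if __name__ == "__main__":
    print(roundtrips(sample), roundtrips(split_unmarked))
```

The second instance is the kind of A[n] that cannot be roundtripped: two
adjacent unmarked segments collapse into one on the way through form B, so
in this toy setup form A is slightly more expressive than form B.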

Efficiency is a different question, as is labor. Your mileage will vary
depending on the human and computing resources you have available, and, as
I think we've demonstrated here, arguing about formats from our own
perspectives is as futile as the interminable arguments people have about
programming languages. I can say that Python* sucks, but that doesn't mean
_you_ shouldn't use it.

All the best,

* (I'm actually very fond of Python, lest I start another fight. Don't @

Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php

Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
