Humanist Discussion Group


Humanist Archives: Feb. 6, 2019, 6:07 a.m. Humanist 32.424 - the McGann-Renear debate

                  Humanist Discussion Group, Vol. 32, No. 424.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                Submit to: humanist@dhhumanist.org

    From: Desmond Schmidt 
           Subject: Re: [Humanist] 32.423: the McGann-Renear debate

    From: Hugh Cayless 
           Subject: Re: [Humanist] 32.423: the McGann-Renear debate

    From: Peter Robinson 
           Subject: Re: [Humanist] 32.423: the McGann-Renear debate

    From: Gabriel Egan 
           Subject: Re: [Humanist] 32.423: the McGann-Renear debate

        Date: 2019-02-05 19:18:48+00:00
        From: Desmond Schmidt 
        Subject: Re: [Humanist] 32.423: the McGann-Renear debate


>they saw such a
>manuscript or printed line of words as really containing
>two lines: first the last line of one person's speech and
>secondly the first line of someone else's speech.

What you are not seeing here, to my mind, is that a metrical
line is a unit. It's not just a number of words with a carriage-return
at the end. You can see this phenomenon in many plays, but also in
ancient Greek plays, where the metre, based on long and short
syllables, was much more rigid than Shakespeare's intoned rhythm. In
Sophocles' 'Ajax', lines 591-595, there are four such lines split between
Ajax and Tecmessa:

Tecmessa: Utter no proud words. Ajax: Speak to those who listen.
Tecmessa: Wilt thou not heed? Ajax: Too much thou hast spoken already.
Tecmessa: Yes, through my fears, O king. Ajax: Close the doors quickly.
Tecmessa: For the gods' love, relent. Ajax:  'Tis a foolish hope,
If thou shouldst now propose to school my mood.
(I cite an English translation, of course, but in the original these
really are half-lines.)

So if the l-element is really just words terminated by a
carriage-return, then treat it as prose and leave the l-element out
of your encoding. If a speech can't be inside a line in these cases,
then there are no lines. You can fudge it by calling it a half-line in
a speech and creating an element for that, but that's not the structure
of the source text.
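
Schmidt's point can be put as a nesting test: two structures can share a single XML tree only if every pair of their spans is either nested or disjoint. A minimal sketch in Python, assuming each structure has been reduced to character offsets over the English translation quoted above (the function and variable names are illustrative, not taken from any existing tool):

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets; end is exclusive


def partially_overlaps(a: Span, b: Span) -> bool:
    """True if a and b share characters but neither contains the other."""
    (s1, e1), (s2, e2) = a, b
    share = s1 < e2 and s2 < e1
    a_inside_b = s2 <= s1 and e1 <= e2
    b_inside_a = s1 <= s2 and e2 <= e1
    return share and not (a_inside_b or b_inside_a)


def conflicts(first: List[Span], second: List[Span]) -> List[Tuple[Span, Span]]:
    """Pairs of spans that could not both be elements of one XML tree."""
    return [(x, y) for x in first for y in second if partially_overlaps(x, y)]


# The last two metrical lines of the quoted passage, as one string.
text = ("For the gods' love, relent. 'Tis a foolish hope, "
        "If thou shouldst now propose to school my mood.")
line_break = text.index("If thou")   # start of the second metrical line
ajax_starts = text.index("'Tis")     # start of Ajax's speech

lines = [(0, line_break), (line_break, len(text))]
speeches = [(0, ajax_starts), (ajax_starts, len(text))]

print(conflicts(lines, speeches))  # one pair: the first line and Ajax's speech
```

Applied to the last two lines of the passage, the test reports one conflict: Ajax's final speech begins in the middle of one metrical line and runs into the next, so it neither contains that line nor sits inside it.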

Desmond Schmidt
Queensland University of Technology

On 2/5/19, Humanist  wrote:
>         Date: 2019-02-04 12:55:41+00:00
>         From: Gabriel Egan 
>         Subject: Re: [Humanist] 32.417: the McGann-Renear debate
> In response to my querying ("show us an example") of
> Desmond Schmidt's claim that in early modern drama it's
> possible not only for a dialogue line to be inside a
> speech but also for a speech to be inside a line,
> Herbert Wender offers an example from Goethe's 'Faust'.
> The example is of a manuscript in which the name
> "ROSENKNOSPEN" appears anomalously in the middle
> of a spoken line. In an article cited by Wender,
> the emendation of this line is discussed, and two
> options considered. One is to move the name to the
> beginning of the line so it forms a speech prefix,
> and the other is to move it to another line altogether.
> I'm not seeing how this example illustrates what Schmidt
> claimed, which was that it's permissible in early modern
> drama for a speech to be inside a line. Rather, it seems
> to illustrate that it's possible for a textual witness to
> contain error. Far from treating this moment in the play as
> an example of a speech being inside a line, the editions of
> 'Faust' under discussion in the essay Wender cites treat
> this as an error to be corrected.
> There can of course be a speech prefix (marking the
> end of one speech and the start of another) occurring
> within a manuscript or printed line. Where these occur,
> no early modern dramatist thought that the speech was
> inside the line. We know they didn't because when they
> came to make the actors' 'parts', each of which contained
> all the lines to be spoken by a single character, they
> would divide such a manuscript or printed line between
> two different 'parts' (different physical documents), one
> for each of the two characters. That is, they saw such a
> manuscript or printed line of words as really containing
> two lines: first the last line of one person's speech and
> secondly the first line of someone else's speech. They
> treated such a shared manuscript or type line just as we
> do today, as being really two lines crammed together.
> The context for all this is that I was defending the
> claim that texts such as early modern plays really
> are an Orderly Hierarchy of Content Objects. The
> tree-ness is not merely in the eye of the beholder
> as Schmidt claimed.
> Regards
> Gabriel Egan


        Date: 2019-02-05 18:30:37+00:00
        From: Hugh Cayless 
        Subject: Re: [Humanist] 32.423: the McGann-Renear debate


To anyone interested in the problems of multiple hierarchies in text, I
would recommend browsing the archived Proceedings of Balisage
(https://www.balisage.net/Proceedings/index.html), where you will find a
wealth of interesting material (much of it produced by members of this list).


> I did not say that semantic markup should be external to the text. I
> said that semantic information can be derived from text without using
> any kind of markup.

Sure, some of that can be done. You seem to be arguing against the strand
of TEI that deals with marking names of (or references to) people, places,
etc. It's perfectly reasonable to defer that to an external process if you
want. That said, I might ask what you expect the editor of a text to do when
they wish to specify that the Alexandria mentioned corresponds to
https://pleiades.stoa.org/places/727070 rather than one of the 19 other,
similarly named places in Pleiades (not to mention the 4004-odd ones known
to GeoNames). There's an important distinction between information
extracted and/or inferred by a process and information asserted by an
editor.
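
The contrast between an asserted and an inferred identification can be illustrated with TEI's `<placeName>` element and its `ref` attribute. A minimal sketch using Python's standard library, assuming an invented fragment (the sentence and the second place name are made up for illustration):

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI namespace in ElementTree's Clark notation

# Invented fragment: the editor asserts which Alexandria is meant for the first
# name and asserts nothing about the second.
fragment = """<p xmlns="http://www.tei-c.org/ns/1.0">He sailed from
<placeName ref="https://pleiades.stoa.org/places/727070">Alexandria</placeName>
to <placeName>Rhodes</placeName>.</p>"""


def asserted_places(xml_text: str):
    """Return (name, ref) for each placeName; ref is None where nothing was asserted."""
    root = ET.fromstring(xml_text)
    return [(el.text, el.get("ref")) for el in root.iter(TEI + "placeName")]


print(asserted_places(fragment))
```

Here the link to Pleiades is read, not guessed; anything a mining process proposes for the unmarked name would be a separate, inferred layer.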

> Semantic markup in XML is too often focused on the
> narrow needs of the people who encoded it, or merely records things
> that are self-evident and hence not useful for general search and
> retrieval.

It is a fact that a digital edition of any sort might not do what you want
or expect it to do, or that what it does has been done badly or
redundantly. Recent history perhaps demonstrates that we do not live in the
best of all possible worlds. But bitter sarcasm about the state of the
world aside, these are areas that it's fair to critique. I would merely,
gently, suggest that "you're still doing markup, stop it" is not really a
fair (or indeed actionable) criticism.

> I was advocating instead the use of concept-mining tools
> like Leximancer that can extract meaning from plain text, HTML and the
> like. Also, if modern machine learning techniques can translate from
> Chinese to English fluently they can also extract meaning from text.

Machine translation, as good as it has gotten, does not equate to machine
understanding. And once again, I'd point out that editors do sometimes want
to assert things about the texts they're editing. Or they may have some
very basic renditional/functional purpose in mind, like displaying a
particular mouseover effect for a given term.

> So marking up small amounts of meaning internally or externally to a
> text doesn't seem worth the effort to me. I am advocating a much
> simpler format for text close to plain text that can be easily mined
> for information, that contains only rendering or abstract rendering
> information.

Surely these needs are met by exporting data to HTML, something most, if
not all, TEI projects do. I'm having a little trouble following your line
of thinking here. If I have an HTML site that I edit using Markdown, would
you refuse to use it because you hate Markdown? Or would you be happy to
use the HTML?
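
The kind of export meant here can be very small. A minimal sketch, assuming a TEI `<lg>` of verse lines and a hypothetical HTML convention of one `<div class="l">` per line; many TEI projects do this with XSLT instead, and the mapping shown is illustrative only:

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def tei_lines_to_html(tei_xml: str) -> str:
    """Render each TEI <l> as an HTML <div class="l">, keeping only its text."""
    root = ET.fromstring(tei_xml)
    wrapper = ET.Element("div", {"class": "lg"})
    for line in root.iter(TEI + "l"):
        div = ET.SubElement(wrapper, "div", {"class": "l"})
        div.text = "".join(line.itertext())  # flatten any inline markup to text
    return ET.tostring(wrapper, encoding="unicode")


tei = ('<lg xmlns="http://www.tei-c.org/ns/1.0">'
       '<l>Friends to this ground.</l><l>And liegemen to the Dane.</l></lg>')
print(tei_lines_to_html(tei))
```

The HTML keeps the line division for display and discards everything else, which is roughly the "close to plain text" format under discussion.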

> Deeply structured texts as once provided by XML don't fit
> the bill because they mix up the rendering with the semantics and use
> too rigid a document structure that invites overlapping hierarchies on
> reuse.

I like your use of the past tense there :-). As various people on this
thread have already pointed out, rendering and semantics aren't necessarily
all that easy to disentangle. TEI tends to be used to produce editions of
source documents in a huge variety of original formats. A lot of TEI is
dedicated to mechanisms for recording features of the source. Those
mechanisms may of course be rendered themselves when the edition is
presented to a reader. Are those mechanisms renditional or semantic? Bit of
both, really. As for purely semantic stuff, I think I agree with you that
in general it's best not to go too crazy in marking up all of those things,
but that it's crucial that there be a mechanism for doing so when an editor
wants to make an assertion about what's in the text.

As for whether XML is "too rigid", I think that's a value judgement that's
quite hard to make, actually. XML *is* strict, and hierarchical, and is
therefore amenable to all kinds of correctness checks that make both human
workflow and downstream processing easier. It's also flexible enough to do
just about anything you want.  Trying to argue that one format is better
than another when they're capable of representing the same thing is
difficult because isomorphism will bite you in the ass. If you're worried
that you can't take just any random TEI file and drop it into your NLP
pipeline, then, yeah, that's true. You probably have to look at the sources
and figure out how to get the information you want out of them. Or talk to
their creators. But honestly, that's going to be true with any sort of
source data to some extent.
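
The first of those correctness checks is well-formedness, which any conforming XML parser enforces before a schema is even consulted. A minimal sketch using Python's `xml.etree.ElementTree`; the element names `sp` and `l` are borrowed from TEI, but the fragments are invented and un-namespaced, so this shows well-formedness only, not TEI validity:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML; the parser rejects improperly nested tags."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


nested = "<sp><l>Friends to this ground.</l></sp>"
# A speech that starts inside one line and ends inside the next cannot be tagged directly.
overlapping = ("<l>For the gods' love, relent. <sp>'Tis a foolish hope,</l>"
               "<l>If thou shouldst now propose to school my mood.</sp></l>")

print(is_well_formed(nested), is_well_formed(overlapping))  # True False
```

The second fragment is rejected outright, which is both the strictness that helps workflow and the point at which overlapping structures have to be encoded some other way.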

All the best,

        Date: 2019-02-05 15:45:48+00:00
        From: Peter Robinson 
        Subject: Re: [Humanist] 32.423: the McGann-Renear debate

People keep talking about “overlapping hierarchies” as if that were the only
problem in dealing with texts and their multiple dimensions. The
“overlapping hierarchy” formulation assumes that there is a single stream of
text, with characters etc. appearing in a single constant order. Accordingly, all
you need (all!) do is overlay different structures on this single stream and
everything is fine. For example, one could (as many people keep suggesting) have
the stream of characters in one place, and then just use stand-off markup to
represent the multiple dimensions: one set for the document pages etc., one set
for the semantic text (Acts, Scenes, etc.).
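
The model being criticised can be sketched in a few lines: one character stream, with each hierarchy stored as offsets into it. A minimal sketch in Python, assuming a hypothetical page break inside the second speech of the opening of 'Hamlet' (the layer names are illustrative):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    layer: str  # which hierarchy, e.g. "document" or "text"
    name: str   # element type within that layer, e.g. "page" or "sp"
    start: int  # offset into the shared stream (inclusive)
    end: int    # offset into the shared stream (exclusive)


stream = "Who's there? Nay, answer me. Stand and unfold yourself."
nay = stream.index("Nay")
stand = stream.index("Stand")  # a hypothetical page break falls here

annotations: List[Annotation] = [
    Annotation("text", "sp", 0, nay),
    Annotation("text", "sp", nay, len(stream)),
    Annotation("document", "page", 0, stand),
    Annotation("document", "page", stand, len(stream)),
]


def select(layer: str, name: str) -> List[str]:
    """Return the stretches of the stream covered by one layer's elements."""
    return [stream[a.start:a.end].strip()
            for a in annotations if a.layer == layer and a.name == name]


print(select("text", "sp"))
print(select("document", "page"))
```

Every layer here indexes the same ordered `stream`; material such as page headers, catchwords or a second newspaper story has no natural place in it, which is the limitation described next.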

As I have said several times (and will keep on saying), this is an inadequate
model (which is one reason why the many attempts to use stand-off or similar
have not got very far). In fact, there is not a single stream of characters.
Consider the average book: the stream of semantic text (the “text” of Hamlet,
etc.) is broken up by page headers, page numbers, turned-over lines, footnotes,
page footers, catchwords, and so on. Or consider a polyglot Bible, or any page of
any newspaper, with multiple ‘stories’ (each with its own hierarchical
structure) interrupting and jostling with each other on the page. In cases like
this you have multiple texts interweaving on the page. Of course you COULD
represent it as a single stream of text (and even then, what do you do about pages
which contain both right-to-left and left-to-right writing systems?) and perform
miracles of ingenuity to extract the multiple distinct “stories” from that
stream.

These are all examples of what I have described elsewhere (in an earlier posting
to this thread) as text NOT being a single stream with multiple overlapping
hierarchies. Instead, text is better modelled as a set of leaves, with each leaf
potentially present in multiple tree-like hierarchies. “Whan that april with his
shoures soote” is a leaf of text, present in BOTH the document hierarchy (in a
writing space in folio 1r of the Hengwrt manuscript) AND in the communicative
act hierarchy (line 1 of the General Prologue of the Canterbury Tales). One may
imagine another hierarchy, which gathers together references to time, in which
the word “april” is a leaf. And so on.
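
One way to read this proposal as a data structure is to let each leaf record its position in every hierarchy it belongs to, and to derive the trees from the leaves rather than the reverse. A minimal Python sketch under that assumption; the node labels come from the example above, and the structure is an illustration, not the model used by Textual Communities:

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Path = Tuple[str, ...]  # enclosing nodes, outermost first


@dataclass
class Leaf:
    text: str
    paths: Dict[str, Path] = field(default_factory=dict)  # hierarchy -> position in it


def trees_from_leaves(leaves: List[Leaf]) -> Dict[str, Dict[Path, List[str]]]:
    """Derive each hierarchy by grouping leaves under their paths in it."""
    trees: Dict[str, Dict[Path, List[str]]] = defaultdict(lambda: defaultdict(list))
    for leaf in leaves:
        for hierarchy, path in leaf.paths.items():
            trees[hierarchy][path].append(leaf.text)
    return trees


opening = Leaf("Whan that april with his shoures soote", {
    "document": ("Hengwrt", "folio 1r", "writing space"),
    "communicative act": ("Canterbury Tales", "General Prologue", "line 1"),
})

trees = trees_from_leaves([opening])
for hierarchy, nodes in trees.items():
    print(hierarchy, dict(nodes))
```

A further hierarchy (such as the one gathering references to time) would be another key in `paths`; editing then means changing a leaf's paths and rebuilding the trees, which is where the difficulty described below begins.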

As Desmond Schmidt has commented, implementing this view of texts is rather
demanding. I have described this process, of editing texts and their
associated multiple trees in real time, as like removing leaves from trees,
remaking the trees and reattaching the leaves to the new trees while battling
a howling windstorm. I think I have spent about 25 years trying to do this, the
last five especially, since we realized what the problem is, and the solution I
have come up with so far is very far from complete. Smarter, younger, more
energetic people than me, who begin with this model rather than stumbling on it
after years of barking up the wrong “overlapping hierarchies” tree, might do
better. I hope so. In the meantime, look at www.textualcommunities.org to see how
far we have got.


        Date: 2019-02-05 08:10:33+00:00
        From: Gabriel Egan 
        Subject: Re: [Humanist] 32.423: the McGann-Renear debate


Desmond Schmidt's example of a part of Shakespeare in which
the lines are not wholly contained within the speeches
is this from 'Hamlet':

Horatio: Friends to this ground.
Marcellus: And liegemen to the Dane.

Far from being unequivocally built into the structure of what
Shakespeare wrote, the above is a useful illustration of where
modern editors disagree about the metrical structure. In some
modern editions, the above is not considered a single metrical
unit. For example, it is now often laid out:

FRANCISCO  I think I hear them. -- Stand! Who's there?
HORATIO                           Friends to this ground.
MARCELLUS  And liegemen to the Dane.
FRANCISCO                           Give you good night.

Laid out like this, the first two lines form a single metrical
unit (an iambic hexameter) and the third and fourth form
another metrical unit (an iambic pentameter).
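
The arithmetic behind the two layouts can be checked with hand-counted syllables. A minimal sketch, assuming the counts given in the data (counted by hand rather than computed, taking "Who's" as one syllable and "liegemen" as two) and ignoring stress:

```python
from typing import List

# (speaker, part line, hand-counted syllables)
part_lines = [
    ("FRANCISCO", "I think I hear them. Stand! Who's there?", 8),
    ("HORATIO", "Friends to this ground.", 4),
    ("MARCELLUS", "And liegemen to the Dane.", 6),
    ("FRANCISCO", "Give you good night.", 4),
]


def unit_lengths(grouping: List[List[int]]) -> List[int]:
    """Total syllables in each metrical unit; a unit is a list of part-line indices."""
    return [sum(part_lines[i][2] for i in unit) for unit in grouping]


shared_line = [[1, 2]]            # Horatio and Marcellus as one unit
modern_layout = [[0, 1], [2, 3]]  # the layout quoted above

print(unit_lengths(shared_line))    # [10]
print(unit_lengths(modern_layout))  # [12, 10]
```

Both groupings produce plausible units (Horatio's and Marcellus's words together also make ten syllables), which is consistent with the point that assigning part lines to metrical units is an editorial decision.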

What Schmidt has claimed as an example of Shakespeare violating
the Orderly Hierarchies principle is in fact an example of a
hierarchy that truly is in the eye of the beholder, not baked
into the original writing. No theatrical document from
Shakespeare's time, in manuscript or print, ever
tried to represent a metrical structure that overlaps the
line structure. Showing what they took to be the
metrical structure was a practice begun by late eighteenth-century
editors of Shakespeare, and there are thousands of cases across
his canon where modern editors still disagree on the correct
assignment of part lines to metrical units.

Shakespeare, the other dramatists of his time, their scribes,
theatre practitioners, and printers did not consider this alleged
metrical structure to be the dominant hierarchy. They considered
the division into speeches to be the dominant hierarchy, as they
showed in their creation of the 'parts' (also known as 'sides'
or 'rolls'), made by dividing a play script so that each part held
the speeches of one character. That hierarchy was real for them,
and it was intimately connected to the theatrical practicalities of
memorizing the script and rehearsing it in pairs and large groupings
of actors. This hierarchy arose from their lived experience as
performers and has nothing to do with the "requirement of the markup
language", as Schmidt claims. Rather, the rise of markup languages
has helped scholars of the stage to think more clearly about what
hierarchies really existed in the minds of early modern dramatists
as they worked and which hierarchies (such as the one Schmidt invokes)
are historically belated.
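
The procedure described here can be sketched directly: a part keeps only one character's speeches, so a verse line shared by two speakers necessarily ends up in two parts. A minimal sketch in Python, assuming the script has already been divided into (speaker, speech) pairs, using the 'Hamlet' lines quoted above:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The script as (speaker, speech) pairs in order; a shared verse line is just
# two consecutive speeches.
script: List[Tuple[str, str]] = [
    ("FRANCISCO", "I think I hear them. Stand! Who's there?"),
    ("HORATIO", "Friends to this ground."),
    ("MARCELLUS", "And liegemen to the Dane."),
    ("FRANCISCO", "Give you good night."),
]


def make_parts(script: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect each character's speeches, in order, into that character's part."""
    parts: Dict[str, List[str]] = defaultdict(list)
    for speaker, speech in script:
        parts[speaker].append(speech)
    return dict(parts)


parts = make_parts(script)
for speaker, speeches in parts.items():
    print(speaker, speeches)
```

Horatio's and Marcellus's halves of the shared line land in separate parts, so within each part every speech is whole and the speech hierarchy is the only one the document records.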


Gabriel Egan

Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)

This site is maintained under a service level agreement by King's Digital Lab.