Humanist Discussion Group, Vol. 32, No. 424.
Department of Digital Humanities, King's College London
Hosted by King's Digital Lab
www.dhhumanist.org
Submit to: firstname.lastname@example.org

From: Desmond Schmidt
Subject: Re: [Humanist] 32.423: the McGann-Renear debate (104)

From: Hugh Cayless
Subject: Re: [Humanist] 32.423: the McGann-Renear debate (101)

From: Peter Robinson
Subject: Re: [Humanist] 32.423: the McGann-Renear debate (49)

From: Gabriel Egan
Subject: Re: [Humanist] 32.423: the McGann-Renear debate (59)

--------------------------------------------------------------------------
Date: 2019-02-05 19:18:48+00:00
From: Desmond Schmidt
Subject: Re: [Humanist] 32.423: the McGann-Renear debate

Gabriel,

>they saw such a
>manuscript or printed line of words as really containing
>two lines: first the last line of one person's speech and
>secondly the first line of someone else's speech.

What you are not seeing here, to my mind, is the fact that a metrical line is a unit. It's not just a number of words with a carriage return at the end. You can see this phenomenon in many plays, but also in ancient Greek plays, where the metre, based on long and short syllables, was much more rigid than Shakespeare's intoned rhythm. In Sophocles' 'Ajax', lines 591-595, there are four such lines split between Ajax and Tecmessa:

Tecmessa: Utter no proud words.
Ajax: Speak to those who listen.
Tecmessa: Wilt thou not heed?
Ajax: Too much thou hast spoken already.
Tecmessa: Yes, through my fears, O king.
Ajax: Close the doors quickly.
Tecmessa: For the gods' love, relent.
Ajax: 'Tis a foolish hope, If thou shouldst now propose to school my mood.

(I cite an English translation, of course, but in the original they are really half-lines.)

So if the l-element is really just words terminated by a carriage return, then treat it as prose and leave the l-element out of your encoding. If a speech can't be inside a line in these cases, then there are no lines. You can fudge it, say it is a half-line in a speech, and create an element for that, but that's not the structure of the source text.
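In XML terms, the constraint at issue is well-formedness: elements must nest, so a line element cannot open inside one speech element and close inside the next. The minimal sketch below, using Python's standard xml.etree.ElementTree, shows a parser accepting a line wholly inside a speech and rejecting a line shared between Tecmessa and Ajax. The element and attribute names (sp, l, who) are simplified TEI-style names without a namespace, and the English translation stands in for the Greek half-lines; both are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# A full line wholly inside one speech: the tags nest, so the parser accepts it.
nested = "<sp who='Ajax'><l>If thou shouldst now propose to school my mood.</l></sp>"

# One metrical line shared by two speakers: <l> opens inside Tecmessa's <sp>
# and closes inside Ajax's, so the tags cross and the parser rejects it.
crossing = (
    "<div>"
    "<sp who='Tecmessa'><l>For the gods' love, relent.</sp>"
    "<sp who='Ajax'>'Tis a foolish hope,</l></sp>"
    "</div>"
)

print(is_well_formed(nested))    # True
print(is_well_formed(crossing))  # False
```

Any encoding that restores well-formedness here has to break either the speech or the line into pieces, which is the "fudge" described above.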
Desmond Schmidt
eResearch
Queensland University of Technology

On 2/5/19, Humanist wrote:
> Humanist Discussion Group, Vol. 32, No. 423.
> Department of Digital Humanities, King's College London
> Hosted by King's Digital Lab
> www.dhhumanist.org
> Submit to: email@example.com [...]
>
> --------------------------------------------------------------------------
> Date: 2019-02-04 12:55:41+00:00
> From: Gabriel Egan
> Subject: Re: [Humanist] 32.417: the McGann-Renear debate
>
> Dear HUMANISTs
>
> In response to my querying ("show us an example") of
> Desmond Schmidt's claim that in early modern drama it's
> possible not only for a dialogue line to be inside a
> speech but also for a speech to be inside a line,
> Herbert Wender offers an example from Goethe's 'Faust'.
> The example is of a manuscript in which the name
> "ROSENKNOSPEN" appears anomalously in the middle
> of a spoken line. In an article cited by Wender,
> the emendation of this line is discussed, and two
> options considered. One is to move the name to the
> beginning of the line so it forms a speech prefix,
> and the other is to move it to another line altogether.
>
> I'm not seeing how this example illustrates what Schmidt
> claimed, which was that it's permissible in early modern
> drama for a speech to be inside a line. Rather, it seems
> to illustrate that it's possible for a textual witness to
> contain error. Far from treating this moment in the play as
> an example of a speech being inside a line, the editions of
> 'Faust' under discussion in the essay Wender cites treat
> this as an error to be corrected.
>
> There can of course be a speech prefix (marking the
> end of one speech and the start of another) occurring
> within a manuscript or printed line. Where these occur,
> no early modern dramatist thought that the speech was
> inside the line.
> We know they didn't because when they
> came to make the actors' 'parts', each of which contained
> all the lines to be spoken by a single character, they
> would divide such a manuscript or printed line between
> two different 'parts' (different physical documents), one
> for each of the two characters. That is, they saw such a
> manuscript or printed line of words as really containing
> two lines: first the last line of one person's speech and
> secondly the first line of someone else's speech. They
> treated such a shared manuscript or type line just as we
> do today, as being really two lines crammed together.
>
> The context for all this is that I was defending the
> claim that texts such as early modern plays really
> are an Orderly Hierarchy of Content Objects. The
> tree-ness is not merely in the eye of the beholder,
> as Schmidt claimed.
>
> Regards
>
> Gabriel Egan

--
Dr Desmond Schmidt
Mobile: 0481915868
Work: +61-7-31384036

--------------------------------------------------------------------------
Date: 2019-02-05 18:30:37+00:00
From: Hugh Cayless
Subject: Re: [Humanist] 32.423: the McGann-Renear debate

William:

To anyone interested in the problems of multiple hierarchies in text, I would recommend browsing the archived Proceedings of Balisage (https://www.balisage.net/Proceedings/index.html), where you will find a wealth of interesting material (much of it produced by members of this list).

Desmond:

> I did not say that semantic markup should be external to the text. I
> said that semantic information can be derived from text without using
> any kind of markup.

Sure, some of that can be done. You seem to be arguing against the strand of TEI that deals with marking names of (or references to) people, places, etc. It's perfectly reasonable to defer that to an external process if you want.
Though I might ask what you expect the editor of a text to do when they wish to specify that the Alexandria mentioned corresponds to https://pleiades.stoa.org/places/727070 rather than to one of the 19 other, similarly named places in Pleiades (not to mention the 4004-odd ones known to GeoNames). There's an important distinction between information extracted and/or inferred by a process and information asserted by an editor.

> Semantic markup in XML is too often focused on the
> narrow needs of the people who encoded it, or merely records things
> that are self-evident and hence not useful for general search and
> retrieval.

It is a fact that a digital edition of any sort might not do what you want or expect it to do, or that what it does has been done badly or redundantly. Recent history perhaps demonstrates that we do not live in the best of all possible worlds. But bitter sarcasm about the state of the world aside, these are areas that it's fair to critique. I would merely, gently, suggest that "you're still doing markup, stop it" is not really a fair (or indeed actionable) criticism.

> I was advocating instead the use of concept-mining tools
> like Leximancer that can extract meaning from plain text, HTML and the
> like. Also, if modern machine learning techniques can translate from
> Chinese to English fluently they can also extract meaning from text.
>

Machine translation, as good as it has gotten, does not equate to machine understanding. And once again, I'd point out that editors do sometimes want to assert things about the texts they're editing. Or they may have some very basic renditional/functional purpose in mind, like displaying a particular mouseover effect for a given term.

> So marking up small amounts of meaning internally or externally to a
> text doesn't seem worth the effort to me.
> I am advocating a much
> simpler format for text close to plain text that can be easily mined
> for information, that contains only rendering or abstract rendering
> information.

Surely these needs are met by exporting data to HTML, something most, if not all, TEI projects do. I'm having a little trouble following your line of thinking here. If I have an HTML site that I edit using Markdown, would you refuse to use it because you hate Markdown? Or would you be happy to use the HTML?

> Deeply structured texts as once provided by XML don't fit
> the bill because they mix up the rendering with the semantics and use
> too rigid a document structure that invites overlapping hierarchies on
> reuse.
>

I like your use of the past tense there :-).

As various people on this thread have already pointed out, rendering and semantics aren't necessarily all that easy to disentangle. TEI tends to be used to produce editions of source documents in a huge variety of original formats. A lot of TEI is dedicated to mechanisms for recording features of the source. Those mechanisms may of course be rendered themselves when the edition is presented to a reader. Are those mechanisms renditional or semantic? Bit of both, really.

As for purely semantic stuff, I think I agree with you that in general it's best not to go too crazy in marking up all of those things, but that it's crucial that there be a mechanism for doing so when an editor wants to make an assertion about what's in the text.

As for whether XML is "too rigid", I think that's a value judgement that's quite hard to make, actually. XML *is* strict, and hierarchical, and is therefore amenable to all kinds of correctness checks that make both human workflow and downstream processing easier. It's also flexible enough to do just about anything you want. Trying to argue that one format is better than another when they're capable of representing the same thing is difficult because isomorphism will bite you in the ass.
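The distinction between information inferred by a process and information asserted by an editor can be made concrete with the Alexandria example. In the minimal sketch below, an editor records which Alexandria is meant, and a few lines of Python's standard xml.etree.ElementTree recover that assertion. The placeName element and its ref attribute are standard TEI; the one-sentence fragment itself is invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI fragment: the editor asserts which of the many
# places called Alexandria is meant by pointing at a Pleiades URI.
fragment = """<p xmlns="http://www.tei-c.org/ns/1.0">He sailed for
<placeName ref="https://pleiades.stoa.org/places/727070">Alexandria</placeName>.</p>"""

root = ET.fromstring(fragment)

# Collect (surface form, editor-asserted identifier) pairs.
assertions = [
    (el.text, el.get("ref"))
    for el in root.iterfind(".//tei:placeName", TEI_NS)
]
print(assertions)
```

From the plain string "Alexandria" alone, a concept-mining tool could only guess which place is meant; here the identifier is part of the edited text.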
If you're worried that you can't take just any random TEI file and drop it into your NLP pipeline, then, yeah, that's true. You probably have to look at the sources and figure out how to get the information you want out of them. Or talk to their creators. But honestly, that's going to be true with any sort of source data to some extent.

All the best,
Hugh

--------------------------------------------------------------------------
Date: 2019-02-05 15:45:48+00:00
From: Peter Robinson
Subject: Re: [Humanist] 32.423: the McGann-Renear debate

People keep talking about “overlapping hierarchies” as if that is the only problem in dealing with texts and their multiple dimensions. The “overlapping hierarchy” formulation assumes that there is a single stream of text, with characters etc. appearing in a single constant order. Accordingly, all you need (all!) to do is to overlay different structures on this single stream and everything is fine. For example, one could (as many people keep suggesting) have the stream of characters in one place, and then just use stand-off markup to represent the multiple dimensions: one set for the document pages etc., one set for the semantic text (Acts, Scenes, etc.). As I have said several times (and will keep on saying), this is an inadequate model (which is one reason why the many attempts to use stand-off or similar have not got very far).

In fact, there is not a single stream of characters. Consider the average book: the stream of semantic text (the “text” of Hamlet etc.) is broken up by page headers, page numbers, turned-over lines, footnotes, page footers, catchwords, etc., etc. Or consider a polyglot bible, or any page of any newspaper, with multiple ‘stories’ (each with their own hierarchical structure) interrupting and jostling with each other on the page. In cases like this you have multiple texts interweaving on the page.
Of course you COULD represent it as a single stream of text (and even then, what do you do about pages which contain both right-to-left and left-to-right writing systems?) and perform miracles of ingenuity to extract the multiple distinct “stories” from that stream.

These are all examples of what I have described elsewhere (in an earlier posting to this thread) as text NOT being a single stream with multiple overlapping hierarchies. Instead, text is better modelled as a set of leaves, with each leaf potentially present in multiple tree-like hierarchies. “Whan that april with his shoures soote” is a leaf of text, present in BOTH the document hierarchy (in a writing space in folio 1r of the Hengwrt manuscript) AND in the communicative act hierarchy (line 1 of the General Prologue of the Canterbury Tales). One may imagine another hierarchy, which gathers together references to time, in which the word “april” is a leaf. And so on.

As Desmond Schmidt has commented, implementing this view of texts is rather demanding. I have described this process, of editing texts and their associated multiple trees in real time, as like removing leaves from trees, remaking the trees and reattaching the leaves to the new trees while battling a howling windstorm. I think I have spent about 25 years trying to do this, the last five especially, since we realized this is what the problem is, and the solution I have come up with so far is very far from complete. Smarter, younger, more energetic people than me, who begin with this model rather than stumbling on it after years of barking up the wrong “overlapping hierarchies” tree, might do better. I hope. In the meantime, look at www.textualcommunities.org to see how far we have got.
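The leaves-in-many-trees model can be sketched as a data structure in which the same leaf object is referenced from more than one tree, rather than copied into each. The minimal sketch below uses hypothetical class names (Leaf, Node) and a hypothetical helper (path_to), with node labels taken from the Hengwrt/General Prologue example above. It is not how Textual Communities implements the model, and it leaves out the hard part described above: editing the trees while keeping the leaves attached.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison: a shared leaf is one object, not a copy
class Leaf:
    """A run of text that may belong to several hierarchies at once."""
    text: str


@dataclass(eq=False)
class Node:
    """A labelled node in one hierarchy; children are Nodes or Leaves."""
    label: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Yield the leaves under this node in order."""
        for child in self.children:
            if isinstance(child, Leaf):
                yield child
            else:
                yield from child.leaves()


def path_to(node, leaf):
    """Return the labels from node down to the leaf's parent, or None if absent."""
    for child in node.children:
        if child is leaf:
            return [node.label]
        if isinstance(child, Node):
            below = path_to(child, leaf)
            if below is not None:
                return [node.label] + below
    return None


line1 = Leaf("Whan that april with his shoures soote")

# Document hierarchy: manuscript > folio > writing space.
document = Node("Hengwrt", [Node("folio 1r", [Node("writing space", [line1])])])

# Communicative-act hierarchy: work > part > line.
work = Node("Canterbury Tales", [Node("General Prologue", [Node("line 1", [line1])])])

print(path_to(document, line1))  # ['Hengwrt', 'folio 1r', 'writing space']
print(path_to(work, line1))      # ['Canterbury Tales', 'General Prologue', 'line 1']
print(next(document.leaves()) is next(work.leaves()))  # True
```

A third tree of time references, with "april" as its own leaf, would need leaves at a finer granularity than whole lines; the sketch omits that.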
Peter

--------------------------------------------------------------------------
Date: 2019-02-05 08:10:33+00:00
From: Gabriel Egan
Subject: Re: [Humanist] 32.423: the McGann-Renear debate

Dear HUMANISTs

Desmond Schmidt's example of a part of Shakespeare in which the lines are not wholly contained within the speeches is this, from 'Hamlet':

Horatio: Friends to this ground.
Marcellus: And liegemen to the Dane.

Far from being unequivocally built into the structure of what Shakespeare wrote, the above is a useful illustration of where modern editors disagree about the metrical structure. In some modern editions, the above is not considered a single metrical unit. For example, it is now often laid out:

FRANCISCO  I think I hear them. -- Stand! Who's there?
HORATIO                                               Friends to this ground.
MARCELLUS  And liegemen to the Dane.
FRANCISCO                           Give you good night.

Laid out like this, the first two lines form a single metrical unit (an iambic hexameter) and the third and fourth form another metrical unit (an iambic pentameter). What Schmidt has claimed as an example of Shakespeare violating the Orderly Hierarchies principle is in fact an example of a hierarchy that truly is in the eye of the beholder, not baked into the original writing.

No theatrical document from Shakespeare's time, in manuscript or print, ever tried to represent a metrical structure that overlaps the line structure. Attempting to show what they assumed was the metrical structure was a practice begun by late eighteenth-century editors of Shakespeare, and there are thousands of cases across his canon where modern editors still disagree on the correct assignment of part lines to metrical units.

Shakespeare, the other dramatists of his time, their scribes, theatre practitioners, and printers did not consider this alleged metrical structure to be the dominant hierarchy.
They considered the division into speeches to be the dominant hierarchy, as they showed in their creation of the 'parts' (also known as 'sides' or 'rolls') by dividing a play script into speeches containing the lines for one character. That hierarchy was real for them, and it was intimately connected to the theatrical practicalities of memorizing the script and rehearsing it in pairs and large groupings of actors. This hierarchy arose from their lived experience as performers and has nothing to do with the "requirement of the markup language", as Schmidt claims it does. Rather, the rise of markup languages has helped scholars of the stage to think more clearly about what hierarchies really existed in the minds of early modern dramatists as they worked, and which hierarchies (such as the one Schmidt invokes) are historically belated.

Regards

Gabriel Egan

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: firstname.lastname@example.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php
Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
This site is maintained under a service level agreement by King's Digital Lab.