Humanist Discussion Group, Vol. 16, No. 179.
Centre for Computing in the Humanities, King's College London
<http://www.princeton.edu/~mccarty/humanist/>
<http://www.kcl.ac.uk/humanities/cch/humanist/>
Date: Sat, 24 Aug 2002 08:17:41 -0700
From: Patrick Durusau <pdurusau@emory.edu>
Subject: Just-In-Time-Trees (JITTs)
Willard,
I thought Humanist readers might be interested in the latest line of attack
Matthew O'Donnell and I have taken on the problem of overlapping
hierarchies in texts. The presentation that was made at the Extreme Markup
conference (Montreal, 2002) is now available on the SBL website,
http://www.sbl-site2.org/Overlap/ (follow the link to Just-In-Time-Trees
(JITTs)).
We propose that the declaration of the document root and the markup to be
recognized should be moved from the syntax layer and made a part of the
processing of a text. That change in the model for handling markup removes
the various problems with overlapping markup that have been the subject of
numerous proposals but few widespread implementations since the rise of
SGML. Our latest proposal differs from all prior ones in that it allows the
use of standard XML software for the processing of texts, while allowing
extensive experimentation with markup languages for the encoding of texts.
Our argument for markup recognition is grounded in the text of ISO 8879
(the CONCUR feature) and extends that concept to XML by the use of filters
to declare the document root and the markup to be recognized.
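A minimal sketch of the filtering idea follows. It is an illustration under stated assumptions, not the sample code promised below: the function name jitt_view, the regular expression, the root name "poem" and the sample text are all hypothetical. The source carries two overlapping hierarchies (lines and sentences) and is not well-formed XML as it stands. The filter declares a root and a set of recognized element names at processing time, drops every other tag, and hands the result to a standard XML parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: <s> opens in the first <line> and closes in the
# second, so the two hierarchies overlap and no single tree fits both.
SOURCE = (
    "<line><s>Sing, goddess, the wrath</line>\n"
    "<line>of Achilles.</s> <s>It brought pain.</s></line>"
)

# Matches simple start and end tags and captures the element name.
# Assumption: no comments, CDATA sections or ">" inside attribute values.
TAG = re.compile(r"</?([A-Za-z_][\w.-]*)[^<>]*>")


def jitt_view(source, root, recognized):
    """Build one tree from the source by recognizing only some markup.

    Tags whose names are not in `recognized` are removed (their text
    content is kept), and the result is wrapped in the declared root.
    """
    def keep_or_drop(match):
        return match.group(0) if match.group(1) in recognized else ""

    filtered = TAG.sub(keep_or_drop, source)
    return ET.fromstring(f"<{root}>{filtered}</{root}>")


if __name__ == "__main__":
    for names in ({"line"}, {"s"}):
        tree = jitt_view(SOURCE, "poem", names)
        print(sorted(names), [child.text for child in tree])
```

Each call yields a well-formed tree for one hierarchy, built only when it is asked for, while the stored text keeps both sets of tags. Parsing the unfiltered source with the same parser fails, which is the situation the filter is meant to avoid.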
The only resource available at this particular moment is the presentation
from the Extreme Markup conference but a more formal paper should appear at
that location by late September along with sample code for experimenting
with the technique.
The oddest question that has been voiced in response to our proposal is how
serious a problem overlap is for humanities texts. I consider it odd since
any number of humanities projects, including the TEI Guidelines, make
repeated references to the need to record overlapping hierarchies in texts.
There are also the questions raised by authors such as Jerome
McGann, http://jefferson.village.virginia.edu/~jjm2f/jj2000aweb.html,
about the use of markup for representation of texts. Still, my sense of the
problem's importance rests more on personal experience than on a systematic
analysis of texts of interest to humanists. As part of our research, I
would like to develop (or learn about) more convincing arguments for
overlapping hierarchies in texts.
Suggestions of prior studies, measures of overlap and its importance, and
similar resources would be greatly appreciated. One possible basis for
constructing a measure of overlap is the family of minimum tree-to-tree
edit distance algorithms, but I am sure there are others.
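As an illustration of one of the simpler alternatives (not a tree-to-tree edit distance), overlap between two hierarchies could be counted directly from character offsets. The sketch below assumes each element is reduced to a (start, end) span; the function names and the sample offsets are hypothetical. It reports the fraction of cross-hierarchy element pairs that partially overlap, meaning they intersect without one containing the other.

```python
from itertools import product


def crosses(a, b):
    """True if spans a and b partially overlap: they intersect, but
    neither one contains the other."""
    (a0, a1), (b0, b1) = a, b
    return a0 < b0 < a1 < b1 or b0 < a0 < b1 < a1


def overlap_ratio(hierarchy_a, hierarchy_b):
    """Fraction of (a, b) element pairs that cross; 0.0 if there are none."""
    pairs = list(product(hierarchy_a, hierarchy_b))
    if not pairs:
        return 0.0
    return sum(crosses(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical offsets: two verse lines and two sentences, where the
    # first sentence runs over the line break.
    lines = [(0, 24), (25, 54)]
    sentences = [(0, 37), (38, 54)]
    print(overlap_ratio(lines, sentences))
```

A ratio of zero means the two hierarchies nest cleanly and could share a single tree. Any value above zero means at least one pair of elements cannot be placed in the same tree. A count of this kind says nothing about how important a given overlap is to a reader, only how often it occurs.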
Suggestions?
Thanks!
Patrick
--
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@emory.edu