Humanist Discussion Group, Vol. 14, No. 277.
Centre for Computing in the Humanities, King's College London
<http://www.princeton.edu/~mccarty/humanist/>
<http://www.kcl.ac.uk/humanities/cch/humanist/>
[1] From: Wendell Piez <wapiez@mulberrytech.com> (125)
Subject: Re: 14.0272 methodological primitives
[2] From: "Ian Lancashire" <ian@chass.utoronto.ca> (89)
Subject: Re: 14.0272 methodological primitives
[3] From: "Osher Doctorow" <osher@ix.netcom.com> (42)
Subject: Re: Methodological primitives
--[1]------------------------------------------------------------------
Date: Wed, 27 Sep 2000 09:34:28 +0100
From: Wendell Piez <wapiez@mulberrytech.com>
Subject: Re: 14.0272 methodological primitives
Hi Willard and HUMANIST:
At 07:27 AM 9/26/00 +0100, John Bradley wrote:
....
>In Object Oriented (OO) design, there is another way to design
>processing which is these days very much in fashion. One perhaps key
>difference: Object Oriented design blurs the distinction Willard made
>in his first posting on this subject between data and process, and I
>think this makes a dramatic difference in the way one looks at the
>whole issue. It seems particularly well suited for modelling
>processes that involve the production of "interactive" and
>"GUI-based" systems. I don't know of anyone, however, who has managed
>to take OO design and apply it in quite the way implied here -- as a
>basis for the construction of primitives that non-programmers could
>adapt for specific tasks.
This is fascinating stuff. John's point about the underlying assumption in
OO design -- to merge the conception, in modeling, of data and process -- is
very well taken. It's especially interesting in this context because as
these systems evolve, naturally, the old ideas and approaches come up time
and again. In the context of OO (especially, say, Java, with its promise of
portability and the long-term robustness that comes with
platform-independence), we see the pendulum swing back again with the
emergence of markup-based (specifically XML-based) systems.
A key reason OO approaches work well for interactive GUIs and other
process-intensive work is, in fact, that even while they can support
strongly encapsulated architectures (more easily modified and maintained)
OO programs can take shortcuts to achieve functionality, at the price of
locking in their data to a particular data model (and hence, usually, a
particular format). But who wants to be storing their conference papers as
Java objects? Of course, the next step is to abstract and formalize a
portable data model outside the implementation, pulling data back away from
process (at least to whatever extent is possible). By providing a standard
syntax supporting off-the-shelf tools, XML eases this work greatly.
In the business, we've used the analogy, "if Java is a way to build your
toaster, then XML is sliced bread." This tries to identify a key advantage
of a standards-based markup syntax: that, in theory at least (and
increasingly in practice), it should now be possible to use OO languages to
work in the way they want -- with sophisticated data models (not merely
streams of characters) -- and yet not lock our data into the specific
processing environment we happen to be using at the moment.
>Any tool meant to support activities as diverse as those that turn up
>in humanities text-based computing cannot possibly be trivial to
>learn or use. The level of professionalism and commitment required
>for a full use of TuStep is, I think, roughly comparable to that
>required to learn to work with, say, Perl, or (I think) Smalltalk and
>text-oriented Smalltalk objects.
I think that's fair, since any toolset whose native data set is a file
containing a stream of characters must work on that basis, inferring more
complex data structures where it can (by parsing), but not assuming in the
general case that those particular data structures are there in that form.
For one thing, in Humanities Computing as it stands, it's fair to assume
they're not.
In order to build a more "intuitive" system (say, a GUI-driven system
allowing on-the-fly manipulation of texts), a more sophisticated data model
needs to be assumed that can support more complex operations in a
generalized way. To go about "sorting entries in Swedish lexicon order" or
"sorting entries in Icelandic name order", the system has to know both what
these orders are, and what an "entry" is. XML, by providing for a
particular kind of tree-structure, is beginning to provide at least an
infrastructure within which such knowledge is embedded, so we can now begin
to use standard syntaxes such as XPath (co-designed by a Computing
Humanist, Steve DeRose) for some of this. (XPath can't sort, but it can do
some other fancy stuff such as filter by content, so that
'//line[contains(., "To be")]' will return all <line> elements in a
document that contain the string "To be".) Consequently, we are beginning
to see some of these capabilities emerging as XML tools.
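As a rough illustration of the two operations just mentioned, here is a
minimal sketch in Python. Everything in it is an assumption made for the
example: the sample document, the element names <line> and <entry>, the
helper names, and a deliberately simplified Swedish alphabet (a-z followed
by å, ä, ö). Python's standard ElementTree module implements only a subset
of XPath that does not include contains(), so the filter is emulated in
plain Python, and the sort is done by the host language, as it must be.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """<doc>
  <line>To be, or not to be, that is the question</line>
  <line>Whether 'tis nobler in the mind to suffer</line>
  <entry>östan</entry>
  <entry>zon</entry>
  <entry>ärm</entry>
  <entry>ask</entry>
  <entry>åker</entry>
</doc>"""

# Simplified Swedish alphabet (an assumption): a-z, then å, ä, ö.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def lines_containing(root, needle):
    """Emulate //line[contains(., needle)] by testing each <line>'s text."""
    return [el for el in root.iter("line")
            if needle in "".join(el.itertext())]


def swedish_key(word):
    """Map a word to a list of alphabet ranks; unknown characters sort last."""
    return [RANK.get(ch, len(RANK)) for ch in word.lower()]


root = ET.fromstring(SAMPLE)

# Filter by content, as the XPath expression in the text would.
matches = [el.text for el in lines_containing(root, "To be")]

# Sort entries; the system has to know both the order and what an "entry" is.
entries = sorted((el.text for el in root.iter("entry")), key=swedish_key)

print(matches)
print(entries)
```

The point of the sketch is only that both pieces of knowledge, the
collation order and the identity of an "entry", have to be stated
somewhere before the operation can be generalized.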
For example, Sun Microsystems has an "EAI" product (the TLA stands for
"Enterprise Application Integration") called Forte Fusion (that 'e' has an
accent mark that I don't trust your mailer to render) that allows a user to
set up a data process flow chain in which an XML data set can be passed
through a series of processors, including, prominently, XSLT
transformations that could be doing filtering, sorting, analytical work.
The idea is that when you click on the form to submit your order for the
new American Civil War battle game, your order can be parsed, and the
Authorization, Shipping and Billing departments at NorthernAggression.com
can all get the appropriate pieces of your order (some of which might
already be in the system since you're a regular) in a timely way, following
whatever internal logic is required (e.g. don't send the game out if your
credit card bounces again). The whole thing works with a GUI: little icons
represent your filtering and processing engines, with, as it were, a pipe
carrying the data between them. The different engines can be disparate,
running on different systems and platforms, a Unix server here running a
batch program in Perl, an XSLT transform on a client over there, and so forth.
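The "pipe" idea can be sketched in a few lines of Python. This is not the
API of the Sun product, which I am not reproducing here; the order
document, the attribute names, and the three stage functions are all
hypothetical, chosen only to show an XML data set being handed from one
processor to the next.

```python
from functools import reduce
import xml.etree.ElementTree as ET

# Hypothetical order document; element and attribute names are illustrative.
ORDER = """<order customer="regular">
  <item sku="CW-1" price="40">Civil War battle game</item>
  <item sku="MAP-2" price="0">Free map</item>
  <item sku="EXP-3" price="15">Expansion pack</item>
</order>"""


def drop_free_items(order):
    """Filtering stage: remove items whose price is zero."""
    for item in list(order.findall("item")):
        if float(item.get("price")) == 0:
            order.remove(item)
    return order


def sort_by_price(order):
    """Sorting stage: reorder the remaining items from cheapest to dearest."""
    items = sorted(order.findall("item"), key=lambda i: float(i.get("price")))
    for item in items:
        order.remove(item)
    order.extend(items)
    return order


def add_total(order):
    """Analytical stage: record the sum of item prices on the root element."""
    total = sum(float(i.get("price")) for i in order.findall("item"))
    order.set("total", f"{total:.2f}")
    return order


def run_pipeline(document, stages):
    """Pass the document through each stage in turn, like data down a pipe."""
    return reduce(lambda doc, stage: stage(doc), stages, document)


result = run_pipeline(ET.fromstring(ORDER),
                      [drop_free_items, sort_by_price, add_total])
skus = [item.get("sku") for item in result.findall("item")]

print(skus, result.get("total"))
```

Each stage takes a document and returns a document, which is what lets
the stages be rearranged, or in a real system run on different machines.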
But to build something like this, you have to have a fairly stable data
model. (In this case, the system is going to do special things with your
name, address, credit card number etc.) At this stage, it is too early to
say when such a data model will be possible or feasible for the kind of
analysis we want to do in Humanities Computing -- especially considering we
commonly work at the level of the "word" (whatever that is), not just
element types, and want access to orthographical variants, morphologies,
synonyms, etc. etc., intelligence about all of which has to be stored
somewhere in some sufficiently tractable (and long-lived) form. Not to
mention the problem of sense-disambiguation (I love Prof. Ott's bit about
the "content provider" becoming a "satisfied donor"). Our work with
higher-level linguistic and literary structures has barely started.
Also, to be an iconoclast about it, I am not sure it is our best course to
move forward pell-mell in this direction, without being extremely critical
of the task itself. Every lens comes with its blindness, and as we design
these capabilities into systems, by deciding what we want to look at, we
will also be deciding what we don't care to see. I am very much in favor of
experimental work to design and deploy whatever higher-level structures we
can discern, trace, render malleable with these powerful tools. But I also
believe that great works of literature will continue to evade whatever
structures we impose on them, just as they always have, it being the
primary work of every poet to reinvent the art of poetry from scratch.
And not only for ourselves should we be wary, but for the role we have to
play in the larger world's understanding of its own rhetorics and how they
work. It does little good to say when the Emperor has no clothes, if you
haven't been taking care of your own wardrobe.
So, while I'm not going to be quitting work myself on methodological
primitives, I'm not confident that you're going to see them anytime soon in
a form that a naive user, without knowledge of the sordid details of text
encoding, could simply sit down with, tinker with, and get instantly useful
and trustworthy results from. "Epiphany In a Box"? Which is a good thing. After all,
isn't it our role to show the naive user what's *really* going on?
Best regards,
Wendell
======================================================================
Wendell Piez mailto:wapiez@mulberrytech.com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
--[2]------------------------------------------------------------------
Date: Wed, 27 Sep 2000 09:35:10 +0100
From: "Ian Lancashire" <ian@chass.utoronto.ca>
Subject: Re: 14.0272 methodological primitives
The best set of text-based utilities can be found in UNIX, the next best in
Dan Melamed's perl tools at http://www.cis.upenn.edu/~melamed/ . The 1980s
Hum was a gem too. It still is. Susan Hockey, as usual, was prescient: she
saw the need for someone in the humanities to learn basic programming and to
assemble groups of these "primitives." Her fine book on Snobol programming
enables the altruistic humanist still. Earlier, Nancy Ide published a book
on Pascal for the humanities. (This isn't meant to be an exhaustive list
....)
Maybe one of the unforeseen effects of relying on professional programmers
to create big pieces of software like TACT and Wordcruncher is to encourage
scholars in the humanities to believe that they can get along without being
able to write small programs or adapt ones created by other people. (This
too is a debate I have overheard intermittently over several decades.)
Ott's comments on the impediments to releasing primitives that would satisfy
all and sundry come from an expert programmer. The world of cybertext is too
complex now. We will also never all agree on how, for whatever purpose, to
symbolize the more fundamental primitives embedded in any programming
language.
Ian Lancashire
Toronto
----- Original Message -----
From: Humanist Discussion Group
<willard.mccarty@kcl.ac.uk>) <willard@lists.village.virginia.edu>
To: Humanist Discussion Group <humanist@lists.Princeton.EDU>
Sent: Tuesday, September 26, 2000 2:27 AM
>
> Humanist Discussion Group, Vol. 14, No. 272.
> Centre for Computing in the Humanities, King's College London
> <http://www.princeton.edu/~mccarty/humanist/>
> <http://www.kcl.ac.uk/humanities/cch/humanist/>
>
>
>
> Date: Tue, 26 Sep 2000 07:16:14 +0100
> From: John Bradley <john.bradley@kcl.ac.uk>
> Subject: Re: 14.0258 methodological primitives?
>
> Willard: I would certainly support anyone who took the view that
> Wilhelm Ott's TuStep system provides a very solid set of "primitives"
> for the scholarly manipulation of text. I have spent many hours of
> time examining their design (although I confess that my actual
> experience of using them has been very limited indeed) and can well
> appreciate that they could be combined to deal with a very large
> number of text manipulation needs. Anyone seriously interested in
> thinking about what a design needs to include in detail would benefit
> much from examining TuStep in this way.
>
> The approach towards tools for generalised processing shown in TuStep
> is, from the computing perspective, a very old one -- but at the same
> time it is a model that is still often applied when a computing
> professional needs to do a complex computing task him/herself. The
> UNIX environment with its basic "filtering" tools, a sorting
> program, some programmable text-oriented editors, and things like
> Perl, are based in very similar approaches.
>
> In Object Oriented (OO) design, there is another way to design
> processing which is these days very much in fashion. One perhaps key
> difference: Object Oriented design blurs the distinction Willard made
> in his first posting on this subject between data and process, and I
> think this makes a dramatic difference in the way one looks at the
> whole issue. It seems particularly well suited for modelling
> processes that involve the production of "interactive" and
> "GUI-based" systems. I don't know of anyone, however, who has managed
> to take OO design and apply it in quite the way implied here -- as a
> basis for the construction of primitives that non-programmers could
> adapt for specific tasks. However, the original OO language --
> Smalltalk -- >was< designed to allow non-programmer users (children)
> to create significant applications of their own, and it retains, I
> think, some of this flavour of supporting the combination of
> experiment, development and processing in a single environment.
> Furthermore, I know of people who have a set of powerful objects (in
> Smalltalk, it turns out) they use and enhance over and over again to
> accomplish very sophisticated text manipulation tasks.
>
> Any tool meant to support activities as diverse as those that turn up
> in humanities text-based computing cannot possibly be trivial to
> learn or use. The level of professionalism and commitment required
> for a full use of TuStep is, I think, roughly comparable to that
> required to learn to work with, say, Perl, or (I think) Smalltalk and
> text-oriented Smalltalk objects.
>
> Best wishes. ... john b
> ----------------------
> John Bradley
> john.bradley@kcl.ac.uk
>
--[3]------------------------------------------------------------------
Date: Wed, 27 Sep 2000 09:36:10 +0100
From: "Osher Doctorow" <osher@ix.netcom.com>
Subject: Re: Methodological primitives
My previous contribution on this topic may have been a bit obscure, so I
will try a slightly different approach. My view is that whatever you are
talking about, it is useless if you cannot make a Shakespearean play about
it. On methodological primitives, I will for concreteness consider the
special case of political history, which is far more concrete than it looks
in a certain sense. I maintain that political history has 3 methodological
primitives (mp's or mps for short), namely, anger, blame, and
naivete/ignorance (naivete is I think the nice way of referring to
ignorance). I propose a 3 actor, 6 act play to illustrate this (3 times 2
is 6, which is the number of permutations of 3 actors). For our
actors/actresses, we will select any 3 characters from Shakespeare, and put
labels on them, namely, A for anger, B for blame, and N for
naivete/ignorance. To show the direction of influence or causation, we
will have A point to B if A influences B, and so on, and we limit the play
to 3-person or 3-party influence cases. Let me translate this play into an
easier summary. Political history is composed of angry public A who elect
or cause to have power political blamers B who blame ignorant or naive
people N. It is also composed of naive/ignorant people N who elect or cause
to have power politicians B who blame angry people A. It is also composed
of angry politicians A who enable blamer B to seize power and thus start a
war against ignorant/naive people N. Of course, blamers B can also elect
naive/ignorant person N who starts a preventative war against angry people
A. Alternatively, blamers B may decide to elect or give power to an angry
psychopath or sociopath A who starts a preventative war against
naive/ignorant people N. I think the trend here is becoming obvious. This
seems to cover political history from prehistoric through modern times, with
various permutations.
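The count of six can be checked mechanically. The short Python sketch
below is only an illustration: the labels A, B and N come from the text
above, while the arrow notation and the variable names are assumptions
made for the example. It lists every ordering of the three labels, each
read as a chain of influence from first to last.

```python
from itertools import permutations

# Labels taken from the text: A = anger, B = blame, N = naivete/ignorance.
LABELS = {"A": "anger", "B": "blame", "N": "naivete/ignorance"}

# Each ordering is read as a chain of influence: first -> second -> third.
chains = [" -> ".join(p) for p in permutations("ABN")]

for chain in chains:
    print(chain)

# 3 * 2 * 1 orderings of three actors.
print(len(chains))
```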
Notice carefully that I have not yet introduced computers, even though this
discussion group concerns humanist computation. That is because it has not
yet reached the stage where it involves too much work for people to keep
track of or accomplish rapidly. I am trying to be parsimonious here and
save time and money. Why spend money when you don't need to (remind me to
include that among future methodological primitives)? I am quite sure,
however, that at some stage computers will be called upon for their
assistance. As we turn to more and more complex things than political
history, I feel certain that computers will find themselves of use. If
nothing else, they can keep track of the possibilities that we have
eliminated. For example, Ovid's Metamorphoses cannot refer to political
history since otherwise it would reduce to the above statements. There must
be millions of literary works which are excluded on similar grounds, and
computers are definitely required to keep track of those.
Yours To Be Continued,
Osher Doctorow
This archive was generated by hypermail 2b30 : 09/27/00 EDT