Humanist Discussion Group, Vol. 14, No. 482.
Centre for Computing in the Humanities, King's College London
<http://www.princeton.edu/~mccarty/humanist/>
<http://www.kcl.ac.uk/humanities/cch/humanist/>
[1] From: John Bradley <john.bradley@kcl.ac.uk> (56)
Subject: Re: 14.0469 XML & WWW; XML references; a broader question
[2] From: "David Halsted" <halstedd@tcimet.net> (29)
Subject: Re: 14.0469 XML & WWW
[3] From: "Fotis Jannidis" <fotis.jannidis@lrz.uni-muenchen.de> (22)
Subject: Re: 14.0469 XML & WWW
[4] From: lachance@chass.utoronto.ca (Francois Lachance) (42)
Subject: imprint, edition, publication
--[1]------------------------------------------------------------------
Date: Wed, 08 Nov 2000 09:39:03 +0000
From: John Bradley <john.bradley@kcl.ac.uk>
Subject: Re: 14.0469 XML & WWW; XML references; a broader question
>btw, I don't think that xml aware clients will be the solution for this
>problem, because of the size of the editions.
Wendell: I also share the view that Fotis is expressing here,
although I must say that I have had so little time to do serious work
in this area that I'm not sure my opinions should count for TOO much
these days!
Nonetheless, it seems to me that the WWW (and also much of the
development work at W3C) is predicated on the unspoken assumption
that the amount of data to be exchanged between server and client is
relatively small. This model may be fine for the kind of
transaction-oriented B2B applications that seem to be driving
developments these days. However, it appears to be a serious problem
when looking at the scholarly use of texts. I recall the first time
this observation struck me -- several years ago when I went to the
text archive site at University of Virginia (or was it Michigan?) and
fetched their relatively-lightly marked up SGML-TEI documents using
(as I recall) Panorama. By the nature of the web access, and the
"document-oriented" nature of SGML, (and, to be fair, perhaps the way
that Panorama worked then) I had to fetch the entire document before
seeing it. It took a very long time -- as I recall about 30 minutes
(this was when I was still at U of Toronto) -- before I saw anything
of the document at all. Suppose that instead of looking at (merely
trying to read!) a novel by Dickens I had been trying to do some
analysis on all of Dickens' works. The slowness would have been only
one of the problems. At the time it seemed to me that this approach
-- shipping the entire document in a single gulp over the Internet
before anything could be done with it -- was not going to gain wide
acceptance for material of this kind. The HTML representation of the
same material was easier to handle because it had been split up into
chunks -- but it seems to me that, for scholarly use of text at
least, this chunking (except for straightforward reading on screen
or printing out) was unfortunate, and, of course, the only markup
one had to work with was HTML.
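The bottleneck Bradley describes -- having to fetch and parse an entire document before seeing any of it -- is the kind of thing an incremental (streaming) parser avoids, since processing can begin as soon as the first complete chunk arrives. A minimal sketch in Python, with an invented two-chunk TEI-like document standing in for a full edition:

```python
# A minimal sketch of incremental parsing, assuming a TEI-like document
# with <div> chunks; iterparse yields each element as it is completed,
# so processing can begin before the whole file has been read.
import io
import xml.etree.ElementTree as ET

tei = io.BytesIO(b"""<TEI>
  <text>
    <div n="1"><p>It was the best of times...</p></div>
    <div n="2"><p>It was the season of Light...</p></div>
  </text>
</TEI>""")

chunks = []
for event, elem in ET.iterparse(tei, events=("end",)):
    if elem.tag == "div":
        # Each chapter-sized chunk can be displayed or analysed at once,
        # then discarded to keep memory flat for very large editions.
        chunks.append((elem.get("n"), "".join(elem.itertext()).strip()))
        elem.clear()

print(chunks[0])  # the first chunk is usable without reading to the end
```

The same loop works unchanged over a network stream, which is the point: nothing forces the reader to wait thirty minutes for the closing tag.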
It might be possible to divide the document into chunks for XML
processing as well, although (it seems to me at least) by the nature
of the way that SGML and XML work, the chunked version becomes at
least in some sense different from the unchunked one when split in
separate pieces. I know, of course, that XPointer links can be made
between separate documents, and someday widely available software
will be able to deal with them -- but the chunking of materials into
separate XML documents, not just the linking between them, is, I
think, undesirable. This becomes more and more of an issue as the
amount of text in the document grows larger and the links between
different parts (and thereby the kind of processing one might want
to do on those links) grow more intricate -- think, for example, of
analysing text passages that cross the boundaries between the chunks
provided by the electronic publisher.
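Bradley's point that the chunked version differs in kind from the unchunked one can be illustrated concretely. In this sketch (all element names and sample text are invented), a phrase spanning a chunk boundary is findable in the whole document but in none of the publisher's chunks:

```python
# A hedged illustration of the chunking problem: once an edition is
# split into separate chunk documents, analyses that cross chunk
# boundaries no longer see one contiguous text.
import xml.etree.ElementTree as ET

edition = ET.fromstring(
    "<text>"
    "<div n='1'><p>So ended the first</p></div>"
    "<div n='2'><p>day of the trial.</p></div>"
    "</text>"
)

# Publisher-style chunking: each <div> becomes its own document.
chunks = [ET.tostring(d, encoding="unicode") for d in edition.iter("div")]

# The unchunked view still yields one continuous text.
full_text = " ".join(
    "".join(d.itertext()).strip() for d in edition.iter("div")
)

phrase = "first day"
print(phrase in full_text)               # visible in the whole edition
print(any(phrase in c for c in chunks))  # lost at the chunk boundary
```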
You may recall that I raised this problem at my presentation at
Virginia, and proposed there an architectural model that is XML based
but is not based on the HTTP-WWW document chunking model. Whether it
is any good or not, of course, would require me to develop it
further!
All the best. ... john b
----------------------
John Bradley
john.bradley@kcl.ac.uk
--[2]------------------------------------------------------------------
Date: Wed, 08 Nov 2000 09:39:56 +0000
From: "David Halsted" <halstedd@tcimet.net>
Subject: Re: 14.0469 XML & WWW
Edition size could be addressed in a number of ways. It's true that it's
probably not useful to think of individual desktops churning through a
large number of very large XML documents retrieved on the fly from
remote machines, but it
might be possible to think of, say, individual servers indexing a group of
XML documents that are actually "stored" on other servers and making the
index available for a set of users with shared interests. In addition,
sites with lots of XML behind them could make useful drill-downs available
to users as well, and expose the results in XML. So you could have a very
nice set of mixed modes; sites with lots of XML could use server-side tools
(including databases) to optimize searching, but could also expose the XML
data stores, enabling anybody with enough machine to run their own searches
against the data. Users finding the site-provided tools inadequate could
beef up their RAM and manipulate the data themselves to meet their own needs;
in fact, those users could expose the results of their research as XML and
enable the original store to provide a link to their results. Depending on
the field, the results might become part of the underlying data store or
simply build a searchable interpretive layer on top of the raw data.
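Halsted's indexing scenario can be sketched very simply: one server builds a word index over XML documents notionally held elsewhere, and user queries run against the index alone rather than the full remote stores. Document names and contents here are invented:

```python
# A minimal sketch of server-side indexing over distributed XML stores.
# The inlined strings stand in for documents "stored" on other servers.
from collections import defaultdict
import xml.etree.ElementTree as ET

documents = {
    "novel-1.xml": "<doc><p>the whale surfaced</p></doc>",
    "novel-2.xml": "<doc><p>the river ran on</p></doc>",
}

index = defaultdict(set)
for name, xml in documents.items():
    # Index only the text content, mapping each word to its documents.
    text = "".join(ET.fromstring(xml).itertext())
    for word in text.split():
        index[word].add(name)

# A user's query touches only the index, not the full remote documents.
print(sorted(index["the"]))    # found in both documents
print(sorted(index["whale"]))  # found only in novel-1.xml
```

Exposing such an index (or the drill-down results) as XML is then a serialization step on top of the same structure.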
Eventually, we get to move beyond thinking about servers and clients, to
thinking about servers talking to servers and people sort of "peeking in" to
the data, asking the servers to provide the information they want from a
connected series of other servers with data exposed in XML, that is,
publicly queryable. It'd be nice to see Humanities computing develop some
things here; texts and published research can be public in a way that
corporate data can't, so perhaps the true potential in distributed XML
models can be realized more quickly with online Humanities computing.
Dave Halsted
***
David G. Halsted, PhD
Consultant (XML/RDBMS/Web apps)
halstedd@tcimet.net
--[3]------------------------------------------------------------------
Date: Wed, 08 Nov 2000 09:40:30 +0000
From: "Fotis Jannidis" <fotis.jannidis@lrz.uni-muenchen.de>
Subject: Re: 14.0469 XML & WWW
From: Wendell Piez <wapiez@mulberrytech.com> (18)
> How large do you expect these editions to be?
What we have now are electronic editions of a few megabytes. To
give an example: the rather small edition "Der junge Goethe in seiner
Zeit" (The Young Goethe in His Time) comes to about 35 MB. But this
will grow quickly, and I expect editions on a single server to run to
several gigabytes in 10-20 years. I am not talking about commercial
editions like the
ones offered by Chadwyck-Healey, because they can solve these
interoperability problems within their company, but about editions
put on the net by the scholars who created them.
> Why would server-side
> processing be better for large editions?
At the moment: because the browsers can't offer any kind of
processing that would help to solve this problem. In the future
there will probably be a division of labor between XML browsers
and servers. It would make our work easier if we agreed early
on a common solution.
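The division of labor Jannidis anticipates -- the server holding the full edition and the client receiving only small fragments -- might look something like the following sketch. The element names, the `serve_fragment` helper, and the sample texts are all invented for illustration:

```python
# A hedged sketch of server-side fragment delivery: the server holds the
# full edition and ships only the fragment a reader asks for, so the
# client never downloads gigabytes it does not need.
import xml.etree.ElementTree as ET

edition = ET.fromstring(
    "<edition>"
    "<letter n='1'><p>Dear friend...</p></letter>"
    "<letter n='2'><p>I write in haste...</p></letter>"
    "</edition>"
)

def serve_fragment(n):
    # Server-side selection: only the matching letter is serialized
    # and sent over the wire.
    frag = edition.find(f"letter[@n='{n}']")
    return ET.tostring(frag, encoding="unicode")

response = serve_fragment("2")
print(response)  # a small XML fragment, not the whole edition
```

An XML-aware client could then render or further process the fragment locally, which is one plausible split of the labor between browser and server.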
> Or possibly I mistake you. If you mean to say XML-aware clients will not be
> the *entire* solution to the problem, I agree.
Yes, that is exactly what I wanted to say.
But your question sounds to me like you have some ideas how to
handle these problems. I am very interested in any ideas.
Fotis Jannidis
--[4]------------------------------------------------------------------
Date: Wed, 08 Nov 2000 09:41:42 +0000
From: lachance@chass.utoronto.ca (Francois Lachance)
Subject: imprint, edition, publication
Patrick,
How would your argument about the open-endedness of electronic editions
work if the volatility of texts were a consequence of social practices
and less so of technologically determined paradigms? (The question is
of course moot if you consider "paradigms" to be expressions of social
practice.)
I am just a little wary of a quasi-ahistorical assertion of a single
monolithic "print-medium paradigm of publication". And so I like to
generalize in a most grandiose fashion:
All texts are volatile. Electronic distribution may actually help
preserve the variants that contribute to the creation of an edition.
The vapours are captured in many media. Paper plus voice plus screen
contribute to preservation of variation.
A consideration of multimedia and audiovisual components of textual
expression certainly challenges the often dichotomous crypto-McLuhanesque
debate over print versus electronic.
If an edition is a set of readings of records of performances, by its very
matricial structure it is not only a gathering of what was witnessed but
also an index of what might have been. Whatever the medium in which it is
expressed, an edition contains a certain amount of conjecture. And it is
the opening of an edition's working hypotheses to testing that contributes
to its incompleteness (in the sense of possible-world semantics) --- not
the medium in which the expression of those working hypotheses is fixed.
I just wonder how the link between systems of distribution and authorial
control is any different for the written word, the spoken word, the film,
the song, the symphony, the painting either hung in a gallery or
reproduced as a digital image. We can ask ourselves what cultural
conditions result in gallery spaces where viewers can adjust the lighting,
or concert spaces where the sound is not uniform for every point in the
space (for example, Morton Feldman's _Rothko Chapel_).
There is a wholesale attitude towards temporality and the possibility of
intersubjective experience that accompanies people's use of media and
their discourse about the use of media. Some of us begin from a
non-Parmenidean position: change is the very basis upon which we can build
shared experiences. Media can help in two ways: as facilitators of change
and preservation; as facilitators of sharing (and hoarding). I'm not
quite sure if a necessary (as opposed to fortuitous) connection exists
between the two types of facilitation. Any thoughts?
--
Francois Lachance, Scholar-at-large
http://www.chass.utoronto.ca/~lachance
Member of the Evelyn Letters Project
http://www.chass.utoronto.ca/~dchamber/evelyn/evtoc.htm
This archive was generated by hypermail 2b30 : 11/08/00 EST