14.0486 XML, WWW, editions

From: by way of Willard McCarty (willard@lists.village.Virginia.EDU)
Date: 11/10/00


                   Humanist Discussion Group, Vol. 14, No. 486.
           Centre for Computing in the Humanities, King's College London
       [1]   From:    Wendell Piez <wapiez@mulberrytech.com>              (64)
             Subject: Re: 14.0482 XML, WWW, editions
       [2]   From:    "David Halsted" <halstedd@tcimet.net>               (24)
             Subject: Re: 14.0482 XML, WWW, editions
             Date: Fri, 10 Nov 2000 09:49:31 +0000
             From: Wendell Piez <wapiez@mulberrytech.com>
             Subject: Re: 14.0482 XML, WWW, editions
    Replying to letters from John Bradley, David Halsted, Fotis Jannidis,
    Francois Lachance....
    I'd be surprised if Fotis and I are not in substantial agreement on the
    architectural questions, how we will see large editions (mega- or
    gigabytes) deployed on server vs. client. As for what ideas I have,
    actually I'd like to pass the ball back to John Bradley (and anyone else)
    to carry on that one, as they have more concrete ideas and hands-on experience.
    I agree with John that the web paradigm we have inherited, for better or
    worse, has tended to limit the options; for the kinds of things we want to
    do, even a university pipe may not be wide enough, to say nothing of those
    of us on 28.8. On the other hand, I also agree with John and David that
    various kinds of chunking/indexing/cross-indexing strategies are feasible,
    and will probably always be necessary in some cases. We will probably see
    the size of integrated resource collections (whether "editions" or not)
    grow along with available bandwidth, so the problem will never disappear
    even as the limits are pushed outwards. As to whether XML per se supports,
    or fails to support, such chunking (particularly if it is to be transparent
    to the user), I think it's safe to say it's neutral: a system could be
    designed either way (and either way might be appropriate in different
    circumstances). A key design issue here is the framing of metadata at
    various levels, and whatever "information inheritance" models are
    implemented to support the chunking while maintaining an integrated view
    (or: how are chunking and indexing to be best interrelated and managed?).
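One way to picture the "information inheritance" question is a chunk that consults its own metadata first and falls back to the collection that contains it. The sketch below is a toy model in Java, not anything proposed in this thread; the class and field names are purely illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of "information inheritance": a chunk answers metadata
// queries from its own fields first, then defers to its parent scope
// (e.g. the edition or collection it was carved out of).
public class MetadataScope {
    private final Map<String, String> fields = new HashMap<>();
    private final MetadataScope parent;

    public MetadataScope(MetadataScope parent) { this.parent = parent; }

    public MetadataScope set(String key, String value) {
        fields.put(key, value);
        return this;
    }

    public String get(String key) {
        if (fields.containsKey(key)) return fields.get(key);
        return parent == null ? null : parent.get(key);
    }

    public static void main(String[] args) {
        MetadataScope edition = new MetadataScope(null)
            .set("editor", "A. Scholar")
            .set("encoding", "TEI");
        MetadataScope chunk = new MetadataScope(edition)
            .set("title", "Act I");

        System.out.println(chunk.get("title"));  // the chunk's own field
        System.out.println(chunk.get("editor")); // inherited from the edition
    }
}
```

A chunk served on its own still presents an integrated view, because queries it cannot answer locally are resolved against the enclosing collection.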
    In this context, for example, I think it's worthwhile to note that XSLT,
    the W3C transformation language supporting XML presentation, is designed
    specifically in order to support a kind of "random access" processing (my
    term, not a term of art for this to my knowledge), that is, "start styling
    the document from any point in the middle". If the language were not
    side-effect-free (one of the features of its processing model), one would
    have to download an entire document before one could style it, as (for
    example) the Panorama browser had to do. This not being necessary with XSL,
    the pipeline itself is not such a bottleneck, and clients are able to bear
    more of the processing burden.
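The point about side-effect-free styling can be made concrete with a small Java sketch using the JAXP transformation API (TrAX, part of the standard JDK). The stylesheet and document fragment here are invented for illustration: because each XSLT template is a pure function of the node it matches, a processor can style a fragment pulled from the middle of a much larger document without having seen the rest of it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StyleFragment {

    // A side-effect-free stylesheet: each template rule depends only on
    // the node it matches, never on processing state accumulated earlier.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' "
      + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='LINE'>"
      + "<xsl:value-of select='.'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String style(String fragment) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(fragment)),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // A fragment "from the middle" of a larger play styles correctly
        // on its own, with no need to download the whole document first.
        System.out.println(style(
            "<SPEECH><LINE>To be, or not to be</LINE></SPEECH>"));
    }
}
```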
    I would also concur, however (more agreement here) with David and with
    Francois in his more outlandish post (that's a compliment, Francois!), both
    of whom suggest that the design questions here are really wide open, and
    that we'll be seeing many interesting experiments with peer-to-peer
    deployments, dynamic editions, etc. etc. This is really a brave new world.
    I wonder whether experiments with scholarly publishing in which many
    readers have capabilities (simultaneously?!) to amend or alter texts, have
    been done, how such texts can be framed and deployed, and what results
    we'll see. In some respects I think we'll find it's like the 1960s
    experiments with audience participation in theater: that it is vitalizing
    and enriching, and yet also in some ways, by threatening the formal
    integrity of the medium itself, a real risk: so every performance is either
    brilliant, or a complete bust. For us, it's the line between research and
    publication that blurs (as has been remarked on HUMANIST before), raising
    similar questions.
    But we will not be the only community facing this particular
    architecture/design problem, by any means. Think of Internet-based medical
    informatics, financial services.... we actually have quite a bit to
    contribute here, as David also says.
    Best regards,
    Wendell Piez                            mailto:wapiez@mulberrytech.com
    Mulberry Technologies, Inc.                http://www.mulberrytech.com
    17 West Jefferson Street                    Direct Phone: 301/315-9635
    Suite 207                                          Phone: 301/315-9631
    Rockville, MD  20850                                 Fax: 301/315-8285
        Mulberry Technologies: A Consultancy Specializing in SGML and XML
             Date: Fri, 10 Nov 2000 09:50:21 +0000
             From: "David Halsted" <halstedd@tcimet.net>
             Subject: Re: 14.0482 XML, WWW, editions
    The discussion of scholarly texts and the problems involved in making them
    useful online got me wanting to experiment, and I've written an extremely
    primitive SAX parser (based on Xerces) that reads a set of URLs in from an
    XML file and looks through all of the documents it finds for lines that
    match a string you feed in at the command line.  It's invoked like this:
    java [-cp classpath] ShakesRead [urlsFile.xml] [string_to_find]
    It returns the name of the document, the line number at which the string was
    found, and the line in which the string was found (this version expects the
    content it's looking for to be in a <LINE></LINE> tag pair, but that could
    be made more useful).  Nothing earth-shaking, but it is precisely a kind of
    client that looks for useful information in an arbitrarily large number of
    remote XMLs and tells you where that information is located.  I don't know
    whether anybody would find such a thing useful, but I can imagine some
    potentially useful modifications, like allowing the program to take, say, a
    set of lines in a poem and look for each of the words used there in turn
    across different "libraries" of XMLs, grouped by author, period, genre,
    and so on.
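For readers who want a feel for what such a client looks like inside, here is a minimal sketch of the same idea: a SAX handler that collects the contents of <LINE> elements matching a search string, with the line number reported via the parser's Locator. This is not David's program; it uses the JDK's built-in SAX support rather than Xerces directly, parses a local string rather than a set of remote URLs, and all names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// A stripped-down SAX client: stream through a document and report
// every <LINE> whose text contains the target string.
public class LineMatcher extends DefaultHandler {
    private final String target;
    private final StringBuilder text = new StringBuilder();
    private boolean inLine = false;
    private Locator locator;
    final List<String> hits = new ArrayList<>();

    LineMatcher(String target) { this.target = target; }

    @Override
    public void setDocumentLocator(Locator l) { this.locator = l; }

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        if (qName.equals("LINE")) { inLine = true; text.setLength(0); }
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        if (inLine) text.append(ch, start, len);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals("LINE")) {
            inLine = false;
            String line = text.toString();
            if (line.contains(target)) {
                hits.add("line " + locator.getLineNumber() + ": " + line);
            }
        }
    }

    static List<String> search(String xml, String target) throws Exception {
        LineMatcher handler = new LineMatcher(target);
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.hits;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<PLAY>\n"
            + "<LINE>Now is the winter of our discontent</LINE>\n"
            + "<LINE>Made glorious summer by this sun of York</LINE>\n"
            + "</PLAY>";
        for (String hit : search(doc, "winter")) System.out.println(hit);
    }
}
```

Because SAX streams, the client never holds the whole document in memory, which is what makes this approach plausible for an arbitrarily large number of remote XMLs.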
    If anybody wants to try playing around with this program-let, I'd be pleased
    to share it, or I could put a version online somewhere if people are
    interested in trying it out that way.  The main point was to argue that
    clients can already be used to take advantage of online XML, even if the
    documents in question are fairly large -- and also, a bit, that putting an
    XML version of your favorite scholarly materials online is worthwhile; now
    people can really use it . . .
    Dave Halsted

    This archive was generated by hypermail 2b30 : 11/10/00 EST