13.0522 text-analysis tools for

From: Humanist Discussion Group (willard@lists.village.virginia.edu)
Date: Sat Apr 01 2000 - 20:07:59 CUT

                  Humanist Discussion Group, Vol. 13, No. 522.
          Centre for Computing in the Humanities, King's College London

      [1] From: Alexander Nakhimovsky <sasha@cs.colgate.edu> (41)
            Subject: Text Analysis Tools for XML documents: a Web

      [2] From: John Dawson <jld1@cam.ac.uk> (17)
            Subject: Re: Text Analysis Tools for XML documents: a Web

            Date: Wed, 29 Mar 2000 19:10:45 -0600
            From: Alexander Nakhimovsky <sasha@cs.colgate.edu>
            Subject: Text Analysis Tools for XML documents: a Web application

    An announcement: Text Analysis Tools for XML documents: a Web application

    The release, last November, of the XSLT and XPath Recommendations created a
    new range of possibilities for text-analysis tools. Since January, a
    project at Colgate University in the US has been developing a set of tools
    with the following design goals:

    -- the tools are available over the network as a Web application;
    -- the tools are DTD independent: the user interface is constructed
    automatically on the basis of the document's DTD;
    -- the queries that the tools can process use XPath to express structural
    query conditions and Regular Expressions to describe the text patterns of
    the query;
    -- the tools are extensible: if XSLT cannot express a query, the query can
    be delegated to an extension function written in a general-purpose
    programming language (Java most easily);
    -- secondary documents, such as concordances, frequency counts, inverted
    indices and so on, are kept as XML documents, optimized for query
    processing but also available for printing and display.
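
As a rough illustration of the third design goal (a structural condition combined with a text pattern), here is a minimal sketch. It is not the project's implementation, which is built on XSLT with Java extension functions. It uses Python's standard xml.etree.ElementTree, which supports only a limited XPath subset, and the function name `search` and the two-speech sample are assumptions made for the example. The sample is a simplified fragment in the style of play.dtd.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, illustrative fragment using play.dtd-style element names.
SAMPLE = """<PLAY>
  <ACT><TITLE>ACT II</TITLE>
    <SCENE><TITLE>SCENE VI</TITLE>
      <SPEECH><SPEAKER>JESSICA</SPEAKER>
        <LINE>Here, catch this casket; it is worth the pains.</LINE>
      </SPEECH>
      <SPEECH><SPEAKER>LORENZO</SPEAKER>
        <LINE>Descend, for you must be my torchbearer.</LINE>
      </SPEECH>
    </SCENE>
  </ACT>
</PLAY>"""


def search(root, path, pattern):
    """Return (speaker, line) pairs for LINEs matching `pattern`.

    `path` is the structural condition (an XPath expression within the
    subset ElementTree understands); `pattern` is a regular expression
    applied to the text of each LINE inside the selected SPEECH elements.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for speech in root.findall(path):              # structural condition
        speaker = speech.findtext("SPEAKER", default="")
        for line in speech.findall("LINE"):
            text = "".join(line.itertext())
            if regex.search(text):                 # text-pattern condition
                hits.append((speaker, text))
    return hits


root = ET.fromstring(SAMPLE)
# Speeches by JESSICA containing a word that starts with "cask".
for speaker, text in search(root, ".//SPEECH[SPEAKER='JESSICA']", r"\bcask\w*"):
    print(f"{speaker}: {text}")
```

Keeping the path and the pattern as separate arguments mirrors the split described in the list: the path narrows the search to a set of elements, and the regular expression is applied only to the text inside them.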

    We now have an early version of the tools and a tutorial on how to use
    them, both to be found at


    Our main purpose in posting this announcement is to get feedback: what
    other functionality is needed? how can the user interface be improved? We
    are interested in collaborating with an ongoing project to try out ideas.
    There are email addresses at the end of this message. Eventually, we would
    like to make this an open source project.

    The tutorial uses a very simple DTD (Jon Bosak's play.dtd), and a single
    text, The Merchant of Venice. However, the program is DTD-independent.
    The next version of the tutorial will use TEI Lite and provide
    instructions on how to use the program with a DTD of your own.

    Both the program and the tutorial have been prepared by Karthik Jayaraman,
    following initial suggestions by Alexander Nakhimovsky. Karthik
    (kjayaraman@mail.colgate.edu) is a senior undergraduate student, and
    Nakhimovsky (sasha@cs.colgate.edu) is a faculty member in the computer
    science department at Colgate. We will be giving a paper on our work at
    XML-Europe in Paris in June. A poster and a software demo will be
    presented at the ALLC/ACH meeting in Glasgow.

    Alexander Nakhimovsky tel 315-228-7586
    Computer Science Dpt fax 315-228-7004
    Colgate University sasha@cs.colgate.edu or
    Hamilton NY 13346 sasha@mail.colgate.edu

            Date: Thu, 30 Mar 2000 11:03:11 +0100 (BST)
            From: John Dawson <jld1@cam.ac.uk>
            Subject: Re: Text Analysis Tools for XML documents: a Web application

    At first sight, very impressive, and very useful.

    When searching a speech for a particular word, neither the
    immediate results, nor the expanded sources, show which part
    of the play they come from.

    A couple of comments:

    (1) If I search SPEECH for 'trip' I get one match, a speech by JESSICA.
    Clicking on the ellipsis shows the complete scene, but doesn't say which
    Act it's in.

    (2) It would be a good idea to highlight the words searched for in the
    results (with colour, preferably): if a complete speech is chosen as
    the context, it can be quite long, and it is difficult to spot the chosen
    words.
    Thanks. John
    John Dawson work: JLD1@cam.ac.uk home: JLDawson@talk21.com
                         (01223) 335029 (01462) 893410
                  web: http://www.cus.cam.ac.uk/~jld1

                           Humanist Discussion Group
           Information at <http://www.kcl.ac.uk/humanities/cch/humanist/>
