Humanist Discussion Group, Vol. 17, No. 151.
Centre for Computing in the Humanities, King's College London
www.kcl.ac.uk/humanities/cch/humanist/
www.princeton.edu/humanist/
Submit to: humanist@princeton.edu
Date: Tue, 15 Jul 2003 06:35:59 +0100
From: Edward Vanhoutte <evanhoutte@kantl.be>
Subject: Literary and Linguistic Computing - TOC 18/1
Literary and Linguistic Computing -- Table of Contents Alert
A new issue of Literary and Linguistic Computing has been made
available:
April 2003; Vol. 18, No. 1
URL: http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/
- Editorial
Marilyn Deegan, p. 1
- Introduction: New Directions in Humanities Computing
David Robey, pp. 3-9
http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/180003.sgm.abs.html
- Towards the User: The Digital Edition of the Deutsche Wörterbuch by
Jacob and Wilhelm Grimm
Ruth Christmann and Thomas Schares, pp. 11-22
http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/180011.sgm.abs.html
Since February 2002, a first version of the Deutsche Wörterbuch (DWB) by
Jacob and Wilhelm Grimm has been available on the web. A CD-ROM beta
version has been available since December 2002. This paper will focus on
the steps involved in drawing up an electronic version of the DWB and,
by demonstrating the design of the Graphical User Interface (GUI), will
show how common standards of digitization were taken into account and
user needs were anticipated during the production process. The history
and structure of the DWB will be outlined first to point out some
characteristics of the dictionary. The process of retrodigitization from
printed page to electronic dictionary will be briefly described and,
while giving an overview of the DWB GUI, the importance of content-based
markup and a user-friendly but powerful GUI as a necessary precondition
for sensible and effective access to the dictionary contents will be
stressed. The title of this paper, Towards the User, can thus be
interpreted in two ways: during the digitization of the DWB, we consider
the needs of the users, and by digitization, we hope to open up this
huge amount of data and lexicological information for researchers.
- The Scottish Corpus of Texts and Speech: Problems of Corpus Design
Fiona M. Douglas, pp. 23-37
http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/180023.sgm.abs.html
In recent years, the use of large corpora has revolutionized the way we
study language. There are now numerous well-established corpus projects,
which have set the standard for future corpus-based research. As more
and more corpora are developed and technology continues to offer greater
and greater scope, the emphasis has shifted from corpus size to
establishing norms of good practice. There is also an increasingly
critical appreciation of the crucial role played by corpus design.
Corpus design can, however, present peculiar problems for particular
types of source material. The Scottish Corpus of Texts and Speech
(SCOTS) is the first large-scale corpus project specifically dedicated
to the languages of Scotland, and therefore it faces many unanswered
questions, which will have a direct impact on the corpus design. The
first phase of the project will focus on the language varieties Scots
and Scottish English, varieties that are themselves notoriously
difficult to define. This paper outlines the complexities of the
Scottish linguistic situation, before going on to examine the
problematic issue of how to construct a well-balanced and representative
corpus in what is largely uncharted territory. It argues that a
well-formed corpus cannot be constructed in a linguistic vacuum, and
that familiarity with the overall language population is essential
before effective corpus sampling techniques, methodologies, and
categorization schema can be devised. It also offers some preliminary
methodologies that will be adopted by SCOTS.
- A Logic Programming Environment for Document Semantics and Inference
David Dubin, Allen Renear, C. M. Sperberg-McQueen and Claus Huitfeldt,
pp. 39-47
http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/180039.sgm.abs.html
Markup licenses inferences about a text. But the information warranting
such inferences may not be entirely explicit in the syntax of the markup
language used to encode the text. This paper describes a Prolog
environment for exploring alternative approaches to representing facts
and rules of inference about structured documents. It builds on earlier
work proposing an account of how markup licenses inferences, and of what
is needed in a specification of the meaning of a markup language. Our
system permits an analyst to specify facts and rules of inference about
domain entities and properties as well as facts about the markup syntax,
and to construct and test alternative approaches to translation between
representation layers. The system provides a level of abstraction at
which the performative or interpretive meaning of the markup can be
explicitly represented in machine-readable and executable form.
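The paper describes a Prolog environment; the core idea of deriving domain-level claims from markup-syntax facts can be sketched in Python as well. The element names and the rule below are invented for illustration, not taken from the authors' system.

```python
# Hypothetical sketch (not the authors' Prolog system): syntactic facts
# about a tiny encoded text, plus one rule of inference that maps a
# markup construct to a domain-level claim.

# Syntactic facts: element id -> generic identifier (tag) and parent.
elements = {
    "e1": {"gi": "persName", "parent": "e0"},
    "e0": {"gi": "p", "parent": None},
}

def infer(elements):
    """An occurrence of <persName> licenses the claim that the
    enclosed text refers to a person."""
    claims = []
    for eid, e in elements.items():
        if e["gi"] == "persName":
            claims.append(("refers-to-person", eid))
    return claims

print(infer(elements))  # [('refers-to-person', 'e1')]
```

In the authors' system such rules live in Prolog, where alternative translations between representation layers can be constructed and tested declaratively; the sketch only illustrates the separation of syntactic facts from inference rules.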
- Forensic Linguistics: its Contribution to Humanities Computing
László Hunyadi, Kálmán Abari and Enikő Tóth, pp. 49-62
http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/180049.sgm.abs.html
The paper reports on a case in forensic linguistics in which linguistic
and computational approaches are combined to answer the question of
whether it can be proven that a digital recording has been tampered
with. With the growing use of digital applications, the chances
of digital forgery are increasing significantly. Accordingly, the
detection of tampering with audio recordings is also becoming an
important task for forensic linguists. In the given case, we assumed
that the most straightforward way of tampering with the given digital
audio recording might have been the removal of some material and so our
aim was to identify the location of this kind of tampering in the file.
Due to the complexity of the given task, the approach presented is
interdisciplinary: first, it uses a traditional semantic analysis to
identify possible discontinuous segments of the recorded text; secondly,
it introduces an experimental phonetic approach to identify cues of the
digital cutting of the audio signal; thirdly, it applies statistical
calculations to specify the bit-level characteristics of audio
recordings. The combination of these measurements proved to be quite
helpful in answering the initial question, and the proposed new
methodologies can be used in further areas of linguistics and
computation.
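One of the phonetic cues the abstract mentions, a discontinuity where material was cut out, can be illustrated with a toy detector. The threshold and method here are assumptions for illustration, not the authors' procedure.

```python
# Illustrative sketch only: flag abrupt sample-to-sample jumps in an
# audio signal as candidate locations of a digital cut. A real analysis
# would combine this with semantic and statistical evidence.

def candidate_cuts(samples, threshold=0.5):
    """Return indices where adjacent samples differ by more than
    threshold -- possible splice points."""
    return [i for i in range(1, len(samples))
            if abs(samples[i] - samples[i - 1]) > threshold]

# A smooth signal with one artificial splice at index 4.
signal = [0.0, 0.1, 0.2, 0.3, -0.6, -0.5, -0.4]
print(candidate_cuts(signal))  # [4]
```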
- The Publication of Archaeological Excavation Reports Using XML
Christiane Meckseper and Claire Warwick, pp. 63-75
http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/180063.sgm.abs.html
This paper looks at the usability of XML for the electronic publication
of field reports by commercial archaeological units. The field reports
fall into the field of grey literature as they are produced as client
reports by commercial units as part of the planning process and do not
receive official publication or widespread dissemination. The paper
uses ARCUS, a small commercial unit at the University of Sheffield, as
a case study, marking up a sample excavation report using XML and the
TEI Lite DTD. It also looks at the possibility of incorporating
controlled archaeological vocabulary into the DTD. The paper comes to
the conclusion that the electronic publication of grey reports would be
very useful as it would allow a quicker response time and a rapid
dissemination of information within the fast-moving and changing
environment of commercial archaeology. XML would be a useful tool for
the publication of field reports as it would allow practitioners to
selectively download separate sections of field reports that are of
particular importance to them and to improve the searchability of
reports on the web. It is recognized that national archaeological
institutions will also have to accept electronic versions of field
reports in order for them to be able to be built into the financial
framework of a commercial project design.
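The kind of markup the paper describes can be sketched with the standard library. The element names follow TEI-Lite conventions, but the specific context number and text are invented for illustration.

```python
# A minimal sketch of the idea: marking up one section of a
# (hypothetical) excavation report with TEI-Lite-style elements, so
# that practitioners could later download or search individual <div>s.
import xml.etree.ElementTree as ET

text = ET.Element("text")
body = ET.SubElement(text, "body")
div = ET.SubElement(body, "div", attrib={"type": "context", "n": "1003"})
head = ET.SubElement(div, "head")
head.text = "Context 1003: pit fill"
p = ET.SubElement(div, "p")
p.text = "Dark silty fill containing medieval pottery sherds."

xml = ET.tostring(text, encoding="unicode")
print(xml)
```

Because each excavation context sits in its own typed <div>, a reader could retrieve only the sections relevant to them, which is the selective-download benefit the paper argues for.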
- METAe -- Automated Encoding of Digitized Texts
Birgit Stehno, Alexander Egger and Gregor Retti, pp. 77-88
http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/180077.sgm.abs.html
This paper explains why and how the digitization project METAe applies
METS (Metadata Encoding and Transmission Standard) as encoding scheme
for automatically extracted metadata. In contrast to TEI (Text Encoding
Initiative) and other markup languages, METS allows encoding of the
whole range of structural, descriptive, and administrative metadata in a
systematic way. As the METS schema permits the integration of other
existing standards, it provides a highly flexible output that can be
converted easily to the individual needs of digital libraries. An
innovative aspect of the METAe data structure is the ALTO file
('Analysed layout and text object'), which contains the layout
structures as well as the text passages of book pages. Structural maps
of the METS schema are used to compose the logical and the physical
structures out of ALTO and image files.
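The pairing of image and ALTO files through a METS structural map can be sketched as follows. This is a simplified illustration, not the METAe implementation: real METS uses XML namespaces and xlink attributes, which are omitted here, and the file names are invented.

```python
# Simplified illustration of a METS structural map: each physical page
# <div> points, via <fptr> file pointers, both to its page image and to
# its ALTO file (layout structures plus text of the page).
import xml.etree.ElementTree as ET

mets = ET.Element("mets")
filesec = ET.SubElement(mets, "fileSec")
for fid, path in [("IMG1", "page1.tif"), ("ALTO1", "page1.alto.xml")]:
    f = ET.SubElement(filesec, "file", ID=fid)
    ET.SubElement(f, "FLocat", href=path)

smap = ET.SubElement(mets, "structMap", TYPE="physical")
page = ET.SubElement(smap, "div", TYPE="page", ORDER="1")
ET.SubElement(page, "fptr", FILEID="IMG1")
ET.SubElement(page, "fptr", FILEID="ALTO1")

out = ET.tostring(mets, encoding="unicode")
print(out)
```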
- Testing Structural Properties in Textual Data: Beyond Document
Grammars
Felix Sasaki and Jens Pönninghaus, pp. 89-100
http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/180089.sgm.abs.html
Schema languages concentrate on grammatical constraints on document
structures, i.e. hierarchical relations between elements in a tree-like
structure. In this paper, we complement this concept with a methodology
for defining and applying structural constraints from the perspective of
a single element. These constraints can be used in addition to the
existing constraints of a document grammar. There is no need to change
the document grammar. Using a hierarchy of descriptions of such
constraints allows for a classification of elements. These are important
features for tasks such as visualizing, modelling, querying, and
checking consistency in textual data. We call a document containing
descriptions of such constraints a 'context specification document'
(CSD). We
describe the basic ideas of a CSD, its formal properties, the path
language we are currently using, and related approaches. Then we show
how to create and use a CSD. We give two example applications for a CSD.
Modelling co-referential relations between textual units with a CSD can
help to maintain consistency in textual data and to explore the
linguistic properties of co-reference. In the area of textual,
non-hierarchical annotation, several annotations can be held in one
document and interrelated by the CSD. In the future we want to explore
the relation and interaction between the underlying path language of the
CSD and document grammars.
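The co-reference consistency check the abstract mentions, a constraint stated from the perspective of a single element rather than in the document grammar, can be illustrated with a toy example. The element and attribute names here are invented; the authors' CSD uses its own path language.

```python
# A toy sketch of an element-level constraint beyond a document grammar:
# from the perspective of each <ref> element, its 'target' attribute
# must point to an existing id -- a co-reference consistency check that
# an ordinary DTD or schema cannot express.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<text><w id="w1">she</w><ref target="w1"/><ref target="w9"/></text>'
)

ids = {e.get("id") for e in doc.iter() if e.get("id")}
violations = [r.get("target") for r in doc.iter("ref")
              if r.get("target") not in ids]
print(violations)  # ['w9'] -- a dangling co-reference
```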
- The Versioning Machine
Susan Schreibman, Amit Kumar and Jarom McDonald, pp. 101-107
http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/180101.sgm.abs.html
This article describes the background and architecture of The Versioning
Machine, a software tool designed to display and compare multiple
versions of texts. The display environment provides for features
traditionally found in codex-based critical editions, such as annotation
and introductory material. It also takes advantage of opportunities
afforded by electronic publishing, such as providing a frame to compare
diplomatic versions of witnesses side by side, allowing for
manipulatable images of the witness to be viewed alongside the
diplomatic edition, and providing users with an enhanced typology of
notes.
- Minutes of the Annual General Meeting of the Association for Literary
and Linguistic Computing, held at Tübingen, Germany on 27 July 2002
pp. 109-111
- Treasurer's Report: Financial year January to December 2002
Jean Anderson, pp. 112-114
http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/180112.sgm.abs.html
--=============
Edward Vanhoutte
Co-ordinator
Centrum voor Teksteditie en Bronnenstudie - CTB (KANTL)
Centre for Scholarly Editing and Document Studies
Reviews Editor, Literary and Linguistic Computing
Koninklijke Academie voor Nederlandse Taal- en Letterkunde
Royal Academy of Dutch Language and Literature
Koningstraat 18 / b-9000 Gent / Belgium
tel: +32 9 265 93 51 / fax: +32 9 265 93 49
evanhoutte@kantl.be
http://www.kantl.be/ctb/
http://www.kantl.be/ctb/vanhoutte/
This archive was generated by hypermail 2b30 : Tue Jul 15 2003 - 01:50:13 EDT