8.0115 Preliminary Program: Applied NLP October 1994 (1/722)

Sun, 24 Jul 1994 20:31:44 EDT

Humanist Discussion Group, Vol. 8, No. 0115. Sunday, 24 Jul 1994.

Date: Wed, 20 Jul 94 15:10:18 EDT
From: pjacobs@unagi.cis.upenn.edu (Paul Jacobs)
Subject: Preliminary program/info for ANLP-94

Preliminary Program and Registration Information
4th Conference on Applied Natural Language Processing
Stuttgart, Germany
October 13-15, 1994

sponsored by
Association for Computational Linguistics


Like previous ACL Applied conferences, this meeting will bring
together researchers and developers from around the world to focus on
the application of natural language processing to real problems. The
program will include invited and contributed papers, an industrial
exhibition, and demonstrations. This year's conference will aim
especially to promote participation from both industry and academia
and to feature work with potential business impact.


Susan Armstrong, ISSCO
Harry Bunt, Tilburg University
Jim Cowie, NMSU/CRL
Ido Dagan, AT&T Bell Labs
Robert Ingria, BBN
Paul Jacobs, GE (Chair)
Richard Kittredge, Univ. of Montreal
Kazunori Muraki, NEC
Peter Norvig, Sun Microsystems
Hans Joachim Novak, IBM
Martha Palmer, Univ. of Penn.
Manny Rayner, SRI
Donia Scott, Univ. of Brighton
Oliviero Stock, IRST
Annie Zaenen, Xerox

Uwe Reyle, Univ. of Stuttgart
Christian Rohrer, Univ. of Stuttgart

Uwe Reyle, Univ. of Stuttgart


Thursday, October 13

9.00-9.45 Registration

9.45-10.00 Conference Opening

10.00-11.00 Text Generation

Bilingual Generation of Job Descriptions from Quasi-Conceptual Forms
D. Caldwell and T. Korelsky
CoGenTex, Inc.

Practical Issues in Automatic Documentation Generation
K. McKeown, K. Kukich and J. Shaw
Columbia University, Bell Communications Research

11.00-11.30 Break

11.30-12.30 Document Image Understanding

Language Determination: Natural Language Processing from Scanned Document Images
P. Sibun and A. L. Spitz
Fuji Xerox Palo Alto Laboratory

Content Identification from Image
T. Nakayama
Fuji Xerox Palo Alto Laboratory

12.30-14.00 Break

14.00-15.30 Machine Translation (Methods)

Machine Translation of Sentences with Fixed Expressions
N. Katoh and T. Aizawa
NHK Science & Technical Research Laboratories

TERMIGHT: Identifying and Translating Technical Terminology
I. Dagan and K. Church
AT&T Bell Laboratories

Symmetric Pattern Matching Analysis for English Coordinate Structures
A. Okumura and K. Muraki
NEC Corp. Information Technology Research Lab

15.30-16.00 Break

16.00-17.30 Tagging models

Tagging Accurately - Don't Guess if you Know
P. Tapanainen and A. Voutilainen
Rank Xerox Research Centre

Does Baum-Welch Re-estimation Help Taggers?
D. Elworthy
Sharp Laboratories of Europe Ltd.

Improving Language Models by Clustering Training Sentences
D. Carter
SRI International

17.45-19.15 Demonstrations and Videos

Friday, October 14

8.30-9.30 Invited Talk - To Be Announced

9.30-10.00 Break

10.00-12.00 Text processing

Probably Correct Inference Rules and Linguistic Annotation
S. Finch
University of Edinburgh

Combination of Symbolic and Statistical Approaches for Grammatical Knowledge
M. Kiyono and J. Tsujii
Univ. of Manchester Institute of Science & Tech.

Adaptive Sentence Boundary Disambiguation
D. Palmer and M. Hearst
University of California, Berkeley

Acquiring Knowledge from Encyclopedic Texts
F. Gomez, R. Hull and C. Segami
University of Central Florida

12.00-13.00 Demonstrations and Videos

13.00-14.00 Break

14.00-15.30 Machine Translation (Systems)

A Successful Case of Computer Aided Translation
M. Filgueiras
Universidade do Porto

Three Heads are Better than One
R. Frederking and S. Nirenburg
Carnegie Mellon University

Real-Time Spoken Language Translation using Associative Processors
K. Oi, E. Sumita, O. Furuse, H. Iida and T. Higuchi
ATR and Electrotechnical Laboratory

15.30-16.00 Break

16.00-17.00 Robust parsing

Yet Another Chart-Based Technique for Parsing Ill-Formed Input
T. Kato
NTT Network Information Systems Laboratories

Recycling Terms into a Partial Parser
C. Jacquemin
IUT de Nantes

17.15-19.15 Posters

Improving Chinese Tokenization with Linguistic Filters on Statistical Lexical
D. Wu and P. Fung
University of Science & Technology and Columbia University

Reference Resolution in Newspaper Articles
T. Wakao
University of Sheffield

Automatic Acquisition of Semantic Attributes for User Defined Words in Japanese
to English Machine Translation
S. Ikehara, S. Shirai, A. Yokoo, F. Bond and Y. Omi
NTT Network Information Systems Labs

Degraded Text Recognition using Word Collocation and Visual Inter-word
T. Hong and J. Hull
State University of New York at Buffalo

Using Syntactic Dependencies for Word Alignment
F. Debili, E. Sammouda and A. Zribi

English Adverb Generation in Japanese to English Machine Translation
K. Ogura, F. Bond and S. Ikehara
NTT Network Information Systems Labs

A Practical Evaluation of an Integrated Translation Tool during a Large Scale
Localisation Project
R. Schaeler
University College Dublin

Spelling Correction in Agglutinative Languages
K. Oflazer and C. Guzey
Bilkent University

Integration of Example-based Transfer and Rule-based Generation
S. Akamine, O. Furuse and H. Iida
ATR Interpreting Telecommunications Research Laboratories

An Evaluation of a Method to Detect and Correct Erroneous Characters in
Japanese Input through an OCR using Markov Models
T. Araki, S. Ikehara, N. Tsukahara and Y. Komatsu
Fukui University and NTT Communications Science Laboratories

Multifunction Thesaurus for Russian Word Processing
I. Bolshakov
Russian Academy of Science

Representing Knowledge for Planning Multisentential Text
J. Coch and R. David

Guided Sentence Composition for Disabled People
R. Pasero, N. Richardet and P. Sabatier

An Interactive Rewriting Tool for Machine Acceptable Sentences
H. Hirakawa, K. Nomura and M. Nakamura
Toshiba Corporation

TECHDOC: Multilingual Generation of Online and Offline Instructional Text
D. Roesner and M. Stede

An Inheritance-based Lexicon for Message Understanding Systems
L. Cahill
University of Sussex

Industrial Applications of Unification Morphology
G. Proszeky

Sublanguage Engineering in the FOG System
R. Kittredge, E. Goldberg, M. Kim and A. Polguere
Universite de Montreal, Env. Canada, CoGenTex, Inc. and
National University of Singapore

20.00 Banquet

Saturday, October 15

8.30-9.30 Invited Talk - To Be Announced

9.30-10.00 Break

10.00-12.00 Interface Applications

Resolving Anaphora in a Portable Natural Language Front End to Databases
F. Barros and A. DeRoeck
University of Essex

Upholding the Maxim of Relevance During Patient-Centered Activities
A. Gertner, B. Webber and J. Clarke
University of Pennsylvania

The Delphi Natural Language Understanding System
M. Bates, R. Bobrow, R. Ingria and D. Stallard
BBN Systems and Technologies, Inc.

Understanding Location Descriptions in the LEI System
D. Chin, M. McGranaghan and T. T. Chen
University of Hawaii

12.00-13.00 Demonstrations and Videos

13.00-14.00 Break

14.00-15.30 Lexical Processing

Tagging and Morphological Disambiguation of Turkish Text
K. Oflazer and I. Kuruoz
Bilkent University

A Robust Category Guesser for Dutch Medical Language
P. Spyns
Katholieke Universiteit Leuven

Handling Japanese Homophone Errors in Revision Support System for Japanese
M. Oku
NTT Network Information Systems Laboratories

15.30-16.00 Break

16.00-17.30 Text categorization and retrieval

A Probabilistic Model for Text Categorization: Based on a Single Random
Variable with Multiple Values
M. Iwayama and H. Tokunaga
Hitachi Ltd.

Robust Text Processing in Automated Information Retrieval
T. Strzalkowski
New York University

May a Semantic Lexicon Support Hypertextual Authoring?
R. Basili, F. Grisoli and M. Pazienza
Universita di Roma

TUTORIALS (Oct. 11-12)

Corpus Hacking
Mats Rooth and Oliver Christ, University of Stuttgart


We will introduce a family of programs for representing and computing
with text corpora in a Unix/C environment. The first session will be
devoted to representation of the corpus and linguistic markup and to
an associated query language. In the second session, we will look at
statistical computations and applications to linguistic and
computational linguistic problems.

MATS ROOTH obtained a PhD in linguistics from UMass Amherst after
studying mathematics at MIT. He has worked at CSLI/Stanford, AT&T
Bell Labs, and the Universities of Stuttgart and Tübingen. His
research interests include statistical parsing, the semantics
of intonation, and methodologies for employing corpus data in
theoretical linguistics.

OLIVER CHRIST studied computer science at the University of Stuttgart,
Germany. His thesis work, finished in late 1992, dealt with the design and
implementation of a CLIM-based graphical user interface for the TFS system.
Since late 1992, he has been working as a research assistant in a project that
aims to develop tools for the exploration of large text corpora, where he has
developed numerous corpus management and access tools.

Partial Parsing
Steven Abney, University of Tübingen


Efficient, accurate parsing of unrestricted text is not within the
reach of current techniques. Standard algorithms are too expensive
for use on very large corpora, and relatively fragile. Partial
parsing aims to buy speed and robustness of processing by sacrificing
depth of analysis. Partial parsing can be seen as an application of
the principles that motivate stochastic tagging. Namely, tagging
illustrates how low-level processing can be sliced out of the parsing
problem and solved independently; shallow parsing represents the
"next slice". Partial parsing is generally useful as a preprocessing
step, either for bootstrapping (extracting information from corpora
for use by more sophisticated parsers) or for end-user applications
such as data extraction.

In the tutorial, we will discuss partial-parsing methods, including
finite-state recognition, cascaded finite-state recognizers, and
HMMs. Generally, techniques used in tagging can be readily
applied to shallow parsing: in addition to HMMs, regression
techniques, including regression trees, are applicable. We will also
touch on grammatical inference techniques, and techniques for
recognizing low-level phrases without grammars, on the basis of
word-level statistical properties such as mutual information.
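
As a rough illustration of the finite-state recognition mentioned above,
the following Python sketch runs one level of such a recognizer over
part-of-speech tags. The tag classes, the noun-phrase pattern and the
function name are assumptions made for this example; they are not taken
from the tutorial material.

```python
import re

# Hypothetical coarse tag classes; a real tagger's tag set would differ.
TAG_CLASS = {"DT": "D", "JJ": "A", "NN": "N", "NNS": "N", "NNP": "N"}

# One finite-state level: optional determiner, any adjectives, one or more nouns.
NP = re.compile(r"D?A*N+")


def chunk_noun_phrases(tagged_tokens):
    """Return noun-phrase chunks as lists of words from (word, tag) pairs."""
    # One character per token, so regex offsets are also token indices.
    classes = "".join(TAG_CLASS.get(tag, "O") for _, tag in tagged_tokens)
    return [[word for word, _ in tagged_tokens[m.start():m.end()]]
            for m in NP.finditer(classes)]


sentence = [("the", "DT"), ("partial", "JJ"), ("parser", "NN"),
            ("skips", "VBZ"), ("unknown", "JJ"), ("words", "NNS")]
chunks = chunk_noun_phrases(sentence)
print(chunks)
```

A cascade would feed the chunks found at this level, as single symbols,
into a second pattern that recognizes larger phrases.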

Finally, we will discuss methods for assembling low-level phrases into
complete parse-trees. To do so, we require something like case-frame
relations. If a domain model provides semantic frames, it is possible
to do semantic interpretation directly on a stream of low-level
phrases, making partial parsing useful as a technique for cleaning up
after traditional parsers. Where domain models are not available,
methods have been developed for inducing syntactic and semantic frames
from a corpus, using partial parsing as a preprocessing step.

STEVEN ABNEY; PhD, MIT Department of Linguistics, 1987.
1987-1993: Member of Technical Staff, Bell Communications Research.
1993-present: Assistant Professor, Computational Linguistics,
University of Tübingen. Areas of research: parsing unrestricted
text, stochastic methods, psycholinguistic modelling, phrase

Machine Translation
Louisa Sadler, University of Essex UK


The goal of achieving high quality automatic
translation has long provided an impetus for work in NLP.
There has been much activity in the field in recent years, with a
number of developments (such as the use of statistical or mixed
approaches) promising significant progress in the development of practical
working systems.

This tutorial is directed towards those who would like to be made
aware of current research in Machine Translation. The focus will
mainly be on the architecture of machine translation systems,
surveying the major current approaches (rule-based, statistical,
mixed), although issues such as controlled input, user interaction,
translation aids, multilingual generation and the evaluation of
MT systems will also be touched on.

In looking at approaches based on the formulation of explicit
linguistic rules, we will start by considering the traditional
distinction between interlingual and transfer approaches. We will
consider the issue of how an interlingua may be defined and the
problems this raises, looking at some proposed interlinguas. We will
briefly examine traditional transfer systems, focussing on the problem
of how (or whether) bilingual equivalences can be established, before
discussing more recent proposals permitting a more flexible view of
translational equivalence (flexible transfer, (multilingual) type
hierarchies, translation by abduction, correspondence and negotiation).

We will also look at statistical and mixed (hybrid) approaches to MT
(translation by analogy, example-based translation, etc.), considering,
inter alia, issues such as quality of translation, robustness and
the acquisition and use of large data sets in such systems. This part
of the tutorial will also briefly review work on the automatic
acquisition of terminological, lexical and grammatical resources for

LOUISA SADLER teaches Computational Linguistics and syntax at the
University of Essex UK. She has worked on a number of MT and related
projects since 1985 and is currently interested in flexible and
correspondence-based approaches to MT. She is author/co-author of a
number of articles and a recent introductory book on MT.

Context, Information Structure, Focus and Ellipsis
Stephen Pulman, SRI Cambridge and University of Cambridge


This tutorial will examine some recent approaches to the interpretation
of constructs that are sensitive to context and information structure,
in particular intonational focus, focus-sensitive particles, and

I will describe some influential linguistic theories of ellipsis and
focus, and also survey some recent computationally inspired approaches
using notions like 'higher order unification', discourse grammar and
'most specific common denominators'.

Finally, I will look at how some of these theories might be implemented
so as to achieve reasonable analysis coverage of sentences involving
ellipsis or focus. I will also look at how to generate sentences
involving ellipsis or focus in appropriate contexts.

STEPHEN PULMAN is a lecturer at the University of Cambridge Computer
Laboratory and is Director of SRI International Cambridge Computer
Science Research Centre. His current research interests are in
computational semantics and dialogue in the context of spoken language
understanding systems.

NLP meets Multimedia: Coordinating Language, Graphics, and Gestures
Wolfgang Wahlster and Elisabeth André, German Research
Center for AI (DFKI) Saarbrücken


The goal of this tutorial is to survey a new generation of intelligent
multimedia human-computer interfaces with the ability to interpret
some forms of multimedia input and to generate coordinated multimedia
output. The tutorial is organized into four sections: from images to
text, from text to images, coordinating gestures and language, and
integrating multiple media in adaptive presentation systems.

Over the past few years, researchers have begun to explore how to translate visual
information into natural language. A great practical advantage of natural
language image description is the possibility of the application-specific
selection of varying degrees of condensation of visual information. There are
many promising applications in medical technology, remote sensing, traffic
control and other surveillance tasks.

Work in the inverse direction, the generation of images from natural
language text, has shown how a physically based semantics of motion
verbs and locative prepositions can be seen as conveying spatial,
kinematic and temporal constraints, thereby enabling a system to
create an animated graphical simulation of events described by natural
language utterances. There is an expanding range of exciting
applications for these methods such as advanced simulation,
entertainment, animation and CAD systems.

The use of deictic gestures parallel to verbal descriptions is of
great importance for multimedia interfaces, because it simplifies and
speeds up reference to objects in a visual context. However, natural
pointing behavior is possibly ambiguous and vague, so that without a
careful analysis of the discourse context of a gesture there is a high
risk of reference failure. We will discuss the state of the art of gesture
interpretation and generation and show how explicit meanings can be
given to pointing behavior in terms of a formal semantics of
the visual world.

In the fourth section of this tutorial, we will present a new generation of
intelligent multimedia systems that goes beyond the standard canned text,
predesigned graphics and prerecorded images and sounds typically found in
commercial multimedia systems of today. Intelligent multimedia presentation
systems include a number of key processes: content planning (determining what
information should be presented in a given situation), medium selection
(apportioning the selected information to text and graphics), presentation
design (determining how text and graphics can be used to communicate the
selected information), and coordination (resolving conflicts and maintaining
consistency between text and graphics). We will show that it is possible to
adapt many of the fundamental concepts developed to date in computational
linguistics in such a way that they become useful for text-picture combinations
as well. We will address key applications such as multimedia helpware,
information retrieval and analysis, authoring, training, monitoring, and
decision support.

WOLFGANG WAHLSTER is a Professor of Artificial Intelligence in
the Department of Computer Science at the University of Saarbrücken,
Germany, where he currently serves as a Scientific Director of DFKI. He received
his diploma and doctoral degree in computer science from the
University of Hamburg. Since 1975 he has been working in the field as
a principal investigator in various natural language projects,
including VERBMOBIL. He has published more than 150 technical papers on natural
language processing and AI. His current research includes intelligent
multimodal interfaces, user modeling, natural language scene
description, intelligent help systems, deductive plan recognition, and
speech translation. He is an AAAI Fellow and a recipient of the Fritz
Winter Award for his research on cooperative user
interfaces. Prof. Wahlster served as the Conference Chair for IJCAI-93
in Chambery and the Chair of the Board of Trustees of IJCAII from 1991
to 1993. He is currently the Chair of the Association of German AI
Institutes (AKI).

ELISABETH ANDRÉ studied computer science at the University
of Saarbrücken, Germany. Her thesis work dealt with the generation
of natural language scene descriptions in the project VITRA. Since
1988, she has been working as a research scientist in the Intelligent
User Interfaces group at DFKI on the WIP and PPP projects. Her current
research focuses on multimedia communication, intelligent user
interfaces and knowledge-based presentation systems. She is the author
of over 40 scientific papers on natural language generation and
multimedia communication. In January 1994, she was elected European
Representative of the ACL Special Interest Group on Multimedia
Language Processing (SIGMEDIA).


To make your registration and/or hotel reservation, please complete the
registration form (available through ACL LISTSERV, see below) and send
it to:

Sabine Schmid
Institut für Maschinelle Sprachverarbeitung
Universität Stuttgart
Azenbergstr. 12
70174 Stuttgart
phone: +49-711-121-1379
fax: +49-711-121-1366

Registration includes one copy of the conference proceedings. People
who have not paid their ACL membership dues for 1994 must register at
the non-member rate; they will automatically receive ACL membership
for 1995. The same holds for students, who pay 50%.

Registration fees
                   ACL members    Non-members
before 20 August   DM 530         DM 580
after 20 August    DM 660         DM 720

Payments must be made in DM, either by cheque or by bank
transfer to:
Universität Stuttgart
Baden-Württembergische Bank
Konto-Nr. 1 054 611 700
BLZ 600 200 30

with the following remark:
Institut für Maschinelle Sprachverarbeitung
Kapitel 1418
Titel 282 86
BANR 4082

Early registration must be received by August 20.


Each tutorial consists of two three-hour sessions.

Wahlster/André overlaps with Pulman, and Rooth/Christ overlaps with
Abney. All others are temporally disjoint.

The early registration fee for each tutorial is DM 170.-; for late
registration or registration at the conference, the fee is DM 200.-.

Please note that only people who register for the conference will be
eligible to take part in the tutorials.


The (optional) conference banquet will be held on Friday, October 14 at 20.00.
The cost will be DM 65.-.


Rooms have been reserved at a number of hotels for the conference. The
Holiday Inn Garden Court will be our primary conference hotel. It is
located in the middle of the Weilimdorf Business Park and is connected
to downtown Stuttgart by tube (about 10 minutes). Further rooms have
been reserved in hotels within walking distance of the Haus der
Wirtschaft. Beds are also available in a guest house for students; two
students have to share one room.

Location       Rooms (single)   Rate/night
Holiday Inn    200              DM 100.-
Maritim        50               DM 259.-
Others         110              DM 115.- to 140.-
Guest House    40 (double)      DM 30.-


Stuttgart is the state capital of Baden-Württemberg in south-west
Germany. It was founded about 1000 years ago as "Stuotgarden", a stud
farm, and today it is the cultural and commercial centre of the state, with
almost 600 000 inhabitants. For centuries the city was the residence of the
dukes and kings of Württemberg, and this epoch is much in evidence as you
stroll through the city centre.

One of the buildings of this epoch is the "Haus der Wirtschaft", the
primary ANLP-94 site. It is located on the edge of the campus, a
five-minute walk from the city centre. Exhibitions, software
demonstrations and book exhibits will also be located at the "Haus der
Wirtschaft". Tutorials will be held on campus at Keplerstrasse 17. Net
access for participants will also be available there.


Sabine Schmid / Sybille Laderer
Institut für Maschinelle Sprachverarbeitung
Universität Stuttgart
Azenbergstr. 12
70174 Stuttgart
phone: +49-711-121-1379
phone: +49-711-121-1363
fax: +49-711-121-1366
e-mail: sabine@ims.uni-stuttgart.de


Conference information and registration forms are available in
ASCII and PostScript formats through ACL LISTSERV.

LISTSERV is a facility to allow access to an electronic
document archive by electronic mail. The ACL LISTSERV has been set up
at Columbia University's Department of Computer Science. Requests from
the archive should be sent as e-mail messages to


with an empty subject field and the message body containing the
request command. The most useful requests are "help" for general help
on using LISTSERV, "index acl-l" for the current contents of the ACL
archive and "get acl-l <file>" to get a particular file named <file>
from the archive. For example, to get an ACL membership form, a
message with the following body should be sent:

get acl-l membership-form.txt

Answers to requests are returned by e-mail. Since the server may have
many requests for different archives to process, requests are queued
up and may take a while (say, overnight) to be fulfilled.
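
For readers who prefer to script such requests, the message format
described above (empty subject, command in the body) might be built
along the following lines in Python. This is only a sketch: the function
name is made up for this example, and both addresses are placeholders,
since the server address is not reproduced here.

```python
from email.message import EmailMessage


def build_listserv_request(command, server_address, sender):
    """Build a request message: empty subject, command as the only body text."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = server_address
    msg["Subject"] = ""          # the subject field must be left empty
    msg.set_content(command)     # e.g. "help" or "index acl-l"
    return msg


request = build_listserv_request(
    "get acl-l membership-form.txt",
    "listserv@example.org",      # placeholder, NOT the real server address
    "reader@example.org",        # placeholder sender address
)
print(request.get_content())
```

The resulting message object can be handed to whatever mail transport is
available locally; sending is left out of the sketch on purpose.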

The ACL archive can also be accessed by anonymous FTP. Here is an
example of how to get a file by FTP (user input follows each prompt):

$ ftp ftp.cs.columbia.edu
Name (ftp.cs.columbia.edu:pereira): anonymous
Password: pereira@research.att.com      << not echoed
ftp> cd acl-l
ftp> cd Appliedacl94
ftp> get regform.ps.Z
ftp> quit
$ uncompress regform.ps.Z


Judith Klavans (ACL)
Columbia University
Computer Science
New York
NY 10027
phone/fax: +1-914-478-1802
e-mail: acl@cs.columbia.edu