Subject: Call for Papers: PKDD'98

PKDD'98 -- 2nd European Symposium on Principles of
Data Mining and Knowledge Discovery
Nantes, France
September 23-26, 1998

Data Mining and Knowledge Discovery in Databases (KDD) have emerged from
a combination of many research areas: databases, statistics,
machine learning, automated scientific discovery, inductive logic
programming, artificial intelligence, visualization, decision science, and
high performance computing.

While each of these areas can contribute in specific ways, KDD focuses on
the value that is added by creative combination of the contributing areas.
The goal of PKDD'98 is to provide a European-based forum for interaction
among all theoreticians and practitioners interested in data mining.
Interdisciplinary collaboration is one desired outcome, but the main
long-term focus is on theoretical principles for the emerging discipline of
KDD, especially on KDD-specific principles that go beyond each contributing area.

Both theoretical and applied submissions are sought. Reviewers will assess
the contribution towards the principles of KDD, in addition to the usual
requirements of relevance, novelty, clarity and significance. Applied
papers should go beyond an individual application, presenting an explicit
method that promises a degree of generality within one or more stages of
the discovery process, such as preprocessing, mining, visualization, use
of prior knowledge, knowledge refinement, and evaluation. Theoretical
papers should demonstrate how the proposed theoretical contribution
advances the discovery process.

The following non-exclusive list exemplifies topics of interest:

Data and knowledge representation for data mining
* Beyond relational databases: new forms of data organization
* Data reduction
* Prior domain knowledge and use of discovered knowledge
* Combining query systems with discovery capabilities

Statistics and probability in data mining
* Discovery of probabilistic networks
* Modelling knowledge uncertainty
* Discovery of exceptions and deviations
* Statistical significance in large-scale search
* The problem of overfitting

Logic-based perspective on data mining
* Inference of knowledge from data
* Exploring different subspaces of first order logic
* Rough sets in data mining
* Boolean approaches to data mining
* Inductive Logic Programming for mining real databases
* Pattern recognition for data mining
* The use of tolerance (similarity) relations in data mining
* KDD-motivated discretization of data
* Discovery of approximate schemes of reasoning from data

Man-Machine interaction in data mining
* Visualization of data
* Visualization of knowledge
* Interface design
* Interactive data mining: human and computer contributions

Artificial Intelligence contributions to KDD
* Representing knowledge and hypothesis spaces
* Search for knowledge and its complexities
* Combining many methods in one system
* Data mining in distributed/multiagent systems

High performance computing for data mining
* Hardware support for KDD
* Parallel discovery algorithms and complexity
* Distributed data mining
* Scalability in high dimensional datasets
* From concept learning to concept discovery
* Expanding the autonomy of machine learners
* Embedding learning methods in KDD systems
* Conceptual clustering in knowledge discovery
* Applications of scientific discovery systems to databases
* Scientific hypothesis evaluation that transfers to KDD
* Hypothesis spaces of scientific discovery applied in KDD
* Differences between the data handled in both fields
* KDD applications on scientific databases
* Decomposition of large data tables

Quality assessment of data mining results
* Multi-criteria knowledge evaluation
* Benchmarks and metrics for system evaluation
* Statistical tests in KDD applications
* Usefulness and risk assessment in decision-making

Applications of data mining and knowledge discovery
* Medicine: diagnosis and prognosis
* Control theory: predictive and adaptive control, model identification
* Engineering: diagnosis of mechanisms and processes
* Public administration
* Marketing and finance
* Data mining on the Web: text and heterogeneous data
* Natural and social science
* Use of knowledge for prediction and intervention
* Fraud detection

Interaction between symbolic KDD methods and neural nets
* Interpretation of knowledge accumulated in a trained NN
* Hybrid NN/Symbolic KDD systems
* NN architectures for higher transparency and interpretability

Submitted papers should be in English and not exceed 10 single-spaced pages
of 12pt font (excluding title page but including tables, figures and
bibliography). Submissions exceeding this limit will not be reviewed. A
separate title page should begin with title, authors, affiliations, postal
and e-mail addresses, and an abstract of about 200 words. Submitted papers
should preferably be formatted according to the LNAI guidelines. LaTeX and
Word style files are available at
http://www.sciences.univ-nantes.fr/pkdd98/styles. The following items must
be submitted by May 15th, 1998: an electronic version of the paper
(uuencoded and compressed PostScript), and an electronic version of the
title page in plain ASCII format. Four hard copies of the paper by regular
mail are also accepted if electronic submission is not possible. All items
should be sent to the following addresses:

* Regular mail: Mohamed Quafafou - PKDD'98 Conference (see full address below)
* Electronic mail: pkdd98@irin.univ-nantes.fr

All papers accepted for regular and poster presentations will be published
by Springer-Verlag in the "Lecture Notes in Artificial Intelligence"
(LNAI) series.

Submission deadline: May 15th, 1998
Notice of acceptance: June 15th, 1998
Camera ready papers: July 5th, 1998

PANEL DISCUSSIONS: proposals are sought for panels that stimulate
interaction between the communities contributing to KDD. Proposals should
include a title, the main goals, prospective participants, and a summary of
the topics to be discussed. Submit to zytkow@uncc.edu by May 15th, 1998.
Notification of acceptance by June 15th, 1998.

TUTORIALS: proposals are solicited for tutorials that (1) transfer
know-how and provide hands-on experience, (2) combine two or more areas
(e.g., rough sets and statistics, or high-performance computing and
databases), or (3) cover application domains such as finance, medicine, or
automatic control. Submit to zytkow@uncc.edu by May 15th, 1998.
Notification of acceptance by June 15th, 1998.

DEMONSTRATIONS OF SOFTWARE for data mining and knowledge discovery are
invited, including both commercial and experimental systems. Send
descriptions to quafafou@irin.univ-nantes.fr by July 15th, 1998.


Jan Zytkow
Dept. of Computer Science
UNC Charlotte
Charlotte, NC 28223
USA
zytkow@uncc.edu

Mohamed Quafafou
IRIN, 2 rue la Houssiniere
BP 92208 - 44322
Nantes cedex 03
France
quafafou@irin.univ-nantes.fr


Pieter Adriaans (Syllogic, Netherlands)
Pavel Brazdil (U. Porto, Portugal)
Henri Briand (IRIN U. Nantes, France)
Leo Carbonara (British Telecom., UK)
A. Fazel Famili (IIT-NRC, Canada)
Ronen Feldman (Bar-Ilan U., Israel)
Patrick Gallinari (U. Paris 6, France)
Jean-Gabriel Ganascia (U. Paris 6, France)
Attilio Giordana (U. Torino, Italy)
David Hand (Open U., UK)
Bob Henery (U. Strathclyde, UK)
Mikhail Kiselev (Megaputer Intelligence, Russia)
Willi Kloesgen (GMD, Germany)
Yves Kodratoff (U. Paris 11, France)
Jan Komorowski (Norwegian U. Sci. & Tech., Norway)
Nada Lavrac (Jozef Stefan Inst., Slovenia)
Heikki Mannila (U. Helsinki, Finland)
Steve Muggleton (Oxford U., UK)
Zdzislaw Pawlak (Warsaw Technical U., Poland)
Gregory Piatetsky-Shapiro (Knowledge Stream, Boston, USA)
Lech Polkowski (U. Warsaw, Poland)
Mohamed Quafafou (IRIN U. Nantes, France)
Zbigniew Ras (UNC Charlotte, USA)
Lorenza Saitta (U. Torino, Italy)
Wei-Min Shen (U. So. California, USA)
Arno Siebes (CWI, Netherlands)
Andrzej Skowron (U. Warsaw, Poland)
Derek Sleeman (U. Aberdeen, UK)
Nicolas Spyratos (U. Paris 11, France)
Shusaku Tsumoto (Tokyo Medical & Dental U., Japan)
Raul Valdes-Perez (CMU, USA)
Thierry Van de Merckt (Belgium)
Rudiger Wirth (Daimler-Benz, Germany)
Stefan Wrobel (GMD, Germany)
Ning Zhong (Yamaguchi U., Japan)
Wojtek Ziarko (U. Regina, Canada)
Djamel A. Zighed (U. Lyon 2, France)
Jan Zytkow (UNC Charlotte, USA)

26 May 1998, Morning Session

Held in conjunction with
The First International Conference on Language Resources and Evaluation
Granada, Spain (28-30 May 1998)


It is essential for a natural language processing system to instantiate each
object, process, attribute, and property correctly, so that all references to
the same item are recognized as such and the inventory of all distinct items is
accurate at all times. This problem is far from being resolved. There are both
linguistic and computational reasons for this deficiency. First, there is no
satisfactory microtheory of linguistic coreference. Secondly and
consequently, there is no satisfactory application of such a microtheory to

A microtheory of coreference in natural language includes in its scope all the
phenomena that satisfy the following condition: an object/entity, an event, an
attribute, a property or its value, an attitude, or any combination of the above
is referred to more than once in a natural-language text, and the understanding
of the text depends on the correct interpretation of the two or more referring
expressions as designating the same object, event, etc. A linguistic
microtheory of coreference for a language consists of the following elements:
- a complete range of covered phenomena in the language;
- a taxonomy of the range;
- a typology of the range;
- a list of rules forming the various types of coreference;
- a list of rules interpreting the various types of coreference.

There has been a considerable amount of work on a few selected types of
coreference, focusing almost exclusively on object coreference. Thus,
significant work has been done in theoretical linguistics on anaphora and
cataphora, subsuming, for the most part, earlier work on deixis. A small
minority of authors have tried to extend their studies of anaphora beyond mere
syntax. In the cognitive-linguistics and philosophy-of-language traditions,
interesting work has been done relating anaphora and deixis to ambiguity
resolution and discourse structure. At the same time, an effort in
comparative-contrastive linguistics has led some writers to examine the data
of more than one language at a time, still emphasizing entity or object
coreference.

In computational linguistics, the problem of coreference early on took the form
of pronoun-antecedent resolution, and this particular task, somewhat broadened
to include a few other types of anaphora, still remains at the center of the
problem. The most sustained effort in the computational treatment of coreference
has been mounted within the Tipster/MUC-6 initiative. While it has been
recognized since quite early in the game that coreference resolution is based
in large part on world knowledge, most of the work done on the matter, both
computationally and theoretically, ignores and avoids world knowledge. The
MUC-6 initiative makes such an orientation quite explicit: the work should be
based on such simpler resources as part-of-speech tagging, simple noun phrase
recognition, basic semantic category information like gender and number, and
[to a limited extent] full parse trees. Such an approach--trying to explore
and maximize everything that can be done simply and cheaply towards the
resolution of a complex problem--is perfectly legitimate as long as it is
realized that a considerable part of the problem remains unsolved, and this
is indeed fully realized within the MUC-6 initiative.

One persistent problem throughout the existing computational ventures into
coreference has been the lack of a consistent theoretical approach to it. The
result is that coreference phenomena are treated as self-evident, and most of
them are overlooked, especially if they are not explicit pronoun-antecedent or
other equally evident anaphora cases. What is needed for a full, accurate, and
reliable approach to coreference can be summarized, somewhat schematically, as
involving the following steps:

1. understanding fully the range of the phenomenon and
of the rules that govern it (theory);
2. determining the extent of machine-tractable information
in the rules;
3. taking stock of all the rules that can be computed;
4. developing the appropriate heuristics for the computable rules;
5. computing the rules.


The workshop will be held during the morning session of 26 May 1998 and will
include a joint address by the Organizing Committee (listed below), followed by
5-8 individual presentations in two 90-120-minute blocks, with a break provided
midway through.


The Workshop solicits papers addressing any one or more of the points raised
above, as well as any other pertinent issues.

Papers based on a diversity of languages are encouraged, both one language at a
time and, especially, comparative/contrastive studies. Also strongly encouraged
are papers which extend the study of coreference beyond entity/object
reference, across document boundaries, and/or into non-text media.


Paper submissions should consist of an extended abstract of approximately 800
words, along with a brief description of the proposed presentation structure
(e.g., paper, paper plus demo, etc.).

Each submission should include a separate title page, providing the following
information: the title to be printed in the Conference program; names and
affiliations of all authors; the full address of the primary author (or
alternate contact person), including phone, fax, email; and required
audio-visual equipment.

Papers may be submitted by sending three hard copies or one soft copy (in TeX,
ASCII, or PostScript format) to the address listed below:

Dr. Victor Raskin
Chair, Interdepartmental Program in Linguistics
Heavilon Hall
Purdue University
West Lafayette, IN 47907 USA


Submissions must be received no later than 1 March 1998 for a 15 March
notification of paper acceptance. (Full versions of all accepted papers are
requested no later than 15 April 1998 for inclusion in the conference
proceedings.)


Dr. Sara J. Shelton (Contact Person)
US Department of Defense
9800 Savage Road, R525
Ft Meade, MD 20755 USA
301-688-0301 (voice)
301-688-0338 (fax)

Dr. Eduard Hovy
Information Sciences Institute
University of Southern California
4676 Admiralty Way
Marina Del Rey, CA 90292-669 USA
310-822-1511, ext. 731 (voice)

Dr. Victor Raskin
Interdepartmental Program in Linguistics
Heavilon Hall
Purdue University
West Lafayette, IN 47907 USA
765-494-3782 (voice)
765-494-3780 (fax)

Subject: DIGITAL PRESERVATION: A new conversation

"TIME AND BITS: Managing Digital Continuity"

As more of the cultural heritage community understands the urgency of
digital preservation issues (how do we save existing digital material that
is already proving to be unreadable and how do we prepare a strategy for
ensuring the long-term availability of material we are now digitizing?) one
group is preparing to expand the conversation beyond the merely technical
and technological.

This week, the Getty Center will host a small group to open a discussion on
"technology, culture, and time" that will examine the sociocultural and
economic implications of digital preservation issues. The ambition of the
conversation is to "provide a framework for long-term digital cultural
preservation."

Participants in the conversation include the following:

Howard Besser
Stewart Brand
Doug Carlston
Ben Davis
John Heilemann
Danny Hillis
Brewster Kahle
Kevin Kelly
Jaron Lanier
Peter Lyman
Margaret MacLean
Paul Saffo
Bruce Sterling

This project is being co-organized by the Getty Conservation Institute, the
Getty Information Institute and the Long Now Foundation of San Francisco.
The web site announcing the conversation and the issues will report on the
dialog. It also contains a very useful list of web resources on digital
preservation issues at <http://www.ahip.getty.edu/timeandbits/links.html>.

Below is the introduction to "TIME & BITS" as it appears on the web page.

David Green


"TIME AND BITS: Managing Digital Continuity"

The enthusiastic and increasing use of electronic media for storing
information of various kinds demonstrates the utility of the format and its

In the field of cultural heritage, there is an enormous amount of
significant information in digital form. These data are vulnerable on many
levels. Because of the increasingly fast cycle of obsolescence in hardware
and software, we are at the point where the proliferation of electronic
data on various platforms has prompted some serious concerns about the
long-term protection of the data. A number of international organizations
are examining technological issues that bear on the problem, including data
types, media stability, and options for refreshing and migrating data to
ever-evolving platforms.

There is, however, an important gap in the discussions.

An integrated technical and philosophical discussion of digital archives
and their future that includes the sociocultural and economic implications
of both the problems and the solutions could provide a framework for
long-term digital cultural preservation.

The Getty Conservation Institute and the Getty Information Institute [of
the J. Paul Getty Trust, Los Angeles] are collaborating with the Long Now
Foundation [San Francisco] to generate some strategic thinking on these
issues with important digital theorists. In February of 1998, we will
convene a small group at The Getty Center to share concerns and expertise
in technology, culture, and time.

We will use this Web site to present certain ideas for moderated
discussion, including a summary of the state of the technological work. We
will post comments and incorporate some of them into the body of work being
developed.

The on-line discussion and meeting should provide a set of insightful and
responsible recommendations that will chart a thoughtful course for the
resolution of problems related to long-term digital data protection,
preservation, and reconstruction.


David L. Green
Executive Director
21 Dupont Circle, NW
Washington DC 20036
202/296-5346 (voice), 202/872-0886 (fax)

