5.0505 ARTFL Newsletter Winter 1991-92 (1/527)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Thu, 5 Dec 1991 17:49:44 EST

Humanist Discussion Group, Vol. 5, No. 0505. Thursday, 5 Dec 1991.

Date: Thu, 5 Dec 91 15:22:43 CST
From: Mark Olsen <mark@gide.uchicago.edu>
Subject: ARTFL News

The ARTFL Project Newsletter
Volume 7, Number 1 - Winter 1991-92

American and French Research on the
Treasury of the French Language

ARTFL is a cooperative project between:
Centre National de la Recherche Scientifique
The University of Chicago

ARTFL Preparing CD-Rom of Database:
Consortium Members to Receive Copies

With the completion of a new compact disk version of
the Tre'sor de la Langue Franc+aise database, ARTFL will
soon become available to run locally at any of our subscrib-
ing institutions. With support from the Scaler Foundation,
ARTFL is developing a CD-ROM (Compact Disk-Read Only Memory)
which combines a copy of the database and access software.
All members of the ARTFL consortium will be eligible to
receive the CD-ROM, which can run on any computer with UNIX

The CD-ROM will allow ARTFL subscribers to conduct sim-
ple database queries locally, eliminating the need to use
Internet or a long-distance phone link. The ARTFL Project
in Chicago will continue to provide institutional support,
as well as more sophisticated text analysis capabilities.

The ARTFL CD-ROM will contain all of the texts in the
database that are exempt from copyright restrictions (gen-
erally pre-1925), along with complete indices. The search
engine and interface software will run on a variety of UNIX
workstations, including Sun SPARCstations and NeXT computers.

The CD version of the database has required modifications
to the existing system: a new index structure appropriate
to the compact disc; compression of both the indices and
the data to fit the CD; and the development of a high-speed
decoder. The texts alone take up 750 megabytes, and the
current indices about 300 megabytes. We have been able to
compress the texts to about 225 megabytes using a new data
compression technology developed by Drs. Bookstein and
Klein at the Center for Information and Language Studies
(CILS) of the University of Chicago.
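To put these figures in perspective, the short Python calculation below works through the arithmetic. It assumes a nominal CD-ROM capacity of 650 megabytes, a common figure that this newsletter does not state, and, because the compressed size of the indices is not given, it uses the uncompressed 300 MB index figure as a worst case.

```python
# Figures quoted in the newsletter, in megabytes.
TEXT_RAW_MB = 750
TEXT_COMPRESSED_MB = 225
INDEX_RAW_MB = 300

# Assumption: nominal capacity of one CD-ROM (not stated in the newsletter).
CD_CAPACITY_MB = 650

ratio = TEXT_RAW_MB / TEXT_COMPRESSED_MB          # roughly 3.3 to 1
fraction_kept = TEXT_COMPRESSED_MB / TEXT_RAW_MB  # compressed texts keep 30% of the size
saved_mb = TEXT_RAW_MB - TEXT_COMPRESSED_MB       # space saved on the texts alone

uncompressed_total = TEXT_RAW_MB + INDEX_RAW_MB        # raw texts plus raw indices
worst_case_total = TEXT_COMPRESSED_MB + INDEX_RAW_MB   # compressed texts, raw indices

print(f"compression ratio {ratio:.2f}:1, texts shrink to {fraction_kept:.0%}")
print(f"raw data {uncompressed_total} MB, fits: {uncompressed_total <= CD_CAPACITY_MB}")
print(f"compressed texts + raw indices {worst_case_total} MB, "
      f"fits: {worst_case_total <= CD_CAPACITY_MB}")
```

On these assumptions, the raw texts and indices (1050 MB) would not fit on a single disc, while the compressed texts plus even uncompressed indices (525 MB) would.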

The ARTFL CD-ROM will be provided as part of institu-
tional subscriptions under a license agreement. ARTFL is
not able to sell the CD to institutions or individual
users. Release of the disc is planned for early 1992.

December 5, 1991

French Ambassador Visits ARTFL

In November 1990, M. Jacques Andreani, French Ambassa-
dor to the United States, visited the ARTFL Project at the
University of Chicago. M. Andreani and his wife received a
tour of the ARTFL facilities from Professor Robert Mor-
rissey, who described the Project's continuing expansion of
research services. Professor Morrissey also demonstrated
the PhiloLogic system, showing how he is using the Tre'sor
de la Langue Franc+aise in his current work on the image of
Charlemagne in French literature.

ARTFL also wishes to acknowledge a generous gift from
M. Daniel Ollivier, attache' culturel at the Consulate Gen-
eral of France in Chicago, to support ARTFL's efforts to
stimulate interest in French culture and language. As a
result of the collaboration between the University of Chicago
and the Centre National de la Recherche Scientifique, ARTFL has
received invaluable moral and financial backing from the
French government. We deeply appreciate these contributions.

NEH Grant Supports Corpus Revision

What? The first chapter of L'Etranger has no para-
graphs? Oh! The ARTFL people just didn't put them in!

At some time or another, most ARTFL subscribers have
probably been frustrated by the formatting limitations of
the texts on the database: no indication of paragraph
breaks, obvious spelling errors, at times no differentiation
between speech and stage directions in theatrical texts. In
the first years after the database was released for use by
American universities, users gradually identified numerous
formatting and accuracy problems. Because the database was
originally collected for the creation of a dictionary in the
1960s, many of the formatting and coding conventions used
for full-text retrieval were not included. Now, thanks to a
grant from the National Endowment for the Humanities, ARTFL
has begun to systematically correct many of these problems,
and will continue making corrections until the entire data-
base has been reviewed for errors.

The Corpus Revision project involves reviewing each
text on the database for a number of possible corrections
and additions. Areas being addressed include:

1.) spelling, typographic, and other lexical errors;

2.) addition of markers for breaks between sentences and
paragraphs;

3.) for theatrical texts, markers to distinguish speaker
shifts and stage directions from spoken text;

4.) for verse texts, markers to distinguish individual
poems within large poetry collections.

While the database endeavors to supply researchers with
completely accurate textual and linguistic samples, ARTFL
has been hindered by deficiencies stemming from the initial
stages of the database's development. We needed a large
grant in order to make corrections consistently across the
entire database, and in May 1990 we received that support in
the form of a $180,000 grant from the National Endowment for
the Humanities. With the assistance and close collaboration
of the Institut National de la Langue Franc+aise in Nancy
and Paris, ARTFL has begun correcting the spelling and for-
mat of all 1760 texts in the Tre'sor de la Langue Franc+aise
corpus. The support from the NEH will result in signifi-
cantly more accurate and useful data for researchers in
North America and in Europe.

In the summer of 1990, the planned three-year project
began with a three-month preparation period, during which
software was developed to perform automatic revisions, a
computer-based revision, control, and back-up system was
implemented, and a local administration was established
that oversees procedures, training, and text
acquisition. When the preparation period was completed,
ARTFL staff members were trained in using the revision pro-
grams, and their preliminary editing experimentation led to
several improvements in the software. At the same time, the
editing software as originally conceived for prose texts was
adapted for theater and verse texts.

Staff members have resolved the areas of difficulty
outlined above, as well as other less apparent problems
specific to certain texts, using computerized revision
software along with a systematic procedure of manual review.
In order to verify that the format of each computerized text
accurately reflects the format of the text in its printed-
book manifestation, staff members have made the effort to
check the computerized text against the same printed edition
that was originally used to enter the text into the database.
In this way, we ensure that the texts' formats are consistent
with each other and with an identifiable printed edition.

ARTFL staff members have now begun editing in earnest,
starting with the simplest material (20th century prose
works), and working back to earlier, more problematic texts.
So far in 1991, ARTFL editors have completed revision of
over 300 texts, including the majority of 20th century prose
works and a good portion of 20th century theater. It is
expected that the revision project will be completed within
three years.

Subscribers should note that ARTFL's research availa-
bility will not be affected by the database upgrade project.
In addition, the texts currently accessible to ARTFL sub-
scribers will reflect no revisions until later in the pro-
ject, when the corrected copies of the texts can replace the
current versions of the texts in large, systematically
determined groups.

Besanc+on Corpus Added to Database

They entered thousands of French texts, and no Racine?!

Some of the specific authors and works that are still
not represented on the database cause just as much amazement
among users as the number of texts to which the database
does allow access. But now many of these obvious lacunae in
the database will be filled with the addition of the
Besanc+on corpus, a group of 147 texts featuring, among oth-
ers, the complete works of Racine, Corneille and Molie`re.

This new corpus will vastly expand our holdings from
the 16th and 17th centuries, as well as adding a few impor-
tant works from the 19th and 20th centuries. The majority
of this material has been included in the ARTFL CD-ROM, to
be released this winter, and will be incorporated in the
Chicago database in the Spring. After reformatting and
extensive editing, these texts are now ready to be added to
the database.
Scholars of the 17th century will be delighted to have
access to the complete plays of Corneille, Racine and
Molie`re, as well as works by De Viau, Mairet, Schelandre,
Tristan L'Hermite, Rotrou, and Madame de Lafayette. Those
interested in the 16th century will now have access to a
selection of texts by Rabelais, Marguerite de Navarre,
Ronsard, Garnier, Des Masures, Sce`ve and d'Aubigne'.

The Besanc+on corpus also contains texts that round out
present holdings in 19th and 20th century poetry and prose.
These include works by Hugo, Gautier, Nerval, Mallarme',
Baudelaire, Verlaine, Apollinaire, and Camus, among others.

Preparing usable versions of the texts demanded a major
effort from the ARTFL staff. The Besanc+on corpus was
developed independently of the rest of the database, so its
format differed in many respects and was not immediately
compatible with our searching software. In 1988,
work began in Chicago to restructure the data so that it
would conform to the format of the rest of the texts on the
database. Each of the 147 works required individual
treatment; depending on the state of the individual text,
the ARTFL staff would have to add page numbers, chapter
headings, scene breaks, and/or capitalization for proper
nouns. At present, 129 texts have been treated and are
found on the ARTFL CD-ROM; we expect to have the remaining
texts completed within a few months.


Good news for those of you who are having problems
reaching ARTFL due to incompatible network hook-ups: the
ARTFL database is now accessible over Bitnet and other major
computer networks through the Mail Order Philis Server
(MOPS). The system will give consortium members free
access to ARTFL by electronic mail, supplementing our Philo-
Logic and MacPhilo user interfaces.

MOPS provides a major addition to ARTFL capabilities:
unlike PhiloLogic, which runs interactively, MOPS lets users
submit search requests to ARTFL by electronic mail. This
eliminates the need for an Internet link or for expensive
long-distance phone calls, making ARTFL available to a
much wider community of users.

Users can send either one or several requests at a
time, following a simple procedure (see sidebar). MOPS
automatically translates each query to the ARTFL database,
and the results are mailed back to the electronic address
from which the original query was sent. Turnaround time
varies depending on network load and the number of users on
the ARTFL computers, but results can typically be expected
within a few hours. We have tested the system extensively
during the Spring and Summer of 1991, and have been able to
provide results to clients world-wide.

Use of MOPS is free of charge, but you must first
register with ARTFL in order to gain access to the system.
There are certain limitations on searches that may be con-
ducted with MOPS. It cannot handle queries producing
results too large to mail as a single message; also, copy-
right restrictions limit the searches that can be performed
on some portions of the database.
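The mail-based workflow described above (check that the sender is registered, run the query, mail the results back to the sending address, and decline results too large for one message) can be sketched in Python with the standard email package. This is a hypothetical illustration, not the actual MOPS implementation: the registration list, the size limit, the reply wording, and the run_query placeholder are all assumptions.

```python
from email import message_from_string
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical values: the newsletter gives neither a registration format
# nor a numeric size limit, only that results must fit in one message.
REGISTERED = {"scholar@example.org"}
MAX_REPLY_BYTES = 50_000


def run_query(query_lines):
    """Placeholder for the real search engine; it just echoes the query."""
    return "Results for: " + " / ".join(query_lines)


def handle_request(raw_mail):
    """Answer one mailed query, replying to the address it came from."""
    request = message_from_string(raw_mail)
    sender = request["From"]
    _, address = parseaddr(sender)

    reply = EmailMessage()
    reply["To"] = sender  # results go back to the sending address
    reply["Subject"] = "Re: " + (request["Subject"] or "MOPS query")

    if address.lower() not in REGISTERED:
        reply.set_content("Please register with ARTFL before using MOPS.")
        return reply

    # Simplification: a single-part plain-text body holding one query,
    # one piece of information per non-blank line.
    body = request.get_payload()
    query_lines = [line.strip() for line in body.splitlines() if line.strip()]
    results = run_query(query_lines)

    if len(results.encode("utf-8")) > MAX_REPLY_BYTES:
        reply.set_content("The results are too large to mail as a single "
                          "message; please narrow the search.")
    else:
        reply.set_content(results)
    return reply
```

For example, a message from a registered address whose body is the lines `date 1820`, `find ami`, `context 80` gets back a reply addressed to that sender containing the (placeholder) results.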

For additional information, contact ARTFL by e-mail at
artfl@artfl.uchicago.edu, or by phone at (312) 702-8488.

A Quick Tour of MOPS

Database queries to MOPS must be written in a precise
notation, but we have tried to keep the format as
straightforward as possible. Each query must provide four
pieces of information, each listed on a separate line:

(1) The name of the database, generally tlf (Tre'sor de la
Langue Franc+aise).

(2) The subcorpus of works you want to search. This is
specified by date, author, or title.

(3) The wordlist construction. This may be a single word,
a range of words containing a common root, or instances
where two or more words co-occur within a specified range of
text. Alternatively, the user may construct a larger or
more varied wordlist, according to the demands of his or her
own research.

(4) The output format. You can list the results as a
keyword-in-context, which identifies the location of each
occurrence of the word and displays a single line of text
around it; as the complete sentences or paragraphs in which
the occurrences appear; or as a count, which simply displays
the number of occurrences of the target word in your subcorpus.
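As a rough illustration of the date-based subcorpus, the count output, and the keyword-in-context output, here is a Python sketch over a toy in-memory corpus. The Text record, the tokenizer, and the function names are hypothetical; a real system would presumably consult the database indices rather than scanning every text like this.

```python
import re
from dataclasses import dataclass


@dataclass
class Text:
    """A toy stand-in for one database text (not ARTFL's real record format)."""
    author: str
    year: int
    body: str


WORD = re.compile(r"[^\W\d_]+")  # runs of letters, accented letters included


def subcorpus(texts, first_year, last_year):
    """Keep texts dated within an inclusive range, like 'date (1800:1849)'."""
    return [t for t in texts if first_year <= t.year <= last_year]


def count(texts, wordlist):
    """Count output: total occurrences of any word in the wordlist."""
    wanted = {w.lower() for w in wordlist}
    return sum(1 for t in texts for m in WORD.finditer(t.body)
               if m.group().lower() in wanted)


def kwic(texts, wordlist, width=80):
    """Keyword-in-context: one line of roughly `width` characters per hit."""
    wanted = {w.lower() for w in wordlist}
    lines = []
    for t in texts:
        flat = " ".join(t.body.split())  # collapse line breaks onto one line
        for m in WORD.finditer(flat):
            if m.group().lower() in wanted:
                side = max(0, (width - len(m.group())) // 2)
                left = flat[max(0, m.start() - side):m.start()].rjust(side)
                right = flat[m.end():m.end() + side].ljust(side)
                lines.append(f"{t.author} {t.year}: {left}{m.group()}{right}")
    return lines
```

Usage on two toy texts:

```python
texts = [Text("A", 1820, "Mon ami est venu. Une amie aussi."),
         Text("B", 1860, "Les amis partent.")]
print(count(subcorpus(texts, 1800, 1849), ["ami", "amie", "amis", "amies"]))  # 2
for line in kwic(subcorpus(texts, 1820, 1820), ["ami"], width=40):
    print(line)
```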

Here are a few examples of queries:

date 1820
find ami
context 80

This command produces a keyword-in-context (80 characters in
length) listing of all occurrences of ami in the database
for the year 1820. We could also ask about the feminine and
plural forms, for the whole first half of the century:

date (1800:1849)
find (ami, amie, amis, amies)

Here we have also asked only for the number of such
occurrences, rather than to see the actual text. Finally,
for a more interesting example, here's how you would ask to
see the text of sentences that Gide wrote containing both
the words socie'te' and franc+ais[e[s]] (in this case, the
results would be returned with 5 lines of context around
each occurrence):

tlf
author gide
find societe consider sentences find (expand francais*)
concordance 2
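The Gide query combines a sentence-level co-occurrence condition with a truncated word form. A minimal Python sketch of that idea follows, under two stated assumptions: that a trailing * acts as a prefix wildcard (the newsletter does not define exactly what expand does), and that matching ignores accents, so that the unaccented query word societe finds socie'te'. The sentence splitter is deliberately naive, and all function names are hypothetical.

```python
import re
import unicodedata


def fold(word):
    """Lower-case and strip accents, so 'Société' compares equal to 'societe'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def matches(pattern, word):
    """Assumed semantics: a trailing '*' is a prefix wildcard, as in 'francais*'."""
    pattern = fold(pattern)
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def cooccurring_sentences(text, pattern_a, pattern_b):
    """Return each sentence containing a match for both patterns."""
    flat = " ".join(text.split())
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    hits = []
    for sentence in sentences:
        words = [fold(w) for w in re.findall(r"[^\W\d_]+", sentence)]
        if (any(matches(pattern_a, w) for w in words)
                and any(matches(pattern_b, w) for w in words)):
            hits.append(sentence)
    return hits


text = "La société française change. La société est vieille. Les Français rient."
print(cooccurring_sentences(text, "societe", "francais*"))
# ['La société française change.']
```

Only the first sentence qualifies: the second lacks any form of franc+ais, and the third lacks socie'te'.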

The features of MOPS are too extensive to be fully illus-
trated here. In fact, they offer somewhat more flexibility
than PhiloLogic itself (though not the advantage of
immediate results in an interactive format).

Scaler Foundation Funds Addition of Encyclope'die

In ARTFL's continuing effort to expand the holdings of
the database, the latest development is a major five-year
grant from the Scaler Foundation that will allow us to have
the entire 18th century text of Diderot's Encyclope'die ren-
dered into computer-readable form and added to the database.

This vast work will be an important addition to our
18th century corpus, as it is of interest to scholars of
literature, history and the social sciences. Furthermore,
we intend to provide access not only to the text of the
Encyclope'die, but also to the illustrations (planches) that
accompanied the text in the original edition. This project
will therefore provide an opportunity for ARTFL to take
advantage of new developments in the field of computer
graphics, exploring techniques in digitized images that may
also prove useful in preparing computerized versions of
other French texts with a strong graphic element.

The Encyclope'die project will get underway in the next
year, with selection of a supervisory board and preliminary
research into text encoding and image manipulation.

Database Usage Up

Use of the ARTFL database continued to grow in 1991.
The following table shows the number of logins per month
during 1990 and through September 1991.

Month       PhiloLogic logins

Jan. 90     116
Feb. 90     133
Mar. 90     126
Apr. 90     274
May 90      195
June 90     148
July 90     126
Aug. 90     130
Sep. 90     282
Oct. 90     244
Nov. 90     187
Dec. 90     112
Jan. 91     230
Feb. 91     168
Mar. 91     216
Apr. 91     288
May 91      249
June 91     204
July 91     141
Aug. 91     194
Sep. 91     163
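The growth claim can be checked against the table. Because the 1991 figures run only through September, the Python snippet below compares January to September of each year rather than setting nine months against twelve.

```python
# Monthly PhiloLogic logins, copied from the table above.
logins_1990 = [116, 133, 126, 274, 195, 148, 126, 130, 282, 244, 187, 112]
logins_1991 = [230, 168, 216, 288, 249, 204, 141, 194, 163]  # January to September

months = len(logins_1991)  # compare like with like: January to September only
same_period_1990 = sum(logins_1990[:months])
period_1991 = sum(logins_1991)
growth = (period_1991 - same_period_1990) / same_period_1990

print(f"Jan-Sep 1990: {same_period_1990} logins")
print(f"Jan-Sep 1991: {period_1991} logins ({growth:+.1%})")
print(f"Full year 1990: {sum(logins_1990)} logins")
```

It prints 1530 logins for January to September 1990 and 1853 for the same months of 1991, an increase of about 21.1%, with 2073 logins for all of 1990.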

The majority of these users are accessing ARTFL via
the Internet rather than by long-distance telephone. We
expect to see continuing increases in usage as access to
the database becomes easier in coming months. MOPS, the Mail
Order Philis Server, now provides an electronic mail
interface for scholars with BITNET accounts. The number of
academic institutions with computers providing Internet
access to humanities faculty and students continues to
grow.
ARTFL at the MLA

Mark Olsen, Assistant Director, will be at the Modern
Language Association to present a paper titled "What Can and
Cannot be Done with Electronic Text in Historical and
Literary Research" as part of the session "How We Do What We
Do: Modeling Literary Research by Computer" (Saturday,
December 28, 1:45-3:00, Potrero Hill-Telegraph Hill,
Marriott). Mark will also be happy to give ARTFL
demonstrations by appointment. Anyone interested in a
demonstration should contact Mark by e-mail at
mark@gide.uchicago.edu or by phone at 312-702-
The ARTFL Project
Department of Romance Languages
University of Chicago
1050 East 59th Street
Chicago, Illinois 60637
(312) 702-8488

electronic mail: artfl@artfl.uchicago.edu