9.366 announcements

Humanist (mccarty@phoenix.Princeton.EDU)
Sat, 9 Dec 1995 01:33:06 -0500 (EST)

Humanist Discussion Group, Vol. 9, No. 366.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)

[1] From: Richard Bear <RBEAR@OREGON.UOREGON.EDU> (4)
Subject: Beggars Opera

[2] From: celex@mpi.nl (163)

[3] From: mark@barkov.uchicago.edu (40)
Subject: Image of France

[4] From: James O'Donnell <jod@ccat.sas.upenn.edu> (9)
Subject: renovatio ordinis

[5] From: "Theodore F. Brunner" <tbrunner@uci.edu> (11)
Subject: TLG Web Page

Date: Thu, 07 Dec 1995 13:23:43 -0800 (PST)
Subject: Beggars Opera

The etext of Gay's Beggar's Opera has been updated to html with linked


Richard Bear

Date: Thu, 7 Dec 1995 14:26:27 -0500
From: celex@mpi.nl

This message announces the Second Release of the CELEX CD-ROM with
lexical data from the Dutch Centre for Lexical Information and the
Linguistic Data Consortium.

This CD-ROM contains an enhanced, expanded version of the German
lexical database (2.5), featuring approximately 1000 new lemma
entries, revised morphological parses, verb argument structures,
inflectional paradigm codes, and a corpus type lexicon. A complete
PostScript version of the German Linguistic Guide is also included, in
both European A4-format and American Letter format. For German, the
total number of lemmas included is now 51,728, while all their
inflected forms number 365,530.

Moreover, phonetic syllable frequencies have been added for (British)
English and Dutch. Apart from this, and the provision of frequency
information alongside every lexical feature, no changes have been made
to the Dutch and English lexicons.

Complete AWK-scripts are now provided to compute representations not
found in the (plain ASCII) lexical data files, corresponding to the
features described in the CELEX User Guide, which is included on the
CD as well.

For each language, i.e. English, German and Dutch, the CD-ROM contains
detailed information on the orthography (variations in spelling,
hyphenation), the phonology (phonetic transcriptions, variations in
pronunciation, syllable structure, primary stress), the morphology
(derivational and compositional structure, inflectional paradigms),
the syntax (word class, word-class specific subcategorisations,
argument structures), and word frequency (summed word and lemma
counts, based on recent and representative text corpora) of both
wordforms and lemmas. Unique identity numbers allow the linking of
information from different files with the aid of an efficient,
index-based C-program.
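
As a rough illustration of linking by identity number (not the C-program supplied on the CD), the sketch below joins two small in-memory "files" on their first field. The identity numbers 1 and 2, the field order and the backslash delimiter are assumptions made for the example; the headwords, transcriptions and frequencies are taken from the sample query output in this announcement.

```python
SEP = "\\"  # assumed field delimiter, for illustration only

# Hypothetical file contents: identity number first, then the data fields.
PHONOLOGY_LINES = [
    '1\\helfen\\"hEl-f@n',
    '2\\Hund\\"hUnt',
]
FREQUENCY_LINES = [
    "2\\364",
    "1\\1225",
]


def build_index(lines, sep=SEP):
    """Map each identity number to the remaining fields of its line."""
    index = {}
    for line in lines:
        id_num, *fields = line.split(sep)
        index[int(id_num)] = fields
    return index


def link(*indexes):
    """Join several indexes on the identity numbers they all share."""
    shared = set(indexes[0]).intersection(*indexes[1:])
    return {i: [f for idx in indexes for f in idx[i]] for i in sorted(shared)}


linked = link(build_index(PHONOLOGY_LINES), build_index(FREQUENCY_LINES))
for id_num, fields in linked.items():
    print(id_num, *fields)
```

Because each file is indexed once by its identity number, the order of lines within a file does not matter for the join.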

Like its predecessor, the CD-ROM is mastered using the ISO 9660 data
format, with the Rock Ridge extensions, allowing it to be used in VMS,
MS-DOS, Macintosh and UNIX environments. As the new release does not
omit any data from the first edition, the current release will replace
the old one.

Institutions that have membership in the LDC during the 1995 or 1996
Membership Years will be able to receive CELEX for research purposes
only at no additional charge, in the same manner as all other text and
speech corpora published by the LDC.

Non-members can receive a copy of CELEX for research purposes only for
a fee of $150. If you would like to order a copy of this corpus,
please email your request to ldc@unagi.cis.upenn.edu, or fax it to
(215) 573-2175. If you need additional information before placing your
order, or would like to inquire about membership in the LDC, please
send email or call (215) 898-0464.

Further information about the LDC and its available corpora can be
accessed on the Linguistic Data Consortium WWW Home Page at URL
http://www.cis.upenn.edu/~ldc. More information specific to CELEX can
be accessed via hyperlinks from this Home Page. Information is also
available via ftp at ftp.cis.upenn.edu under pub/ldc; for ftp access,
please use "anonymous" as your login name, and give your email address
when asked for password.

A brief overview of the revised German data on the CD is given below:



When starting to use the German database, the user first has to choose
between three so-called `lexicon types':

- a lemma lexicon
- a wordform lexicon
- a corpus type lexicon

Each lexicon type uses a specific kind of entry. The CELEX lemma
lexicon is the one most similar to an ordinary dictionary since every
entry in this lexicon represents a set of related inflected words. In
a lexicon, a lemma can be represented by a headword (cf. traditional
dictionary entries) such as `helfen' (help) or `Hund' (dog), or by a
stem such as `helf' or `Hund'.
The wordform lexicon yields all possible inflected words: every entry
in the lexicon is an inflectional variant of the related headword or
stem. So, a wordform lexicon contains words like `helfe', `hilft',
`geholfen', `huelfe', `Hundes', `Hunde' and so on. A corpus type
lexicon, on the other hand, simply gives you an ordered list of all
alphanumeric strings found in the corpus with raw string counts,
undisambiguated for relations to either lemmas or wordforms.
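
The difference between the three lexicon types can be sketched in a few lines. This is only an illustration of the idea, not the CELEX data model: the wordforms are the ones named above, while the token list standing in for a corpus and all function names are hypothetical.

```python
from collections import Counter

# Wordform lexicon: each inflected form points to its lemma (headword).
WORDFORM_TO_LEMMA = {
    "helfe": "helfen",
    "hilft": "helfen",
    "geholfen": "helfen",
    "huelfe": "helfen",
    "Hundes": "Hund",
    "Hunde": "Hund",
}


def lemma_lexicon(wordform_to_lemma):
    """Group wordforms under their lemma: one entry per set of related forms."""
    lexicon = {}
    for form, lemma in wordform_to_lemma.items():
        lexicon.setdefault(lemma, []).append(form)
    return lexicon


def corpus_type_lexicon(tokens):
    """Sorted raw string counts, with no link to lemmas or wordforms."""
    return sorted(Counter(tokens).items())


# Hypothetical token stream standing in for a corpus.
tokens = ["Hunde", "hilft", "Hunde", "1995", "hilft", "Hunde"]

lemmas = lemma_lexicon(WORDFORM_TO_LEMMA)
types = corpus_type_lexicon(tokens)
print(lemmas)
print(types)
```

Note that the corpus type list counts the string "1995" like any other token, since it makes no claim about which lemma or wordform a string belongs to.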

For all types of lexicons, the user may subsequently select any number
of columns -- from approximately 200 database columns -- combining
information on the orthography, phonology, morphology, syntax and
frequency of the entries.


The lexical data that can be selected for each entry in the different
German lexicon types can be divided into five categories: orthography,
phonology, morphology, syntax and frequency.

Orthography       - with or without diacritics
(spelling)        - with or without word division positions
                  - number of letters/syllables

Phonology         - phonetic transcriptions which use different notations
(pronunciation)     like SAMPA or CPA and include:
                      - syllable boundaries
                      - primary stress markers
                      - consonant-vowel patterns
                      - number of phonemes/syllables

Morphology        - Derivational/compositional:
(word structure)      - division into stems and affixes
                      - flat or hierarchical representations
                  - Inflectional:
                      - stems and their inflections

Syntax            - word class
(grammar)         - subcategorisations per word class

Frequency         - Mannheim frequency(*)

(*) These frequency data are based on the 6 million word corpus
    compiled by the Institut fuer Deutsche Sprache in Mannheim, Germany.
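
Some of the phonological features listed above can be derived from a transcription string alone. The sketch below assumes the notation seen in the sample query output in this announcement, where `"` marks primary stress and `-` marks a syllable boundary; the function names are hypothetical, and this is not one of the AWK scripts shipped on the CD.

```python
STRESS = '"'    # primary stress marker, as in "hEl-f@n
BOUNDARY = "-"  # syllable boundary


def syllables(transcription):
    """Split a transcription into syllables, dropping the stress marker."""
    return [s.replace(STRESS, "") for s in transcription.split(BOUNDARY)]


def stressed_syllable(transcription):
    """Return the 1-based position of the stressed syllable, or None."""
    parts = transcription.split(BOUNDARY)
    for position, syllable in enumerate(parts, start=1):
        if syllable.startswith(STRESS):
            return position
    return None


word = '"hUn-d@-kalt'
print(syllables(word), len(syllables(word)), stressed_syllable(word))
```

Counting phonemes is deliberately left out: several phonemes are written with more than one character in this notation, so a character count would not be reliable.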


An arbitrary query using a small German lemma lexicon (that is, one
with very few columns) might yield the following result:

Headword       Pronunciation     Morphology:                M:   Cl  Freq
                                 Structured Segmentation    Cl
-------------  ----------------  -------------------------  ---  --  ----
helfen         "hEl-f@n          (helf)                     V    V   1225
Helfer         "hEl-f@r          ((helf),(er))              Vx   N    134
hellaeugig     "hEl-Oy-gIx       ((hell),(Auge),(ig))       ANx  A      0
hellblau       "hEl-blau         ((hell),(blau))            AA   A     28
Hellseher      "hEl-ze:-@r       (((hell),(seh)),(er))      AVx  N     20
hellseherisch  "hEl-ze:-@-rIS    (((hell),(seh)),(erisch))  AVx  A      0
hellwach       "hEl-vax          ((hell),(((wach),(e))))    AVx  A     13
Helm           "hElm             (Helm)                     N    N     22
Hund           "hUnt             (Hund)                     N    N    364
Huendchen      "hYnt-x@n         ((Hund),(chen))            Nx   N      7
hundekalt      "hUn-d@-kalt      ((Hund),(e),(kalt))        NxA  A      0
hundemuede     "hUn-d@-my:-d@    ((Hund),(e),(muede))       NxA  A      3
Hundeschnauze  "hUn-d@-Snau-ts@  ((Hund),(e),(Schnauze))    NxN  N      1
Hundesteuer    "hUn-d@-StOy-@r   ((Hund),(e),(Steuer))      NxN  N      6
Hundewetter    "hUn-d@-vE-t@r    ((Hund),(e),(Wetter))      NxN  N      0
Huendin        "hYn-dIn          ((Hund),(in))              Nx   N      7
huendisch      "hYn-dIS          ((Hund),(isch))            Nx   A      2
Huene          "hy:-n@           (Huene)                    N    N     13
huenenhaft     "hy:-n@n-haft     ((Huene),(n),(haft))       Nxx  A      4
Hunger         "hU-N@r           (Hunger)                   N    N    102
Hungerkur      "hU-N@r-ku:r      ((Hunger),(Kur))           NN   N      5
Hungerlohn     "hU-N@r-lo:n      ((Hunger),(Lohn))          NN   N      6
hungern        "hU-N@rn          ((Hunger))                 N    V     33
Hungersnot     "hU-N@rs-no:t     ((Hunger),(s),(Not))       NxN  N     23
Hungerstreik   "hU-N@r-Straik    ((Hunger),((streik)))      NV   N     14
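
The structured segmentations in the sample output are nested, comma-separated, parenthesised groups, so they can be read into a tree with a small recursive parser. The sketch below assumes exactly that bracket syntax and nothing more; the function names are hypothetical and it is not part of the CELEX tools.

```python
def parse_segmentation(text):
    """Parse '((Hund),(chen))' into nested lists with morpheme leaves."""
    node, end = _parse_node(text, 0)
    if end != len(text):
        raise ValueError(f"unexpected trailing text at position {end}")
    return node


def _parse_node(text, pos):
    """Parse one parenthesised group starting at pos; return (node, next_pos)."""
    if pos >= len(text) or text[pos] != "(":
        raise ValueError(f"expected '(' at position {pos}")
    pos += 1
    if pos < len(text) and text[pos] != "(":
        # Leaf: a bare morpheme running up to the closing parenthesis.
        close = text.find(")", pos)
        if close == -1:
            raise ValueError("missing ')'")
        return text[pos:close], close + 1
    children = []
    while True:
        child, pos = _parse_node(text, pos)
        children.append(child)
        if pos < len(text) and text[pos] == ",":
            pos += 1
            continue
        if pos < len(text) and text[pos] == ")":
            return children, pos + 1
        raise ValueError(f"expected ',' or ')' at position {pos}")


def morphemes(node):
    """Flatten a parsed segmentation into its morphemes, left to right."""
    if isinstance(node, str):
        return [node]
    return [m for child in node for m in morphemes(child)]


tree = parse_segmentation("(((hell),(seh)),(er))")
print(tree)
print(morphemes(tree))
```

The nested result corresponds to the hierarchical representation mentioned under Morphology, and the flattened list to the flat one.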

Richard Piepenbrock
CELEX Project Manager

-- C E L E X --
-- The Centre for Lexical Information --
Max Planck Institute for Psycholinguistics
The Netherlands
Tel: (+31) (0)24 - 3615797
Fax: (+31) (0)24 - 3521213
E-mail: celex@mpi.nl
WWW-page: http://www.kun.nl/celex/

Date: Thu, 7 Dec 1995 15:08:01 -0500
From: mark@barkov.uchicago.edu
Subject: Announcement: Image of France

The Image of France


The Image of France Project is a digital transcription and indexing
of the listings of all printed imagery--engravings, lithographs,
woodcuts, etc.--authorized for publication in France, beginning in
1811 and proceeding through much of the remainder of the 19th century.

The project has converted about 6000 records comprising all of the
listings for the years 1811-1817, as recorded in a special section
of the Bibliographie de la France. The listings may be searched by
key word (stems and phrases) as well as by the personal names of
artistic contributors and publishers in boolean combinations.

Filed under force of law by either the publisher or the artist, the
listings note, along with a print's subject and principal artists,
the precise date after which its diffusion was legal and the agent
and address of distribution. Each filing was also accompanied by
several copies of the print itself; and, according to regulation,
at least one of these was deposited at the Dept. de l'Estampe of
the Bibliothèque nationale de France, where it awaits
researchers today.

In lieu of a catalogue of this great collection, the Image of
France proposes a form of historical bibliography of printed imagery
that is more extensive, and more inclusive of all varieties of work,
than any seriously attempted before now.

The project has been undertaken by George D. McKee, in consultation
with the Dept. de l'Estampe of the Bibliothèque nationale and with
support of the Binghamton University Libraries and the Joint Labor
Committee of New York State and United University Professions. Comments
are welcome: gmckee@library.lib.binghamton.edu

George D. McKee
phone 607 777-4903

The WWW implementation of the Image of France is a collaboration
between George McKee and the ARTFL Project, University of Chicago.

Mark Olsen

Date: Fri, 8 Dec 1995 09:24:49 -0500 (EST)
From: James O'Donnell <jod@ccat.sas.upenn.edu>
Subject: renovatio ordinis

There are those who say (Willard? Geoff Nunberg? me? I keep
forgetting) that the new e-text world has a lot in common with
medieval manuscript culture. So now, the Monastery of Christ in the
Desert in Abiquiu NM has its web page offering, among other things, the
services of scriptorium@christdesert.org for those who want help in
designing imaginative and artistic web pages. See

Jim O'Donnell
Classics, U. of Penn

Date: Fri, 8 Dec 1995 12:42:51 -0800
From: "Theodore F. Brunner" <tbrunner@uci.edu>
Subject: TLG Web Page

The TLG's new web page can be accessed at http://www.tlg.uci.edu/~tlg. We
invite suggestions for further information which we might provide in order
to assist TLG users.

All the best for the coming holidays from all of us at the
Thesaurus Linguae Graecae!

Ted Brunner

Theodore F. Brunner, Director
Thesaurus Linguae Graecae
Phone: (714) 824-64904
FAX: (714) 824-8434