Humanist Discussion Group, Vol. 14, No. 266.
Centre for Computing in the Humanities, King's College London
<http://www.princeton.edu/~mccarty/humanist/>
<http://www.kcl.ac.uk/humanities/cch/humanist/>
[1] From: "Jim Marchand" <marchand@ux1.cso.uiuc.edu> (90)
Subject: Re: 14.0263 letter frequency in Latin?
[2] From: Anne Mahoney <mahoa@bu.edu> (48)
Subject: letter frequency in Latin
--[1]------------------------------------------------------------------
Date: Mon, 25 Sep 2000 06:52:31 +0100
From: "Jim Marchand" <marchand@ux1.cso.uiuc.edu>
Subject: Re: 14.0263 letter frequency in Latin?
The question of the frequency of letters in Latin is interesting
and confronts us with a number of basic problems. These may seem trivial,
but I can assure you they are not. First: What is a
language, and how can we delimit it? Language is one of those words, like
_is_, which we glibly use but scarcely ever define. Second, what is
Latin? Just looking at Olmsted's Index to Language 26-30 (LSA 1955): Latin;
Latin, Archaic; Latin, British; Latin, Church; Latin, Classical; Latin,
Colloquial; Latin, Early; Latin, Hisperic; Latin, Imperial; Latin, Late;
Latin, Low; Latin, Medieval; Latin, Neo; Latin, Old; Latin, Patristic;
Latin, Pauline; Latin, Renaissance; Latin, Republican; Latin, Vulgar;
etc., and I have not been careful to list them all. Letters
themselves pose numerous problems. What about diphthongs such as ae,
which are often written as ligatures? The standard lists are in what we
nowadays would call (restricted) ASCII, so that German contains no
umlauts, French no accents, etc. And what is the purpose of the
list? There was at one time a great movement to discover the
frequency of sounds in various languages, and George Zipf collected these in
search of support for his law of least effort, etc. In
fact, a glib answer to the question might be: Look at G. K. Zipf;
he must list them somewhere (for example, G. K. Zipf and F. M.
Rogers, "Phonemes and variphones in four present-day Romance
languages and classical Latin from the viewpoint of Dynamic
Philology," Archives Néerlandaises de Phonétique Expérimentale 15
(1939), 111-147).
One might, for example, take any large corpus and count the
letters (many `concordance' programs [e.g. TACT, available for ca.
$50 from the Modern Language Association] will do this for you).
Or, one might take one of the concordances (or several of the
concordances available), some of which list as lagniappe the letter
frequencies of the corpus they are working with. This is not very
`scientific', but will work well for sloppy work; after all, we all
know that the sequence of the frequency of English letters is
etaoinshrdlump, as Pogo assures us and Vanna White demonstrates each weekday
night.
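The count itself is simple enough to do without a concordance program. Below is a minimal sketch in Python. It is not TACT's method, and the function names are hypothetical. It counts the letters a-z in a text, ignoring case, and lists them from most to least frequent, in the manner of etaoin shrdlu:

```python
from collections import Counter


def letter_counts(text: str) -> Counter:
    """Count the ASCII letters a-z in text, ignoring case.

    Digits, whitespace, punctuation and non-ASCII characters are skipped.
    """
    return Counter(ch for ch in text.lower() if "a" <= ch <= "z")


def frequency_sequence(counts: Counter) -> str:
    """Return the letters from most to least frequent, like 'etaoinshrdlu'.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "".join(letter for letter, _ in ranked)


if __name__ == "__main__":
    # A one-line sample; for real work read a large corpus instead, e.g.
    # open("corpus.txt", encoding="utf-8").read()  (hypothetical filename).
    sample = "In principio creavit Deus caelum et terram."
    print(frequency_sequence(letter_counts(sample)))  # eiractmnpudlosv
```

On a single sentence the ranking means little. The point of a large corpus is that the sequence settles down as the counts grow.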
My own count of Latin, made by running a text of the Vulgate (the Five
Books of Moses, with j and i, v and u distinguished and ligatures expanded)
through TACT, looks like this: e a i o t n l r s c m d p u
v b g h f q z j x. I have, naturally, left out y and k.
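The result depends on these normalization choices: ligatures expanded, with j/i and v/u kept distinct. Here is a sketch of that preprocessing step. It assumes a Unicode text in which the ligatures appear as æ and œ. The mapping and function names are illustrative assumptions, not TACT's own settings:

```python
from collections import Counter

# Ligatures written out as two letters before counting.
# Illustrative mapping (an assumption); extend it for other ligatures in your texts.
EXPAND_LIGATURES = str.maketrans({"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"})


def normalize(text: str) -> str:
    """Expand ligatures and lowercase. j/i and v/u are deliberately left distinct."""
    return text.translate(EXPAND_LIGATURES).lower()


def ranked_letters(text: str) -> list[tuple[str, int]]:
    """(letter, count) pairs for a-z after normalization, most frequent first."""
    counts = Counter(ch for ch in normalize(text) if "a" <= ch <= "z")
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(normalize("Cælum et Iovis et Jovis"))  # caelum et iovis et jovis
    print(ranked_letters("Cælum et Iovis et Jovis"))
```

Folding j into i or v into u instead would be a one-line change in normalize(), and it would shift the ranking noticeably.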
The question may not have an answer.
In the Humanist archives is a thread on etaoin shrdlu, which you
could retrieve by searching shrdlu.
-----Original Message-----
From: Humanist Discussion Group
<willard.mccarty@kcl.ac.uk>) <willard@lists.village.virginia.edu>
To: Humanist Discussion Group <humanist@lists.Princeton.EDU>
Date: Friday, September 22, 2000 4:01 AM
>
> Humanist Discussion Group, Vol. 14, No. 263.
> Centre for Computing in the Humanities, King's College London
> <http://www.princeton.edu/~mccarty/humanist/>
> <http://www.kcl.ac.uk/humanities/cch/humanist/>
>
>
>
> Date: Fri, 22 Sep 2000 09:45:24 +0100
> From: Melissa Terras <melslists@yahoo.com>
> Subject: letter frequency in latin
>
>Hello All.
>
>A Question - I am looking for some (any) articles on
>statistical analysis of letter frequency in Latin. I
>know that there has been a lot of work done on letter
>frequency and versatility in the English Language, but
>does anyone know of any resources that deal with
>letter frequency and probable letter sequences in
>Latin, from whatever period?
>
>Thanks!
>
>Melissa
>________________________________________
>Melissa M Terras MA MSc
>Engineering Science / Centre for the Study of Ancient
>Documents
>Christ Church
>University of Oxford
>Oxford OX1 1DP
--[2]------------------------------------------------------------------
Date: Mon, 25 Sep 2000 06:53:23 +0100
From: Anne Mahoney <mahoa@bu.edu>
Subject: letter frequency in Latin
In a note to be published this year in Classical Outlook, my colleague Jeff
Rydberg-Cox and I address this question. We counted the letters in the Perseus
Latin corpus and found that the relative ranking of letters is not too
different from that in English, except that 'i' and 'u' rank significantly
higher than 'o' -- not surprising, given that they do double duty as consonants.
The figures are as follows:
letter   percent (rounded)
e        9.3   (727,785 occurrences)
i        8.9
u        8.7
a        6.8
t        6.5
s        6.0
r        4.9
n        4.9
m        4.5
o        4.4
c        3.2
l        2.5
d        2.4
p        2.2
q        1.4
b        1.1
g        0.8
f        0.8
h        0.7
x        0.3
y        0.1
k        0.0   (434 occurrences)
w        0.0   (322 occurrences)
z        0.0   (307 occurrences)
At the time there were no instances of 'j' in the Perseus texts (though 'j'
does occur in some of our schoolboy commentaries). The corpus is not consistent
about 'u' and 'v', since we've retained whatever was in the original print
editions, so we simply counted every 'v' as 'u'. We also did not attempt to
weed out Roman numerals.
The corpus we counted was about 7.8 million characters (letters, digits, and
punctuation), from Plautus, Caesar (BG), Catullus, Cicero (orations and
letters), Virgil, Horace (Odes), Livy (books 1-10), Ovid (Metamorphoses),
Suetonius (Caesars), the Vulgate, and Servius's commentary on Virgil. Because
this corpus is so heterogeneous, much more could be done to refine the
results.
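The note does not say exactly what the percentages are taken of. The 727,785 occurrences of 'e' come to about 9.3% of 7.8 million, and the listed letter percentages sum to only about 80%. Both facts suggest they are shares of all characters counted, not of letters alone. The sketch below takes that reading as an assumption. It also follows the note in counting every 'v' as 'u' and in leaving Roman numerals in. The function name is hypothetical:

```python
from collections import Counter


def letter_percentages(text: str, fold_v_into_u: bool = True) -> dict[str, float]:
    """Percentage of each letter a-z, measured against ALL characters in text.

    Assumption: the denominator is len(text) (letters, digits, punctuation
    and any whitespace together), which is how the published figures appear
    to have been computed; divide by sum(counts.values()) instead for shares
    of letters only. Roman numerals are counted as ordinary letters.
    """
    if not text:
        return {}
    lowered = text.lower()
    if fold_v_into_u:
        # The editions mix u and v inconsistently, so count every v as u.
        lowered = lowered.replace("v", "u")
    counts = Counter(ch for ch in lowered if "a" <= ch <= "z")
    total = len(text)
    return {letter: 100 * n / total for letter, n in counts.most_common()}


if __name__ == "__main__":
    # Consistency check on the published figure for e (corpus size is approximate).
    print(round(100 * 727_785 / 7_800_000, 1))  # 9.3
    print(letter_percentages("Arma virumque cano."))
```

Whether whitespace belongs in the denominator is not stated. That choice changes every percentage by the same factor, but it leaves the ranking unchanged.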
We did not look at letter sequences at all, and I don't think I've ever seen
anything on that subject for Latin.
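The original question also asked about probable letter sequences. This sketch is offered only as a starting point and was not part of the study described above. It counts adjacent letter pairs within words and turns them into rough conditional estimates, such as how often 'q' is followed by 'u'. The function names are hypothetical:

```python
import re
from collections import Counter


def letter_bigrams(text: str) -> Counter:
    """Count adjacent letter pairs inside words; pairs across word breaks are ignored."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


def next_letter_probability(bigrams: Counter, first: str, second: str) -> float:
    """Estimate P(second | first) as count(first+second) / count(pairs starting with first)."""
    starting = sum(n for pair, n in bigrams.items() if pair[0] == first)
    return bigrams[first + second] / starting if starting else 0.0


if __name__ == "__main__":
    pairs = letter_bigrams("qui quae quod")
    print(pairs.most_common(3))
    print(next_letter_probability(pairs, "q", "u"))  # 1.0
```

The same normalization questions apply here as to single letters: ligatures, u/v, and i/j all change which pairs appear.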
--Anne Mahoney
Perseus Project