Humanist Discussion Group, Vol. 14, No. 266.
Centre for Computing in the Humanities, King's College London
<http://www.princeton.edu/~mccarty/humanist/>
<http://www.kcl.ac.uk/humanities/cch/humanist/>

[1] From: "Jim Marchand" <marchand@ux1.cso.uiuc.edu> (90)
    Subject: Re: 14.0263 letter frequency in Latin?

[2] From: Anne Mahoney <mahoa@bu.edu> (48)
    Subject: letter frequency in Latin

--[1]------------------------------------------------------------------
Date: Mon, 25 Sep 2000 06:52:31 +0100
From: "Jim Marchand" <marchand@ux1.cso.uiuc.edu>
Subject: Re: 14.0263 letter frequency in Latin?

The question of the frequency of letters in Latin is interesting and confronts us with a number of basic problems. These may seem trivial, but I can assure you they are not.

First: what is a language, and how can we delimit it? Language is one of those words, like _is_, which we glibly use but scarcely ever define.

Secondly, what is Latin? Just looking at Olmsted's Index to Language 26-30 (LSA 1955): Latin; Latin, Archaic; Latin, British; Latin, Church; Latin, Classical; Latin, Colloquial; Latin, Early; Latin, Hisperic; Latin, Imperial; Latin, Late; Latin, Low; Latin, Medieval; Latin, Neo; Latin, Old; Latin, Patristic; Latin, Pauline; Latin, Renaissance; Latin, Republican; Latin, Vulgar, etc., and I have not been careful to list them all.

Letters themselves offer numerous problems. What about diphthongs such as ae, often written as ligatures? The standard lists are in what we would nowadays call (restricted) ASCII, so that German contains no umlauts, French no accents, etc.

And what is the purpose of the list? There was at one time a great movement to discover the frequency of sounds in various languages, and George Zipf collected these in search of support for his law of least effort, etc. In fact, a glib answer to the question might be: look at G. K. Zipf; he must list them somewhere (for example: G. K. Zipf and F. M.
Rogers, "Phonemes and variphones in four present-day Romance languages and classical Latin from the viewpoint of Dynamic Philology," Archives Néerlandaises de Phonétique Expérimentale 15 (1939), 111-147).

One might, for example, take any large corpus and count the letters (many `concordance' programs [e.g. TACT, available for ca. $50 from the Modern Language Association] will do this for you). Or one might take one or several of the available concordances, some of which list, as lagniappe, the letter frequencies of the corpus they are working with. This is not very `scientific', but will do for rough work; after all, we all know that the sequence of the frequency of English letters is etaoinshrdlump, as Pogo assures us and Vanna White demonstrates each weeknight.

My own count of Latin, made by running a text of the Vulgate (the Five Books of Moses, with j and i, v and u distinguished and ligatures expanded) through TACT, looks like this:

e a i o t n l r s c m d p u v b g h f q z j x

I have, naturally, left out y and k.

The question may not have an answer. In the Humanist archives is a thread on etaoin shrdlu, which you can retrieve by searching for "shrdlu".

-----Original Message-----
From: Humanist Discussion Group <willard.mccarty@kcl.ac.uk> <willard@lists.village.virginia.edu>
To: Humanist Discussion Group <humanist@lists.Princeton.EDU>
Date: Friday, September 22, 2000 4:01 AM

> Humanist Discussion Group, Vol. 14, No. 263.
> Centre for Computing in the Humanities, King's College London
> <http://www.princeton.edu/~mccarty/humanist/>
> <http://www.kcl.ac.uk/humanities/cch/humanist/>
>
> Date: Fri, 22 Sep 2000 09:45:24 +0100
> From: Melissa Terras <melslists@yahoo.com>
> Subject: letter frequency in latin
>
>Hello All.
>
>A Question - I am looking for some (any) articles on
>statistical analysis of letter frequency in Latin.
>I know that there has been a lot of work done on letter
>frequency and versatility in the English language, but
>does anyone know of any resources that deal with
>letter frequency and probable letter sequences in
>Latin, from whatever period?
>
>Thanks!
>
>Melissa
>________________________________________
>Melissa M Terras MA MSc
>Engineering Science / Centre for the Study of Ancient
>Documents
>Christ Church
>University of Oxford
>Oxford OX1 1DP

--[2]------------------------------------------------------------------
Date: Mon, 25 Sep 2000 06:53:23 +0100
From: Anne Mahoney <mahoa@bu.edu>
Subject: letter frequency in Latin

In a note to be published this year in Classical Outlook, my colleague Jeff Rydberg-Cox and I address this question. We counted the letters in the Perseus Latin corpus and found that the relative ranking of letters is not too different from that in English, except that 'i' and 'u' rank significantly higher than 'o', which is not surprising, given that they do double duty as consonants. The figures are as follows:

letter   percent (rounded)
e        9.3   (727,785 occurrences)
i        8.9
u        8.7
a        6.8
t        6.5
s        6.0
r        4.9
n        4.9
m        4.5
o        4.4
c        3.2
l        2.5
d        2.4
p        2.2
q        1.4
b        1.1
g        0.8
f        0.8
h        0.7
x        0.3
y        0.1
k        0     (434 occurrences)
w        0     (322)
z        0     (307)

At the time there were no 'j' in the Perseus texts (though 'j' does occur in some of our schoolboy commentaries). The corpus is not consistent about 'u' and 'v', since we've retained whatever was in the original print editions, so we simply counted all 'v' as 'u'. We also did not attempt to weed out Roman numerals.
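Both counting procedures described in this thread (Jim Marchand's TACT count, with j/i and v/u kept apart and ligatures expanded, and the Perseus count, with every 'v' folded into 'u') can be sketched in a few lines of Python. This is a minimal sketch assuming plain-text input; the function name `letter_frequencies`, its parameters, and the two-entry ligature table are hypothetical and are not part of TACT or the Perseus tools. Roman numerals are not filtered out here either. The Perseus percentages appear to be shares of all characters rather than of letters alone (727,785 is about 9.3% of the 7.8 million characters mentioned below, and the listed percentages sum to only about 80), so the sketch offers both denominators; treating "characters" as every non-whitespace character is an assumption.

```python
from collections import Counter
import string

# Hypothetical ligature table; extend it to match the edition being counted.
LIGATURES = {"æ": "ae", "œ": "oe"}


def letter_frequencies(text, fold_v=True, fold_j=False, all_chars=False):
    """Return (letter, count, percent) tuples, most frequent first.

    fold_v=True counts every 'v' as 'u' (as in the Perseus count);
    fold_v=False keeps them apart (as in the TACT count of the Vulgate).
    Only the letters a-z are counted. By default percentages are of all
    letters; with all_chars=True they are of all non-whitespace
    characters (an assumed reading of "characters").
    """
    text = text.lower()
    for ligature, expansion in LIGATURES.items():
        text = text.replace(ligature, expansion)  # expand ligatures first
    if fold_v:
        text = text.replace("v", "u")
    if fold_j:
        text = text.replace("j", "i")

    counts = Counter(ch for ch in text if ch in string.ascii_lowercase)
    if all_chars:
        total = sum(1 for ch in text if not ch.isspace())
    else:
        total = sum(counts.values())
    # most_common() with no argument returns every letter, highest count first.
    return [(letter, n, 100.0 * n / total) for letter, n in counts.most_common()]


if __name__ == "__main__":
    sample = "Gallia est omnis divisa in partes tres"
    ranking = letter_frequencies(sample)
    # The ranked letters, in the style of "etaoin shrdlu".
    print(" ".join(letter for letter, _, _ in ranking))
    for letter, n, pct in ranking:
        print(f"{letter} {pct:5.1f} ({n})")
```

Run over a whole corpus file, the joined ranking gives a sequence comparable to the TACT list above (with `fold_v=False`) or to the Perseus table (with the default folding and `all_chars=True`).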
The corpus we counted was about 7.8 million characters (letters, digits, and punctuation), from Plautus, Caesar (BG), Catullus, Cicero (orations and letters), Virgil, Horace (Odes), Livy (books 1-10), Ovid (Metamorphoses), Suetonius (Caesars), the Vulgate, and Servius's commentary on Virgil. Because this corpus is so heterogeneous, a lot more work could be done on refining the results.

We did not look at letter sequences at all, and I don't think I've ever seen anything on that subject for Latin.

--Anne Mahoney
Perseus Project
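Melissa Terras's question also asked about probable letter sequences, which the Perseus count did not examine. A natural first step is to count adjacent letter pairs (bigrams) and turn them into estimates of the probability of the next letter given the current one. The following is a minimal sketch under stated assumptions: plain-text input, the same u/v folding as the Perseus count, and a "word" defined as a run of the letters a-z (so ligatures and other non-ASCII letters act as word breaks). The function names `letter_bigrams` and `next_letter_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict
import re


def letter_bigrams(text, fold_v=True):
    """Count adjacent letter pairs inside words; pairs never cross a word break."""
    text = text.lower()
    if fold_v:
        text = text.replace("v", "u")  # same u/v folding as the Perseus count
    pairs = Counter()
    # Assumption: a "word" is a maximal run of a-z; anything else is a break.
    for word in re.findall(r"[a-z]+", text):
        pairs.update(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def next_letter_probabilities(pairs):
    """Convert bigram counts into estimates of P(next letter | current letter)."""
    totals = Counter()
    for pair, n in pairs.items():
        totals[pair[0]] += n  # how often each letter is followed by anything
    probs = defaultdict(dict)
    for pair, n in pairs.items():
        probs[pair[0]][pair[1]] = n / totals[pair[0]]
    return probs


if __name__ == "__main__":
    pairs = letter_bigrams("Arma virumque cano, Troiae qui primus ab oris")
    print(pairs.most_common(5))
    probs = next_letter_probabilities(pairs)
    print(probs["q"])  # {'u': 1.0} for this line
```

Keeping pairs inside words is a design choice; counting across word breaks, or treating the space as a symbol of its own, would instead capture word-initial and word-final patterns.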
This archive was generated by hypermail 2b30 : 09/25/00 EDT