From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)

Date: Mon, 16 Aug 2004 07:54:40 +0100

Humanist Discussion Group, Vol. 18, No. 137.

Centre for Computing in the Humanities, King's College London

www.kcl.ac.uk/humanities/cch/humanist/

www.princeton.edu/humanist/

Submit to: humanist_at_princeton.edu

Date: Mon, 16 Aug 2004 07:42:33 +0100

From: "Yuri Tambovtsev" <yutamb_at_mail.cis.ru>

Subject: a book on phonostatistics and language typology

Dear Humanist List colleagues, may I ask you to be so kind as to send this
review to your university colleagues, your library, or friends who
might be interested in this book? Or to a journal for publication, if you
know a suitable one? Looking forward to hearing from you soon at
yutamb_at_hotmail.com. I remain yours most hopefully,

Yuri Tambovtsev, Novosibirsk, Russia

REVIEW ON THE BOOK BY TAMBOVTSEV, Yuri Alekseevich.

"TIPOLOGIA FUNCTSIONIROVANIA FONEM V ZVUKOVOI

TSEPOCHKE INDOEVROPEICKIKH, PALEOAZIATSKIKH, URALO-

ALTAICKIKH I DRUGIKH YAZIKOV MIRA: COMPAKTNOST '

PODRGUP, GRUP, SEMEI I DRUGIKH YAZIKOVIKH TAKSONOV"

["Typology of functioning of phonemes in a sound chain of Indo-

European, Palaeo-Asiatic, Ural-Altaic and other world languages:

compactness of subgroups, groups, families and other language taxons" -

Novosibirsk: Sibirskij Nezavisimyj Institut, 2003 - 143 pages.]

[Novosibirsk, 630123, Ul. Severnaya 23/1. Sibirskij Nezavissimyj

Institut].

Reviewed by Senior Teacher of Novosibirsk School #180

Ludmila Alekseevna SHIPULINA

The book under review is an addition to Tambovtsev's theories, methods
and data published earlier (Tambovtsev, 1994-a; 1994-b; 2001-a; 2001-b;

2001-c). I think that linguistics needs new data to support or to reject the

classical theories. More often than not, linguists argue about this or that

linguistic theory (e.g. Uralic or Altaic language unities) without any new

data at hand. This new book by Yuri Tambovtsev provides such new data.

Speaking about applications of statistical methods in linguistics, one must

agree with Chris Butler that very often only statistical techniques are

relevant for some linguistic research because it is difficult otherwise to

understand the language phenomenon. It is especially important in any

type of linguistic study involving differences in people's linguistic

behaviour or in the patterns of language itself (Wray et al., 1998: 255).

Tambovtsev adds much data on phonological statistics of world languages.

He is one of the very few linguists who applied phonology to stylistics and

typology (Teshitelova, 1992: 157 - 181). In this book, as in the previous

books, Yuri Tambovtsev considers the typology of regulation and chaos of

distribution of consonant phonemes in a sound chain of world languages.

In fact, Tambovtsev concentrates on variability in sound chains of world
languages. He adds much to the essential parts of his theories and
methods in the monograph under review, especially on the
phonostatistical universals of Finno-Ugric, Turkic, Indo-European and
other world languages. The author examines the homogeneity of texts in

various languages from the point of view of the occurrence of phonemic

groups in their sound speech chains with the help of phonological

statistics. Tambovtsev also investigates the rules of a sound chain division,

as well as frequency of occurrence of certain phonemic groups of

consonants in the phonetic systems of various world languages. Many new

languages are investigated by his method, in comparison to his previous

books (Tambovtsev, 1994-a; 1994-b; 2001-a; 2001-b; 2001-c).

In fact, Yuri Tambovtsev has computed phonostatistical data on the

occurrence of labial, front (i.e. forelingual), palatal (mediolingual), back
(velar, pharyngeal and glottal), sonorant, occlusive, fricative (constrictive)
and voiced consonants in speech in a great number of languages. These
comprise 8 phonological features. The articulation system of these

languages is also discussed in brief. There is as well a short review of

ethnic history (ethnogenesis) of the nations speaking these languages. The

author thinks it of great importance to analyse these language contacts

during the history of their ethnic development.

As far as I can judge, Tambovtsev's first article in the field of phonological
statistics was published in 1976. So he has been working on the problems
mentioned above for a long time, i.e. for some 30 years. Unfortunately, I
cannot mention all of Tambovtsev's publications, since he is the author of 8
monographs and about 250 articles on language typology, phonostatistics
and phonetics. His study involves the sound pictures of 156 world
languages. In the book under review, Tambovtsev's conclusions are based
on data on the frequency of occurrence of phonemes in the
languages of the following families and groups:

1. Indo-European language family (the language groups: Indo-Aryan (8
languages), Iranian (4 languages), Celtic (1 language), Italic (1 language),
Romanic (5 languages), Germanic (7 languages), Baltic (2 languages),
Slavonic (8 languages), genetically isolated Indo-European languages (5
languages), artificial languages (1 language)).

2. Ural-Altaic language community, which includes the Uralic and Altaic
language communities:

A. Uralic language community: Finno-Ugric language family, Ugric
subgroup of Finno-Ugric language family (5 languages), Permic
subgroup of Finno-Ugric language family (2 languages), Volgaic
subgroup of Finno-Ugric language family (5 languages), Balto-Finnic
subgroup of Finno-Ugric language family (9 languages), Samoyedic
language family (3 languages).

B. Altaic language community: Turkic language family (22 languages),
Mongol language family (3 languages).

3. Tungus-Manchurian language family (6 languages).

4. Yenisseyic language family (1 language).

5. Caucasian language family (2 languages).

6. Palaeo-Asiatic language family (8 languages).

7. Sino-Tibetan language family (2 languages).

8. Afro-Asiatic language family (3 languages).

9. Bantu language family (2 languages).

10. Austro-Asiatic language family (2 languages).

11. Austronesian language family (5 languages).

12. Australian language family (6 languages).

13. The language community of American Indians (20 languages).

As a linguist I often feel I must use statistical methods in my studies of
English, German and other languages. However, it is hard for a linguist
to understand how to use them correctly and, at the same time, in the
simplest way. The author of the book teaches us how to do it, using
the following methods of statistical calculation as examples: standard
quadratic deviation, variation coefficient, level of significance, confidence
interval, Student's t-criterion, the Kolmogorov-Smirnov criterion, the
chi-square criterion, and Euclidean distance. He also shows how to measure
the statistical reliability of linguistic results. Very often a linguist who
is a layman in linguistic statistics may draw wrong linguistic conclusions
because his results are not statistically reliable.

The book by Yuri Tambovtsev focuses not only on the mathematical

statistical methods, which have been employed by him in his linguistic

research, but also discusses the important problems of classification of

world languages. The author touches on the reliability of
mathematical statistical methods in linguistics. The aim of his research is
to compare various languages within a single family as well as languages
belonging to different families and groups. For this purpose, Tambovtsev has
generated mean values of frequency rates of various phonemes and
phonemic groups in speech. These mean values provide a reliable
basis for comparing different languages. There are several mathematical
methods for estimating the variation of the major statistical values.

Tambovtsev aims to estimate regularities in usage of particular phonemes

or phonemic groups in particular languages. He has chosen several

methods of variability estimation and described techniques of their

application to phonetic studies.

In this respect, the issue of sample size is important: the
greater the sample, the more reliable the results. Equally important
is the size of the portions (units) into which the
text is divided. A portion should be neither too small nor too big. Tambovtsev
takes the generally accepted portion size in phonological
research, which is 1000 phonemes, and separates all the texts of
the languages under discussion into units of 1000 phonemes. Since
the most reliable results are obtained on large samples,
Tambovtsev argues that the minimum necessary sample should include not
less than 30 thousand phonemes.
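
The portioning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy "transcription" and the set of labial symbols are hypothetical and are not taken from the book.

```python
from typing import List, Sequence, Set

def split_into_portions(phonemes: Sequence[str], size: int = 1000) -> List[Sequence[str]]:
    """Cut a phoneme sequence into consecutive full portions of `size` items.

    A trailing incomplete portion is dropped so that all portions are comparable.
    """
    n_full = len(phonemes) // size
    return [phonemes[i * size:(i + 1) * size] for i in range(n_full)]

def class_counts(portions: List[Sequence[str]], phoneme_class: Set[str]) -> List[int]:
    """Count, for every portion, how many phonemes belong to `phoneme_class`."""
    return [sum(1 for p in portion if p in phoneme_class) for portion in portions]

# Hypothetical toy data: a repeating pattern, not a real transcription.
LABIALS = {"p", "b", "m", "w", "f", "v"}
toy_text = list("pataka" * 5500)          # 33,000 symbols, above the 30,000 minimum

portions = split_into_portions(toy_text)  # portions of 1000 phonemes each
counts = class_counts(portions, LABIALS)  # labial count in every portion
print(len(portions), "portions; labials per portion:", counts[:5], "...")
```

Each element of `counts` is then one observation for the statistics discussed in the rest of the review.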

Among other methods of estimating statistical variation, the author has
applied the mean quadratic deviation (i.e. the standard deviation) in his
research. The mean quadratic deviation is also used in generating other
indices. Mean quadratic deviations generated for two different
texts can be compared only if the sample sizes of the texts are equal. When
the sample sizes are different, other mathematical functions
should be used. For such cases Tambovtsev correctly chooses the
confidence interval, the "chi-square" criterion, the coefficient of variation, etc.

In my opinion, it is important to provide the reader with exact
examples of how to calculate the mean quadratic deviation, or standard
deviation, because a layman in phonostatistics, like myself, may do it in the
wrong way. Yuri Tambovtsev provides us with data on the occurrence
of labial consonants in Old English texts: "Beowulf, Ohthere's and
Wulfstan's Story, the Description of Britain, Julius Caesar", etc. He
compares the use of labials in Old English with the corresponding use in modern
English.
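
A minimal sketch of such a calculation follows. The counts are hypothetical, not the Old English data from the book, and the sketch assumes the sample form of the standard deviation (division by n - 1); the review does not say which form the author uses.

```python
import math
import statistics

def mean_quadratic_deviation(values):
    """Sample standard deviation: square root of the summed squared
    deviations from the mean, divided by (n - 1)."""
    m = sum(values) / len(values)
    squared = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared / (len(values) - 1))

# Hypothetical labial counts in ten portions of 1000 phonemes each.
labial_counts = [112, 98, 105, 120, 101, 95, 108, 117, 103, 110]

mean_value = statistics.mean(labial_counts)
deviation = mean_quadratic_deviation(labial_counts)
print(f"mean = {mean_value:.1f}, mean quadratic deviation = {deviation:.2f}")
```

The hand-written function gives the same result as `statistics.stdev` from the Python standard library; it is spelled out only to show the steps.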

The variation coefficient represents another important tool in comparative
linguistic research. It helps to compare incommensurable values. As
stated above, the mean quadratic deviation characterises the degree of
deviation of the frequency rate of a particular phoneme from the mean
value. However, the mean quadratic deviation values do not take into
account the fact that the number of labial phonemes is greater than that of
the mid-lingual (palatal) phonemes. Consequently, the absolute mean
index of labial sounds is considerably greater than that of the palatal ones.
On the other hand, front-lingual phonemes are usually more frequent than
labial ones. This heterogeneity of features calls for an additional method of
comparison, i.e. the variation index called the "coefficient of variation".

Unlike the mean quadratic deviation, the coefficient of variation allows
comparison of the frequency rates of phonemes and phonemic groups
which have produced different mean values, because it makes the
measure of variability comparable. It can
be used in linguistics in the way it is recommended by Fred Fallik and
Bruce Brown for behavioural sciences (Fallik et al., 1983: 111 - 112). The
coefficient of variation is used as an indicator of the variation or stability of
particular linguistic elements in a sample. The minimum necessary size of
such samples should be not less than 30 units. The larger the value of the
variation coefficient, the higher the variability of a particular
phonological feature (phonemic frequency in this case).
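
The point can be illustrated with a short sketch, assuming the usual definition of the coefficient of variation (standard deviation divided by the mean, expressed in percent). The counts are hypothetical and only ten portions are used for brevity, fewer than the 30 units recommended above.

```python
from statistics import mean, stdev

def variation_coefficient(values):
    """Coefficient of variation V in percent: standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100.0

# Hypothetical counts per 1000-phoneme portion (not data from the book).
labial_counts = [112, 98, 105, 120, 101, 95, 108, 117, 103, 110]
palatal_counts = [22, 18, 25, 20, 17, 24, 21, 19, 23, 16]

v_labial = variation_coefficient(labial_counts)
v_palatal = variation_coefficient(palatal_counts)
print(f"labial:  deviation = {stdev(labial_counts):.2f}, V = {v_labial:.2f}%")
print(f"palatal: deviation = {stdev(palatal_counts):.2f}, V = {v_palatal:.2f}%")
```

In this invented example the labials have the larger absolute deviation simply because they are more frequent, while the palatals turn out to be relatively more variable once the deviation is divided by the mean.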

Another important statistical notion is the significance level. In his
research Yuri Tambovtsev has chosen the significance level of 0.05,
or 5%. To my mind, Tambovtsev chose it correctly, since such a level of
significance is used by the majority of researchers in linguistics
and phonology. A significance level of 5% tells us that we
have 95% confidence in our linguistic results. This significance level, I
believe, is important in any linguistic research, but especially important for
comparisons carried out on small samples, i.e. on samples of less than 30
thousand phonemes.

Confidence interval evaluation is closely related to other statistical

procedures like estimations of the minimum necessary sample at the fixed

significance level. Tambovtsev proposes to fix it always at 5%, so that a
layman in statistics need not rack his brains over the other possible levels.
Actually, the matter is so specifically mathematical that a linguist need not try to
understand its mathematical foundation. I am sure that if a linguist learns how
to operate with all the necessary statistical criteria correctly, then using only one
level of significance (e.g. 5%) is quite all right. A stricter level of
significance usually requires larger samples, and thus much more labour
than necessary.

In certain cases, I guess, one is advised to use the values of the confidence
interval. The confidence interval evaluation is more reliable for
phonological research since it provides us with greater precision. The
general rule is: the narrower the confidence interval, the higher the
homogeneity of the parameter under discussion, i.e. the frequency
of a particular phonemic class or phoneme in speech. Usually, a text
allows us to obtain narrower confidence intervals than a collection of
phrases and words.

In his book, the author correctly describes the relationship between
three important parameters: the sample size, the confidence interval and the
significance level. Available data have shown that the greater the
sample size, the narrower the confidence interval at a fixed significance
level in all languages of the world, irrespective of their genetic affiliation
or grammatical type.
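
The relation between sample size, significance level and the width of the confidence interval can be sketched as follows. The sketch assumes the normal approximation for the mean, which is reasonable for 30 or more portions; the book's exact formula is not quoted in this review, and the counts are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(values, significance=0.05):
    """Normal-approximation confidence interval for the mean of `values`."""
    z = NormalDist().inv_cdf(1 - significance / 2)  # about 1.96 for 5%
    centre = mean(values)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return centre - half_width, centre + half_width

def interval_width(values, significance=0.05):
    """Width of the confidence interval returned by confidence_interval."""
    low, high = confidence_interval(values, significance)
    return high - low

# Hypothetical counts of one consonant class per 1000-phoneme portion.
small_sample = [100 + (i * 7) % 15 for i in range(30)]  # 30 portions
large_sample = small_sample * 4                         # 120 portions, same spread

print("30 portions: ", confidence_interval(small_sample))
print("120 portions:", confidence_interval(large_sample))
```

With the spread of the data unchanged, the interval for 120 portions is roughly half as wide as for 30, and a stricter significance level (e.g. 1%) widens it again.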

Tambovtsev has also paid attention to reliability of statistical results

obtained in the course of his phonological research. He has obtained
indices of the statistical error that arises because each
sample represents only some portion of the general language aggregate.
Such indices are called representation errors. The value of the

representation error depends mostly on the sample size and on variation

rate of a particular parameter. It is noteworthy that texts in different

languages produce similar representation error, which does not depend on

their morphological structures. This fact suggests a certain universal in

consonant phonemic groups functioning in genetically different languages.

However, I think that Tambovtsev has applied the strictest way of
estimating the representation error. On the one hand this is a drawback, since it
requires larger samples for a fixed error (e.g. an error of 5% or less), but,
on the other hand, it means that one can be surer of one's linguistic result.

Yuri Tambovtsev rightly mentions that many linguists who use statistics
do not know that the t-test, or "Student's" criterion, was proposed by
William Gosset, and not by some scholar called Student. "Student" was the
pseudonym that William Gosset assumed. Student's
criterion is employed when it is necessary to compare two mean
values found for two different texts. The reliability of the difference between
two mean values depends on the variability of the parameters involved and on the
sizes of the samples for which these values have been generated.
Student's criterion can be applied to variables that follow a normal
distribution. Within a sample of not less than 30 units, the distribution is
considered normal. In the course of the research, Student's criterion has
been calculated for two samples of equal size, 31 thousand phonemes each.
On the one hand, a scientific text was compared with fiction, and on the
other hand, two scientific texts were compared. The value for the former is
nearly four times greater than for the latter. This convinces us that Student's
criterion can well be applied to the stylistic analysis of texts.
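
A comparison of this kind might be sketched as below. The formula is the standard t statistic for two independent samples of equal size; whether the book uses exactly this variant is an assumption, and all counts are hypothetical.

```python
import math
from statistics import mean, variance

def student_t(sample_a, sample_b):
    """t statistic for two independent samples of equal size."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch assumes samples of equal size")
    n = len(sample_a)
    standard_error = math.sqrt((variance(sample_a) + variance(sample_b)) / n)
    return (mean(sample_a) - mean(sample_b)) / standard_error

# Hypothetical labial counts per 1000-phoneme portion in three texts.
scientific_1 = [101, 99, 104, 98, 102, 100, 103, 97]
scientific_2 = [100, 102, 99, 101, 103, 98, 104, 100]
fiction = [110, 108, 113, 107, 111, 109, 112, 106]

print("scientific vs scientific:", round(student_t(scientific_1, scientific_2), 2))
print("scientific vs fiction:   ", round(student_t(scientific_1, fiction), 2))
```

The absolute value of t is then compared with the tabulated value for the chosen significance level and the degrees of freedom (here 2n - 2).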

The statistical criterion called the Kolmogorov-Smirnov test provides
researchers with a mathematical method of analysis that does not depend
on the restrictions usually applied to statistical analyses, namely the following
conditions:

1) Statistical analyses are carried out with independent random
variables;

2) Aggregates of random variables should demonstrate close mean
and dispersion values;

3) Aggregates should follow the law of normal distribution.

The Kolmogorov-Smirnov criterion belongs to the so-called "robust" non-parametric
methods, which are not sensitive to deviations from the standard
conditions. Low values of the Kolmogorov-Smirnov (K-S) criterion mean
that the fluctuation of the analysed linguistic parameters is minor, that is,
not linguistically significant. Tambovtsev argues that the low value of the K-S
criterion in his research supports his hypothesis of a normal distribution of
the established eight groups of consonants within the speech sound chains.

Representation of any language with the help of eight groups of

consonants has served as a basis for his phono-statistical research.
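
One way to read the use of the K-S criterion described above is as a one-sample test of the portion counts against a normal distribution. The sketch below computes only the K-S statistic D (the largest gap between the empirical and the fitted normal distribution functions). It assumes that the mean and deviation are estimated from the same sample, which is a simplification, and both samples are artificial.

```python
import math
from statistics import NormalDist, mean, stdev

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of the normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic_normal(values):
    """Largest distance D between the empirical distribution of `values`
    and a normal distribution with the same mean and standard deviation."""
    xs = sorted(values)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Artificial samples: one built from normal quantiles, one strongly skewed.
bell_shaped = [NormalDist(100, 10).inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [1] * 40 + [100] * 10

print("bell-shaped D =", round(ks_statistic_normal(bell_shaped), 3))
print("skewed D      =", round(ks_statistic_normal(skewed), 3))
```

A low D, as for the first sample, is compatible with normality. For large samples the value usually tabulated for the 5% level is about 1.36 divided by the square root of n, although that table strictly applies when the mean and deviation are not estimated from the data.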

Tambovtsev has also employed the "chi-square" criterion in his

investigations. With the aid of this criterion, he estimates differences

between the empirical and expected values. If the difference is

insignificant, it can be the result of random deviation. Otherwise, it
reflects significant differences between the actual (empirical) and expected
(theoretical) values of frequencies of phonemic group occurrences in

speech. L. Bolshev and N. Smirnov (Bolshev et al., 1983: 166 - 171) have

generated the list of maximum frequency values reflecting insignificant

fluctuations of variables through the "chi-square" technique, which

Tambovtsev provides on page 33. It is quite handy because linguists usually
do not have books on statistics at hand. Christopher Butler
recommends the chi-square test to measure the independence and
association of linguistic units in various sorts of linguistic material (Butler,
1985: 118 - 126). Tambovtsev shows how to use it on the material of the
occurrence of labial consonants in British and American prose (Agatha
Christie, John Braine, W. S. Maugham, Jack London, F. Scott Fitzgerald,
Ernest Hemingway, etc.). The chi-square values show that labials are
distributed rather homogeneously. Tambovtsev draws the reader's attention
to the need to calculate the degrees of freedom correctly (p. 30). He also
examines how similar the distribution of labial, front, palatal, and velar
consonants is in Kalmyk (a Mongolian language) and Japanese (a genetically
isolated language). By this statistical criterion it is not similar (p. 31). However, the
same criterion shows close similarity between the distribution of the 5
consonantal groups in Turkish and Uzbek (p. 32). The T coefficient is less
than 1 in 5 parameters, i.e. front, palatal, velar, sonorant and occlusive.
Tambovtsev explains the T coefficient as the ratio of the obtained value of
chi-square to the theoretical value, which can be found in the chi-square
tables. If the T coefficient is less than 1, the statistical results are
similar (p. 31 - 33). The criterion also shows great similarity between some other Turkic,
Finno-Ugric, Samoyedic, Tungus-Manchurian, Slavonic, Germanic, Iranian and
other Indo-European languages inside their taxons.
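
The T coefficient as described here (obtained chi-square divided by the tabulated value) can be sketched as follows. The critical values are the standard tabulated ones for the 5% level; the counts and the choice of equal expected frequencies are hypothetical and serve only to show the arithmetic.

```python
# Standard tabulated chi-square critical values at the 5% significance level,
# keyed by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def chi_square(observed, expected):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def t_coefficient(observed, expected, degrees_of_freedom):
    """Ratio of the obtained chi-square to the tabulated critical value."""
    return chi_square(observed, expected) / CHI2_CRITICAL_05[degrees_of_freedom]

# Hypothetical labial counts in six equal-sized samples of prose.
counts = [105, 98, 110, 102, 95, 100]
expected = [sum(counts) / len(counts)] * len(counts)  # homogeneous distribution
df = len(counts) - 1                                  # six cells, so 5 degrees of freedom

t = t_coefficient(counts, expected, df)
print(f"chi-square = {chi_square(counts, expected):.3f}, T = {t:.3f}")
```

A value of T below 1 means that the obtained chi-square stays under the tabulated threshold, i.e. the deviation from the expected distribution is not significant at the 5% level.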

Chapter 2 is dedicated to the issues of genetic and typological

classifications of languages of the world. The author does not go into

details and debates concerning inclusion of certain languages into

particular genetic groups and families, or identification of a particular

language as a separate language or a dialect. The major aim of the author
is to provide a technique which would allow linguists to check the
validity of including a particular language in a certain language
group or family. Before analysing the compactness of subgroups, groups,
families and other language taxons, Tambovtsev warns the reader that the
problem of the division of world languages into families has not been
completely solved. For instance, it is necessary to discuss
whether Turkic languages constitute a family themselves or a branch of
some other family, called the Altaic family. Actually, Turkic languages are

considered to form a family by some linguists (e.g. Baskakov, 1966 and

other Russian linguists). However, some other linguists, especially those in

the West, consider Turkic languages to be a group within the Altaic family

spoken in Asia Minor, Middle Asia and southern Asia (Crystal, 1992: 397;

Katzner, 1986: 3). The other two branches of the Altaic family are
Tungus-Manchurian and Mongolian. To my mind, it is more logical to consider
Turkic languages a family, rather than a subgroup within the Altaic family.
Altaic languages should be called a super-family, Sprachbund, language
community or unity, since the true genetic relationship of the Turkic,
Tungus-Manchurian and Mongolian languages has not been proved. If one goes
along this line, then all languages on the Earth may be called one family
with lots of groups and branches. On the other hand, it is not productive to
form a separate language family consisting of one language. For instance, in
the 1960s Ket was considered an isolated language of the Paleo-Asiatic family
(Krejnovich, 1968: 453). However, now it is considered to form the so-called
Yeniseyan family, though consisting of only one language with its
dialects and subdialects. Summing up the modern point of view, David

Crystal remarks that Yeniseyan is a family of languages generally placed

within the Paleosiberian grouping, now represented by only one language -

Ket, or Yenisey-Ostyak (Crystal, 1992: 424). I don't think it is wise to

multiply language families like that. Other linguists (e.g. Ago Kunnap,

Angela Marcantonio, etc.) question the very existence of the Uralic

language family (Marcantonio, 2002).

Among other language families, Tambovtsev describes the Finno-Ugric
family. He argues that this language family includes two major groups:
the Baltic-Finnic and the Ugric.

The author considers the theories of those linguists who identify the

following four groups in the Finno-Ugric family:

1) The Baltic-Finnic group including Estonian, Finnish, Karelian,

Vepsian, Izhorian, Vodian, Livonian, and Saami possessing some specific

features;

2) The Volga group including Erzia-Mordovian, Moksha-Mordovian,

Mountain Mari, and Lawn or Meadow East Mari;

3) The Permic group comprising Udmurtian, Komi-Zyrian, and Komi-Permian;

4) The Ugric group comprising Hungarian, Mansi, and Khanty.

Together with the Samoyedic language family, comprising the Nenets,
Selkup, Nganasan, and Enets languages, the Finno-Ugric family is said to
form the Uralic language unit.

Tambovtsev argues that up to the present no fore-language (proto-language)
of this unit has
been established. The languages of the Uralic unit do not form a compact
unity from the point of view of dispersal and frequency of phonemic
groups. With the aid of the coefficients obtained in his studies, the author
has shown that the consonant indices
and the compactness (dispersion) coefficients suggest a more compact
unity for the Samoyedic language family (the mean V = 18.29%; T = 0.16)
than for the Finno-Ugric family (the mean V = 24.14%; T = 0.47). The Uralic
language unity has a greater dispersion (the mean V = 28.31%; T = 0.57).
This fact has been interpreted as support for the idea that the languages of the
Samoyedic and Finno-Ugric families are more closely related to one another
within each family than between the families. Thus, the idea of the Uralic
taxon as a language family should be either rejected or considered with
caution (p. 125).

The Turkic language group includes Azeri, Baraba-Tatar, Bashkir,

Gagauz, Karaim, Dolgan, Kazakh, Kamasin, Karakalpak, Karachai-

Balkarian, Kyrgyz, Crimea-Tatar, Kumyk, Nogai, Tatar, Tofalar, Tuvin,

Turkish, Turkmenian, Uzbek, Shor, and Yakut. The author argues that a

Turkic fore-language can be regarded as a real basic language for all the

Turkic languages. He points out that the Turkic fore-language (Ursprache)

demonstrates closer relations to any of the present Turkic languages, than

these languages may have between one another now. However, he did not

include the Ancient Turkic into his studies because of the uncertainty in

the pronunciation.

The Mongolian language family includes only three languages: Buriat,

Kalmyk, and Mongolian. It is the minimum possible group for statistical

analysis.

The Tungus-Manchurian language group includes 10 languages:

Manchurian, Nanai, Negidal, Oroch, Orok, Solon, Udege, Ulchi, Evenk

(Tungus), and Even.

Inclusion of the Turkic, Mongolian and Tungus-Manchurian language
families in one language unity remains a debatable topic in linguistics
today.

The Indo-European language family seems to be the most thoroughly

investigated. Major linguistic methods of investigations and comparative

linguistic analysis were elaborated during the long history of studies of

European languages. However, currently the major question concerning

the existence of a single Indo-European fore-language has not been

resolved.

It is noteworthy that many linguistic debates have often been carried out
in terms of "similarity" and "linguistic distance". Yet the terms themselves
have not been clearly defined.

Tambovtsev thinks that at the present state of understanding, modern

languages represent either products of divergence or the reverse process,

i.e. convergence. In historical perspective, both processes produced their

impacts on development of languages. Tambovtsev agrees with those

researchers who think that origin of all Indo-European languages from a

single fore-language is fiction, while their co-existence and convergence in

their development resulting in appearance of certain common features is a

scientific fact. The noted uniformity of the Indo-European languages can

be explained as a secondary, later phenomenon, and differentiating

features represent the original and early characteristics of each language of

this family.

However, since no classifications other than the genealogical one have been
elaborated, Tambovtsev accepts the following classification of the
Indo-European family: the Indian, the Iranian, the Baltic, the Slavonic
(including Eastern, Western, and Southern Slavonic sub-groups),
Germanic, Romanic, and Celtic language groups.

Following Illich-Svitych, Tambovtsev believes that the Nostratic language

unity can serve as a good model for linguistic investigations of various

sorts, but he does not think these languages should be considered a

language unity; moreover, this rather arbitrary construct is not recognised

by all the linguists. The Nostratic language unity includes the following

language families: Indo-European, Finno-Ugrian, Samoyedic, Turkic,

Mongolian, Tungus-Manchurian, Cartvelian, and Semito-Hamitian.

Tambovtsev proposes a concept of compactness for linguistic studies. He
defines compactness as the degree to which the languages within
language sub-groups, groups, families, etc. are closely related. In other
words, he attempts to
measure the distance between languages within analysed taxons or

clusters. The distances are measured on the basis of frequency rates of

particular linguistic (phonological) characteristics.

The author uses the concepts of image (pattern) recognition and regards a
language family as a unit with a more or less compact structure. In the branch of
applied mathematics called pattern recognition, images of various
sorts are recognised. One can consider a language to be a sort of such image.
Therefore, one can use the methods of pattern recognition to develop
various types of classifications based on exact values of some coefficients
(Zagorujko, 1999: 195 - 201). The generated index of compactness can be
regarded as an indicator of the opposing process of diffusion. The
frequency rate of a particular parameter should not deviate considerably
from the mean value established for a given language family or group. If
the deviation values of a language are considerably greater than the established mean
value, the given language does not belong to the language family under
discussion. If the majority of languages produce deviation indices higher
than the mean value, we should state that the languages under study do not
form a language group but rather a set of separate languages.

Tambovtsev has put forward the hypothesis that the typological similarity of
languages can be tested by statistical methods that generate the set
of indices described above. The hypothesis holds that when a language is
included in a particular language group, the indices generated for this new
formation will show either a higher or a lower compactness. A closely related
language would increase the compactness indices, and vice versa.

The author illustrates this presupposition by a series of examples. Thus, he
analyses frequency rates of labial consonants in the Turkic languages
compared to Mongolian. The frequency of labial consonants in Mongolian
is 7.52%. In the Turkic languages the relevant figures vary from 5.98% to
12.80%. The total fluctuation index is 6.82, and the difference between the
neighboring languages is 0.49. The Altai language has produced the lowest
index of labial consonant frequency, while Karakalpakian has shown
the highest index. The Turkic languages can be ranked in the following
way by their labial consonant frequency indices: Karakalpakian - 12.80%;
Turkish - 10.41%; Uigur - 9.83%; Azerbajanian - 9.66%; Uzbekian -
9.42%; Kumandinian - 9.22%; Baraba-Tatarian - 9.04%; Turkmenian -
8.50%; Kirgizian - 8.43%; Kazan-Tatarian - 8.03%; Kazakhian - 7.99%;
Khakassian - 7.82%; Yakutian - 6.10%, and Altaian - 5.98%. The place of
the Mongolian language (7.52%) is between Khakassian and Yakutian,
suggesting that the distribution of labial consonants is more similar in these
three languages than in the other languages of the Turkic group.

The Mongolian group has produced the following indices: Mongolian
(7.52%), Buriatian (7.67%), and Kalmykian (6.65%). These
indices fall within the same range as above, from 5.98% to 12.80%, while
the total fluctuation and the difference between the neighboring languages
are lower (1.02 and 0.34 respectively).

The Uralian language unity yields labial frequency indices in the range
of 7.71% - 13.72%; the difference between the neighboring languages is
0.30. Indices of the language group combining the Mongolian and
Tungus-Manchu languages are from 7.52% to 12.46%, with a mean difference
between the neighboring values of 0.70. Consequently, we may infer
considerable differences in the sound chains of the Mongolian and the
Tungus-Manchurian languages.

On the contrary, introduction of the Mansi language, which belongs to the
Finno-Ugrian language family and on which the Turkic and Mongolian
languages did not exert considerable influence, into the Turkic
languages increases the diffusion index of this group. Consequently, the
Mansi language, unlike Mongolian, does not belong to the Turkic language
group.
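
The inclusion test can be retraced with the labial figures quoted above. The sketch assumes that the "difference between the neighboring languages" is the total fluctuation divided by the number of languages. This is an inference from the reported figures (for the Mongolian group 1.02 / 3 = 0.34, and for the fourteen Turkic values (12.80 - 5.98) / 14 is about 0.49), not a formula stated in the review. The value used for an unrelated language is hypothetical, since the Mansi figure is not given here.

```python
def fluctuation(values):
    """Total fluctuation: highest minus lowest frequency in the group."""
    return max(values) - min(values)

def mean_neighbor_difference(values):
    """Total fluctuation divided by the number of languages
    (assumed reading of the reviewer's figures)."""
    return fluctuation(values) / len(values)

# Labial frequencies (%) of the 14 Turkic languages, in the order listed above.
TURKIC_LABIALS = [12.80, 10.41, 9.83, 9.66, 9.42, 9.22, 9.04,
                  8.50, 8.43, 8.03, 7.99, 7.82, 6.10, 5.98]
MONGOLIAN_GROUP = [7.52, 7.67, 6.65]   # Mongolian, Buriat, Kalmyk
MONGOLIAN = 7.52
HYPOTHETICAL_OUTSIDER = 14.50          # invented value outside the Turkic range

base = mean_neighbor_difference(TURKIC_LABIALS)
with_mongolian = mean_neighbor_difference(TURKIC_LABIALS + [MONGOLIAN])
with_outsider = mean_neighbor_difference(TURKIC_LABIALS + [HYPOTHETICAL_OUTSIDER])

print(f"Turkic alone:          {base:.3f}")
print(f"Turkic + Mongolian:    {with_mongolian:.3f}")
print(f"Turkic + outsider:     {with_outsider:.3f}")
```

Adding Mongolian, whose value lies inside the Turkic range, lowers the index (the group becomes more compact), whereas a language whose value lies outside the range raises it.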

Analysis of frequency rates of the front (i.e. forelingual) consonants may

serve as another example of compactness of Turkic and Mongolian

languages. Front-lingual consonants represent the most frequent sounds in

the Turkic languages as well as in many other languages of the world. The

range of frequency of front-lingual sounds in the Turkic languages varies

from 32.35% to 40.24%. The overall fluctuation index is 7.89, the

difference between the neighboring languages (the mean difference) is

0.564. In Mongolian, front-lingual sounds account for 36.57% of the total

number of sounds. The mean difference for the combined group of the Turkic

languages and Mongolian becomes lower (0.526). The corresponding figures for

the Uralic languages are: frequency range 24.79% - 36.78%; fluctuation

index 11.99; mean difference 0.6. Apparently, the Turkic language group is

more compact than the Uralic.

The Mongolian and Tungus-Manchu language families have yielded the

corresponding indices in the range of 17.31% to 36.57%; the fluctuation

index is 19.26; the mean difference is 2.75. The Paleo-Asiatic group of

languages represents a still less compact group, its frequency rates

varying from 20.02% to 36.74%; the fluctuation index is 16.64; the mean

difference is 2.38.

The author provides frequency indices for many languages and language

groups. In order to show the general tendency in the distribution of speech

sounds, he proposes a general coefficient of variation obtained by adding

the coefficients generated for each group of phonemes. He also uses the T

coefficient, which is derived from the chi-square statistic, as a reference

index. The resulting general coefficients of variation (V) allow him to

form the following sequence. The Ugric language group demonstrates the

highest diffusion (V = 221.27%, T = 3.77). The Baltic-Finnish languages

yield V = 185.90%, T = 2.79. The group of Volga languages is the most

compact, with V = 143.19%, T = 1.02.
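
As an illustration of how such a summed coefficient could be computed, the sketch below takes the coefficient of variation of one phoneme group as the standard deviation divided by the mean, in percent, and adds the values over all groups supplied. The use of the sample standard deviation and the function names are assumptions, since the review does not give the book's exact formula. The only data used are the labial percentages for Mongolian, Buriatian and Kalmykian quoted earlier in this review, so the result covers a single phoneme group and is not comparable with the V values above.

```python
from statistics import mean, stdev


def coefficient_of_variation(freqs):
    """Standard deviation as a percentage of the mean (sample stdev assumed)."""
    return stdev(freqs) / mean(freqs) * 100


def general_coefficient(groups):
    """Sum of the per-group coefficients of variation (assumed reading of V)."""
    return sum(coefficient_of_variation(v) for v in groups.values())


# Only the labial figures (%) are given in the review; further phoneme
# groups would be added as extra dictionary keys.
mongolian = {"labial": [7.52, 7.67, 6.65]}

v_labial = coefficient_of_variation(mongolian["labial"])
v_total = general_coefficient(mongolian)
print(f"labial V = {v_labial:.2f}%")
print(f"summed V over {len(mongolian)} group(s) = {v_total:.2f}%")
```

A lower sum means the languages of the group differ less from one another across the phoneme groups considered, which is the sense in which the review speaks of compactness.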

Another interesting method of comparative analysis involves introducing

isolated Asian languages into various language families in order to

establish possible relationships. Thus, introduction of the Ket language

into the Finno-Ugric family (V = 193.13%, T = 3.77) results in higher

diffusion (V = 198.04%, T = 3.94). The same procedure with Yukaghir yields

V = 199.17%; with Korean, V = 199.24%, T = 3.88; with Japanese,

V = 200.51%, T = 3.91; with Nivkhi, V = 206.48%. By contrast, Chinese has

shown closer similarity with the Finno-Ugric languages: V = 190.01%,

T = 3.65.
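
The comparison described in this paragraph amounts to subtracting the family's own V from the V obtained after one language is added. The sketch below does that with the figures quoted in the review; the variable and function names are assumptions, and reading the sign of the change as a sign of similarity is the review's interpretation rather than something the code establishes.

```python
# General coefficients of variation V (%) quoted in the review.
BASELINE_V = 193.13  # Finno-Ugric family on its own

v_with_added_language = {
    "Ket": 198.04,
    "Yukaghir": 199.17,
    "Korean": 199.24,
    "Japanese": 200.51,
    "Nivkhi": 206.48,
    "Chinese": 190.01,
}


def diffusion_change(v_combined, v_baseline=BASELINE_V):
    """Change in V after one language is added to the family."""
    return round(v_combined - v_baseline, 2)


changes = {lang: diffusion_change(v) for lang, v in v_with_added_language.items()}

# Smallest change first: a negative value means the enlarged group
# has a lower V than the family had on its own.
for lang, delta in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{lang:9s} {delta:+.2f}")
```

With these figures Chinese is the only language that lowers V, and Nivkhi raises it the most.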

As a result of his investigations, Tambovtsev has come to the following

conclusions:

1) Front (forelingual) and occlusive consonants are most evenly

distributed within language families.

2) Voiced consonants represent the most variable feature; some

languages have no category called "voiced" consonants.

3) The Mongolian language family is the most compact by the total

sum of the values of the coefficient of variation based on seven major

groups of phonemes (without voiced consonants) and the coefficient T.

The sequence with respect to the total sum of the coefficient of variation

has been established as follows: the Mongolic, the Samoyedic, the Turkic,

the Tungus-Manchurian, and the Finno-Ugric language families. The

Paleo-Asiatic language family has yielded the highest diffusion (i.e. the

lowest compactness) indices and consequently can be regarded not as a

language family but as a loose language unity or community.

4) As a general tendency, a language sub-group is more compact than a

group, and a group is more compact than a language family. The least

compact, that is the loosest, is the language super-unity comprising all

the languages of the world.

5) Combining two language groups or two families into one unit

results in higher diffusion than in the original taxa.

All I can say is that the book by Yuri Tambovtsev is a solid and profound

investigation into the comparative analysis of the languages of the world.

The author provides many tables with indices and coefficients generated

through various techniques for a great number of languages. Analysis of

these data provides linguists with a method of linguistic investigation

based on numerical procedures. The book contains a large list of

references. It is recommended to students who are interested in phonology,

linguistic statistics and the typology of world languages. It seems to me

that at the moment many linguists are dealing with minor linguistic

problems in one language. Linguistics lacks books that deal with the modern

classification of world languages. Tambovtsev's book may provide new

material for such language classifications.

Being a linguist by education, I was naturally wary of discussing

statistical methods without consulting specialists in mathematical

statistics. I must thank Prof. Dr. Arkadiy Shemiakin, Prof. Dr. Vadim

Efimov, Prof. Dr. Leonid Frumin and Prof. Dr. Valeriy Yudin for their

consultations and generous advice.

References

Bolshev et al., 1983 - Bolshev, Login Nikolaevich and Nikolai Vasilyevich

Smirnov. Tables of Mathematical Statistics. - Moskva: Nauka, 1983. - 416

pages. (in Russian).

Butler, 1985 - Butler, Christopher. Statistics in Linguistics. - Oxford:

Basil Blackwell, 1985. - 214 pages.

Fallik et al., 1983 - Fallik, Fred and Bruce Brown. Statistics for Behavioral

Sciences. - Homewood, Illinois: The Dorsey Press, 1983. - 538 pages.

Marcantonio, 2002 - Marcantonio, Angela. The Uralic Language Family:

Facts, Myths and Statistics. - Oxford: Blackwell Publishers, 2002. - 335 pages.

Tambovtsev, 1994-a - Tambovtsev, Yuri. Dinamika funktsionirovanija

fonem v zvukovyh tsepochkah jazykov razlichnogo stroja. [Dynamics of

functioning of phonemes in the languages of different structure]. -

Novosibirsk: Novosibirsk University Press, 1994-a. - 133 pages.

Tambovtsev, 1994-b - Tambovtsev, Yuri. Tipologija uporjadochennosti

zvukovyh tsepej v jazyke. [Typology of Orderliness of Sound Chains in

Language]. - Novosibirsk: Novosibirsk University Press, 1994-b. - 199

pages.

Tambovtsev, 2001-a - Tambovtsev, Yuri. Kompendium osnovnyh

statisticheskih harakteristik funktsionirovanija soglasnyh fonem v

zvukovoj tsepochke anglijskogo, nemetskogo, frantsuzkogo i drugih

indoevropejskih jazykov. [A compendium of the major statistical

characteristics within the paradigm of consonant phonemes functioning in

the sound chains of the English, German, French, and other Indo-European

languages.] - Novosibirsk: Novosibirsk Classical Institute, 2001. - 129

pages.

Tambovtsev, 2001-c - Tambovtsev, Yuri. Nekotorye teoreticheskie

polozhenia tipologii uporiadochennosti fonem v zvukovoi tzepochke

yazyka i kompendium statisticheskikh kharakteristik osnovnykh grupp

soglasnykh fonem. [Theoretical concepts of typology of the order of

phonemes in language sound chains and a compendium of statistical

characteristics of the main groups of consonant phonemes]. -

Novosibirsk: Novosibirsk Classical Institute, 2001. - 130 pages.

Tambovtsev, 2003 - Tambovtsev, Yuri. Lingvisticheskaja taksonomija:

kompaktnost' jazykovyh podgrupp, grupp i semej. [Linguistic taxonomy:

compactness of language subgroups, groups and families]. - In: Baltistika,

Volume 37, # 1 (Vilnius), 2003, p. 131 - 161.

Teshitelova, 1992 - Teshitelova, Marie. Quantitative Linguistics. -

Amsterdam/Philadelphia: John Benjamins Publishing Company, 1992. -

253 pages.

Wray et al., 1998 - Wray, Alison; Trott, Kate and Aileen Bloomer with

Shirley Reay and Chris Butler. Projects in Linguistics: A Practical Guide

to Researching Language. - London and New York: Arnold, 1998. - 303

pages.

Zagorujko, 1991 - Zagorujko, Nikolaj Grigorjevich. Applied Methods of

Data and Knowledge Analysis [in Russian]. - Novosibirsk: Institute of

Mathematics of the Siberian Branch of the Russian Academy, 1999. - 268

pages.

Reviewed by Ludmila Alekseevna Shipulina

Received on Mon Aug 16 2004 - 03:59:18 EDT
