Languages: Unicode Greek; English and Sanskrit

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Tue, 29 Jan 91 13:26:02 EST

Humanist Discussion Group, Vol. 4, No. 0953. Tuesday, 29 Jan 1991.

Date: Mon, 28 Jan 91 08:21:51 EST
From: elli@ikaros.harvard.edu (Elli Mylonas)
Subject: Unicode

Date: Sat, 26 Jan 91 17:53:25 -0500
From: green3@husc9.harvard.edu (Maria Green)
Subject: Re: 4.0947 ... English and Sanskrit

(1) --------------------------------------------------------------------
Date: Mon, 28 Jan 91 08:21:51 EST
From: elli@ikaros.harvard.edu (Elli Mylonas)
Subject: Unicode

I have just been looking at the Unicode character codes.

I was spurred on to this by Jim ODonnell's (sp?) posting, which reported
that the Greek characters were not well represented, which was
worrisome. However, although I cannot pass judgement on the idea of
Unicode, it does seem to me that most of the requirements of classical
Greek characters and accentuation are covered. What are not well
represented are the brackets needed for epigraphical and papyrological
texts, and the metrical markers.

In the part of the code set aside for Greek all the letters are present,
including upper and lower case monotonic accented vowels as single codes.
I assume that these are included for the purpose of conforming to the
Greek national standard. In addition, we find the single monotonic
accent, the diaeresis with the accent as a non-spacing character, and
the smooth and rough breathings. These may appear over any other
character. The Greek code space also includes upper and lower digamma,
sampi, koppa, and Byzantine sigma. There are also script forms of beta,
theta, phi, pi and kappa and a hook upsilon. Coptic forms are included
with the Greek.

The other accents necessary for classical Greek are not in the Greek code
space, but in the general diacritic space immediately preceding. The
acute, grave, circumflex, macron, breve, lower dot, diaeresis and upper
and lower slings are there. The Greek question mark and the upper and
lower numerical ticks are in the Greek code space.

Brackets are in the symbol code space, but include only parentheses,
square brackets, curly brackets, angle brackets, and double brackets.
No half brackets or triple ones. Also, other metrical markings are
missing, such as the anceps and the long over shorts, etc. (This last
was to be expected.)

I have one question about accent marks. Non-spacing characters, in
Unicode, are to be input "...after the base character, and from the
center of the base character out." (p. 3). Does this mean that accent
characters like a rough and acute can be coded as:
character-rough-acute? or will this result in the acute being printed
over the rough? If the latter, then the paired accents and breathings
would need to exist as one diacritic. This is probably not a good idea,
since it increases the number of codes needed.
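
The ordering question can be tried out against present-day Unicode,
which may well differ from the draft discussed here. The following is
a minimal sketch in Python using the standard unicodedata module. The
choice of U+0314 for the rough breathing and U+0301 for the acute, and
the helper name compose, are assumptions made for illustration; they
are not taken from the draft.

```python
import unicodedata

ALPHA = "\u03b1"  # GREEK SMALL LETTER ALPHA
ROUGH = "\u0314"  # COMBINING REVERSED COMMA ABOVE (assumed: rough breathing)
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def compose(base, *marks):
    """Join a base letter with its marks, in the order given, then apply NFC."""
    return unicodedata.normalize("NFC", base + "".join(marks))


# Hypothetical test sequences: breathing first, then accent, and the reverse.
rough_then_acute = compose(ALPHA, ROUGH, ACUTE)
acute_then_rough = compose(ALPHA, ACUTE, ROUGH)

for label, text in (("rough, acute", rough_then_acute),
                    ("acute, rough", acute_then_rough)):
    # Show which characters remain after normalization.
    print(label, [unicodedata.name(ch) for ch in text])
```

In current Unicode both marks sit above the letter and share a
combining class, so their order is significant: only the sequence with
the breathing first composes to a single precomposed letter.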

Otherwise, Unicode's letters and diacritics would fit something like TLG
encoding nicely, and even map well to beta code. At first glance,
Unicode seems to be able to handle a large number of characters fairly
well, and to have both the room, and the willingness on the part of its
creators, to include some of the more arcane characters.
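
As a rough illustration of what a mapping from beta code might look
like, here is a sketch in Python covering only a handful of letters
and the common diacritics. The function name beta_to_unicode and both
tables are hypothetical, the combining code points are those of
present-day Unicode rather than the draft, and capitals (the asterisk
prefix) and final sigma are deliberately left out.

```python
import unicodedata

# Partial, illustrative tables: not a complete beta code implementation.
LETTERS = {
    "A": "\u03b1", "G": "\u03b3", "D": "\u03b4", "E": "\u03b5",
    "H": "\u03b7", "I": "\u03b9", "L": "\u03bb", "N": "\u03bd",
    "O": "\u03bf", "R": "\u03c1", "U": "\u03c5", "W": "\u03c9",
}
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}


def beta_to_unicode(text):
    """Map a beta code string to Unicode, one code per input character."""
    out = []
    for ch in text.upper():
        if ch in LETTERS:
            out.append(LETTERS[ch])
        elif ch in MARKS:
            out.append(MARKS[ch])
        else:
            out.append(ch)  # pass anything unrecognized through unchanged
    # NFC merges letter + marks into precomposed characters where they exist.
    return unicodedata.normalize("NFC", "".join(out))


print(beta_to_unicode("a)nh/r"))
```

Because beta code already writes the breathing before the accent, each
diacritic can be translated one for one and the composition left to
normalization.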

Have I missed anything?

(2) --------------------------------------------------------------39----
Date: Sat, 26 Jan 91 17:53:25 -0500
From: green3@husc9.harvard.edu (Maria Green)
Subject: Re: 4.0947 War, Rhetoric, and Protest

This is my first note to Humanist -- I only connected up a short time
ago, but I've enjoyed those discussions I've had the time to read.

I would like to say a few words in reply to Bill Ball's note on Sanskrit
and English:

You don't mention which dictionary you are using, but I wonder if it may
not be misleading you somewhat. In fact, very few English words derive
directly from Sanskrit. (Even post-colonial loan words, by the way,
often are not from Sanskrit itself. "Thug" comes to us via Hindi, which
stands in the same relation to Sanskrit as Italian does to Latin.)

Sanskrit is, however, a member of that same Indo-European family of
languages that also includes Greek and Latin, among many others, and it
shares with them a very broad vocabulary base. Not infrequently, an
English word will be derived from some Greek or Latin term that hasn't
survived into any of the texts, but whose cognate cousin is attested in
Sanskrit. In these cases some dictionaries may mention the Sanskrit
cognate in the Etymology section, even though in fact it plays no role
in the development of the English word.

Philologically speaking, there is not much reason to assume intercourse
between Europe and India beyond what is already known. The shared
vocabulary, however, can serve to remind us that ultimately the
Europeans (or most of them) and the Indians (and in fact also the Slavs
and the Persians, and more) all derive from a single culture. In terms
of the broader lines of history we may not all be as distant from one
another as we sometimes tend to imagine.


Maria Green (green3@husc9.harvard.edu or green3@husc9.bitnet)
Dep't of Sanskrit and Indian Studies, Harvard University