3.38 coding Sanskrit, cont. (86)

Tue, 16 May 89 21:05:44 EDT

Humanist Discussion Group, Vol. 3, No. 38. Tuesday, 16 May 1989.

Date: Tue, 16 May 89 17:02
From: Wujastyk (on GEC 4190 Rim-C at UCL) <UCGADKW@EUCLID.UCL.AC.UK>
Subject: Sanskrit character codes

Here is a message forwarded from Dr. Peter Schreiner, SOAS:

Though not connected to any network and thus not officially a
member of the Humanist group, I have through the kind services of
friends and colleagues been able to listen in on the discussion
about Sanskrit coding schemes. (I thank in particular Dr. W. Ott,
Tübingen, and Dr. D. Wujastyk, London!) I have been encouraged to
note down some of my experiences and opinions.
It seems that the discussion concerns two different steps in the
"processing" of transliteration:
a) defining the internal codes for letters with diacritical marks
(e.g. s with subscript dot = char(234), or whatever). Clearly, to
have generally accepted standards would contribute greatly to the
compatibility of software; and from the point of view of a user
like myself who does not write his own programs this is of utmost
importance. I am quite ready to help work towards an agreement
about standards on the occasion of the Vienna World Sanskrit
Conference (which might help to activate the ALLC specialist
b) defining what one does on the keyboard in transliterating
As has been rightly said in the discussion, Sanskritists have
agreed long ago on a standard transliteration (retroflex s is an
s with subscript dot, "long a" is an a with macron, etc). The
primary concern in defining "our" transliteration scheme was
typing speed and typing errors. Since the transliteration scheme
existed and one was familiar enough with it not to have to think
about it while typing, the obvious input convention was to type
all diacritics in front of the letters. The period being used for
subscript dot, the semicolon was an obvious choice for the
superscript dot; and we chose the question mark for the tilde.
(These conventions are fairly arbitrary, and when changing to
U.K. keyboards I chose to replace "'s" by "/s").
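
The prefix convention described above can be sketched in a few lines of
Python. This is only an illustration, not the software actually used: the
names PREFIX_MARKS and parse_typed are hypothetical, the table holds only
the four marks named in this message (hyphen for macron, period, semicolon,
question mark), and treating a mark as a diacritic only when a letter
follows it is an assumption made here to keep ordinary punctuation apart.

```python
# Prefix characters named in the message and the diacritic each stands for.
# (The "'s" / "/s" convention is left out because its meaning is not spelled out.)
PREFIX_MARKS = {
    "-": "macron",
    ".": "subscript dot",
    ";": "superscript dot",
    "?": "tilde",
}


def parse_typed(typed):
    """Split typed input into (mark, letter) pairs.

    mark is None for a character typed without a preceding diacritic.
    Assumption: a prefix character counts as a diacritic only when it is
    immediately followed by a letter.
    """
    tokens = []
    i = 0
    while i < len(typed):
        ch = typed[i]
        if ch in PREFIX_MARKS and i + 1 < len(typed) and typed[i + 1].isalpha():
            tokens.append((PREFIX_MARKS[ch], typed[i + 1]))
            i += 2
        else:
            tokens.append((None, ch))
            i += 1
    return tokens


if __name__ == "__main__":
    for token in parse_typed("k.r.s.na"):
        print(token)
```

Because the mark always precedes its letter, a single left-to-right pass
with one character of look-ahead is enough to separate marks from letters.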
The point is that the input code is clearly independent from
what happens to the input later on. I change "-a" to "%-a" for
printing the macron with TUSTEP (which is what I have been using
almost exclusively), to "\=a" for printing it in TeX, to "aa" for
printing it with Velthuis' Devanagari-TeX, to "circumflex
[overwrite] a" for a word-processor which can do no better, to
"02" for sorting purposes (since "long a" is the second character
in the alphabet). At most of these transformations I do not ever
have to look; and rather than WYTIWYG (T for type) I prefer to be
able to control what I am doing (and thus also getting, hopefully).
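
As a rough sketch of how one input code can feed several outputs, the
following Python fragment rewrites the typed "-a" into the forms listed
above. The names LONG_A_TARGETS and render_long_a are hypothetical, only
"long a" is covered because it is the only letter for which the message
gives the target forms, and the word-processor overwrite is left out since
it is not a plain string.

```python
# Target forms for the typed "-a" (long a), taken from the message.
LONG_A_TARGETS = {
    "tustep": "%-a",     # macron in TUSTEP
    "tex": r"\=a",       # macron in TeX
    "velthuis": "aa",    # Velthuis' Devanagari-TeX
    "sort": "02",        # sort key: second character in the alphabet
}


def render_long_a(typed, target):
    """Replace every typed "-a" with its form for the chosen target.

    Assumption: "-a" never occurs in the input with any other meaning
    (for example as a hyphen followed by the letter a).
    """
    return typed.replace("-a", LONG_A_TARGETS[target])


if __name__ == "__main__":
    for name in LONG_A_TARGETS:
        print(name, render_long_a("r-ama", name))
```

The typed text itself is never altered; each target is produced from it on
demand, which is the sense in which the input code stays independent of
later processing.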
Ideally, points a) and b) will be compatible. My typed "-a" may
register in the machine as "char(195)" (acc. to Emmerick) or
"char(224)" (in Dominik's scheme); and ideally I shall be able to
see the sub- and superscript diacritics on screen (if I choose
so), but shall not have to type anything more complicated than an
"o" if (e.g.) my "long i" ("-i") turns out to be a typing (or
reading) mistake for "o" (which means that I want to be able to
see the input coding).
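
A minimal sketch of how points a) and b) might fit together, assuming a
one-byte internal code per accented letter: the typed "-a" is stored as a
single code, and the stored text can be turned back into the input coding
for inspection. The function names and the SCHEMES table are hypothetical;
the two code values are the ones quoted above, and every other character
is assumed to be plain ASCII stored unchanged.

```python
# Internal code for the typed "-a" under the two schemes mentioned above.
SCHEMES = {
    "emmerick": {"-a": 195},
    "dominik": {"-a": 224},
}


def encode_typed(typed, scheme):
    """Turn typed input into internal one-byte codes for the given scheme."""
    table = SCHEMES[scheme]
    out = bytearray()
    i = 0
    while i < len(typed):
        pair = typed[i:i + 2]
        if pair in table:
            out.append(table[pair])      # two typed characters, one stored code
            i += 2
        else:
            out.append(ord(typed[i]))    # assumption: plain ASCII otherwise
            i += 1
    return bytes(out)


def decode_to_typed(stored, scheme):
    """Show stored codes again in the input coding."""
    reverse = {code: keys for keys, code in SCHEMES[scheme].items()}
    return "".join(reverse.get(byte, chr(byte)) for byte in stored)


if __name__ == "__main__":
    stored = encode_typed("r-ama", "emmerick")
    print(list(stored))
    print(decode_to_typed(stored, "emmerick"))
```

Converting between the two schemes then amounts to decoding with one table
and encoding with the other, with the typed form as the common ground.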
Lastly, talking about transliteration, it has been my ambition
to collect information about who has been transliterating what
and where. May I use this occasion to ask those who have created
their own library of transliterated texts to drop me a line!?
Peter Schreiner
S.O.A.S., Thornhaugh Street, Russell Square, London WC1H 0XG,

Replies via:

Dominik Wujastyk, | Janet: wujastyk@uk.ac.ucl.euclid
Wellcome Institute for | Bitnet/Earn/Ean/Uucp: wujastyk@euclid.ucl.ac.uk
the History of Medicine, | Internet/Arpa/Csnet: dow@wjh12.harvard.edu
183 Euston Road, | or: wujastyk%euclid@nsfnet-relay.ac.uk
London NW1 2BP, England. | Phone: London 387-4477 ext.3013
[Note that as of May 1989 the Janet-Internet gateway address has changed from
"nss.cs.ucl.ac.uk" to "nsfnet-relay.ac.uk"]