Humanist Discussion Group

Humanist Archives: July 11, 2021, 8:29 a.m. Humanist 35.133 - machine-readable texts (for Egyptologists)

              Humanist Discussion Group, Vol. 35, No. 133.
        Department of Digital Humanities, University of Cologne
                      Hosted by DH-Cologne

        Date: 2021-07-09 12:55:52+00:00
        From: Gabriel Bodard <gabriel.bodard@SAS.AC.UK>
        Subject: Machine-Readable Texts for Egyptologists (seminar)

[Apologies for the late announcement, but good to know about... WM]

This afternoon's Digital Classicist London seminar is streamed live at <>

Heidi Jauhiainen (University of Helsinki), Machine-Readable Texts for
Egyptologists

Friday July 9, 2021, 17:00 (UK time/UTC+1)

In order to use digital methods to study texts, one needs them in
machine-readable form. Assyriology has freely downloadable corpora of
machine-readable texts, such as the Open Richly Annotated Cuneiform
Corpus, but the lack of similar corpora hinders the digital study of ancient
Egyptian texts. A transliterated text in digital format, for example as
a text or TEI file, is machine-readable. Producing transliterated texts
manually is time-consuming and, hence, there have been experiments in
automatically producing transliterated texts. However, in order to
produce machine-readable texts with automated transliteration, one needs
machine-readable hieroglyphic texts. There is a tradition in Egyptology
of using encoding to represent hieroglyphic texts so that information
about the signs themselves and their positions relative to each other
is preserved. Various types of encoding have been used when publishing
texts in books, but those machine-readable texts are not openly
available. Such encoded texts could be produced by OCRing hieroglyphic
texts, but this approach requires many texts in the same handwriting to
train the method.

In this paper, I present Machine-Readable Texts for Egyptologists,
a three-year project that started at the beginning of 2021.
The aim is to produce a large number of manually encoded hieroglyphic
texts and then to develop an iterative process and methods for
automatically transliterating the encoded texts. During the process, the
automatically transliterated texts will be validated and, if necessary,
corrected and then used for making the method more accurate. Both the
encoded texts and their transliterations will eventually be offered for
free download.


Dr Gabriel BODARD (he/him)
Reader in Digital Classics

Institute of Classical Studies / Digital Humanities Research Hub
University of London
Senate House
Malet Street
London WC1E 7HU

T: +44 (0)20 78628752
