Humanist Archives: March 26, 2020, 9:37 a.m. Humanist 33.695 - new tools for (Greek) corpus linguistics

        Date: 2020-03-25 11:39:31+00:00
        From: Alek Keersmaekers 
        Subject: New computational tools for Greek corpus linguistics

Dear members of this list,

I'm excited to announce some new computational tools for Ancient Greek
corpus linguistics:

- First of all, the Duke papyrus texts
(https://github.com/alekkeersmaekers/duke-nlp) are now automatically
annotated not only for lemmas and morphology but also for syntax and
semantic roles, making this the largest diachronic treebank for
Ancient Greek so far (about 4.5 million tokens). The accuracy for syntax
and semantic roles (about 85-90% and 81% respectively, measured on
letters) is lower than for morphology and lemmatization, but still good
enough for linguistic research.

- DendroSearch (https://github.com/alekkeersmaekers/dendrosearch), a
user-friendly query tool for Greek treebanks, which includes all treebank
material available to date (if your treebank is still missing,
please let me know!)

- An automatic semantic role labeler
(https://github.com/alekkeersmaekers/PRL), using the roles of the
Pedalion grammar created at the University of Leuven
(http://en.pedalion.org/). It also includes distributional word vectors
for Greek lemmas and an animacy lexicon, partly based on the animacy
lexicon of the PROIEL project (many thanks to Dag Haug!).

None of this would be possible without the painstaking work of the
Ancient Greek treebanking community, so many thanks to the people of the
PROIEL, AGDT and Sematia projects, Vanessa Gorman, J.M. Harrington and
his team, Polina Yordanova, and the student workers involved in the
Pedalion treebanks!

All the best,

