Humanist Discussion Group

Humanist Archives: May 17, 2022, 6:18 a.m. Humanist 36.17 - working with unnormalised historical texts

				
              Humanist Discussion Group, Vol. 36, No. 17.
        Department of Digital Humanities, University of Cologne
                      Hosted by DH-Cologne
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org




        Date: 2022-05-16 13:46:31+00:00
        From: Gabor Toth <gabor.toth@maximilianeum.de>
        Subject: Re: [Humanist] 36.12: working with unnormalised historical texts

Dear Crystal,

Many thanks for your detailed answer, and congratulations on working all
this out; it sounds great.

Could you please explain what you mean by the following two points:

1. "I hand coded the entries for part of speech using a
simplified Penn Tree Bank system, marked known irregulars for hand
processing, and built out the other possible variations algorithmically."

I understand the hand-coding part, but I am not sure about the steps that
follow it.

2. "The result was a first draft of over 3 million forms"

That sounds like a very big number. By "forms", do you mean types?

Cheers,

Gabor



_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php