Humanist Discussion Group

Humanist Archives: June 7, 2023, 5:44 a.m. Humanist 37.79 - literary corpora without publishers' data?

              Humanist Discussion Group, Vol. 37, No. 79.
        Department of Digital Humanities, University of Cologne
                      Hosted by DH-Cologne
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org




        Date: 2023-06-06 10:54:44+00:00
        From: WARWICK, CLAIRE L. <c.l.h.warwick@durham.ac.uk>
        Subject: literary data dump

Dear everyone,

Please forgive me if this is an extremely dumb question, but I’m wondering how
to get hold of a lot of literary text without the kind of front end that
publishers tend to attach to it. The context is this: I have students who want
to work on data science approaches to both English and Latin poetry. For
example, they need English translations of the whole corpus of Latin poetry,
and a corpus of all English poetry (or even all literature in English) from
1500-1700, so that they can train and run language models.

Of course, my university has access to things like ProQuest’s Literature Online,
but that comes with its own search interface, and would mean having to download
each poem one by one, which would be incredibly time-consuming and probably not
allowed anyway. There is also the Oxford Text Archive (OTA), but I don’t know
how comprehensive its holdings are. I also need accurate texts, so I don’t want
to use things like Project Gutenberg, whose quality I am not certain of.

I’d be most grateful for any suggestions.

Best wishes,

Claire



--------
Claire Warwick MA, MPhil, PhD
Professor of Digital Humanities
Co-Director Durham Institute of Data Science
Department of English Studies
Durham University
www.durham.ac.uk/staff/c-l-h-warwick/


_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php