Humanist Discussion Group, Vol. 37, No. 79.
Department of Digital Humanities, University of Cologne
Hosted by DH-Cologne
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

Date: 2023-06-06 10:54:44+00:00
From: WARWICK, CLAIRE L. <c.l.h.warwick@durham.ac.uk>
Subject: literary data dump

Dear everyone,

Please forgive me if this is an extremely dumb question, but I’m wondering how to get hold of a lot of literary text without the kind of front end that publishers tend to attach to it.

The context is this: I have students who want to work on data science approaches to both English and Latin poetry, e.g. they need English translations of the whole corpus of Latin poetry, and a corpus of all English poetry (or even all literature in English) from 1500-1700, so that they can train and run language models.

Of course, my university has access to things like ProQuest’s Literature Online, but that comes with its own search interface, and would mean having to download each poem one by one, which would be incredibly time-consuming and probably not allowed anyway. Obviously, there is also the OTA, but I don’t know how comprehensive its holdings are. I also need accurate texts, so I don’t want to use things like Project Gutenberg, whose quality I am not certain of.

I’d be most grateful for any suggestions.

Best wishes,

Claire

--------
Claire Warwick MA, MPhil, PhD
Professor of Digital Humanities
Co-Director Durham Institute of Data Science
Department of English Studies
Durham University
www.durham.ac.uk/staff/c-l-h-warwick/

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php