Humanist Discussion Group, Vol. 37, No. 83.
Department of Digital Humanities, University of Cologne
Hosted by DH-Cologne
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

[1] From: Wust, Markus <markus.wust@uni-tuebingen.de>
    Subject: AW: [Humanist] 37.79: literary corpora without publishers' data? (61)

[2] From: maurizio lana <maurizio.lana@uniupo.it>
    Subject: Re: [Humanist] 37.79: literary corpora without publishers' data? (24)

--[1]------------------------------------------------------------------------
Date: 2023-06-07 09:18:08+00:00
From: Wust, Markus <markus.wust@uni-tuebingen.de>
Subject: AW: [Humanist] 37.79: literary corpora without publishers' data?

Dear Claire,

You could check with your library to see whether some of their licensed collections came with data mining agreements. If so, there might be ways of accessing the materials without having to go through the provider's online search interface.

Best,

Markus Wust
University of Tübingen

-----Original Message-----
From: Humanist <humanist@dhhumanist.org>
Sent: Wednesday, 7 June 2023 06:44
To: Wust, Markus <markus.wust@uni-tuebingen.de>
Subject: [Humanist] 37.79: literary corpora without publishers' data?

Humanist Discussion Group, Vol. 37, No. 79.

Date: 2023-06-06 10:54:44+00:00
From: WARWICK, CLAIRE L. <c.l.h.warwick@durham.ac.uk>
Subject: literary data dump

Dear everyone,

Please forgive me if this is an extremely dumb question, but I'm wondering how to get hold of a lot of literary text without the kind of front end that publishers tend to attach to it.
The context is this: I have students who want to work on data science approaches to both English and Latin poetry. For example, they need English translations of the whole corpus of Latin poetry, and a corpus of all English poetry (or even all literature in English) from 1500 to 1700, so that they can train and run language models.

Of course, my university has access to resources such as ProQuest's Literature Online, but that comes with its own search interface and would mean downloading each poem one by one, which would be incredibly time-consuming and probably not allowed anyway. Obviously, there is also the OTA (Oxford Text Archive), but I don't know how comprehensive its holdings are. I also need accurate texts, so I don't want to use sources like Project Gutenberg, whose quality I am not certain of.

I'd be most grateful for any suggestions.

Best wishes,
Claire

--------
Claire Warwick MA, MPhil, PhD
Professor of Digital Humanities
Co-Director, Durham Institute of Data Science
Department of English Studies
Durham University
www.durham.ac.uk/staff/c-l-h-warwick/

--[2]------------------------------------------------------------------------
Date: 2023-06-07 08:55:44+00:00
From: maurizio lana <maurizio.lana@uniupo.it>
Subject: Re: [Humanist] 37.79: literary corpora without publishers' data?

Hi Claire,

the Latin texts of the PHI CD-ROM (!) are a good starting point for the archaic and golden periods, that is, all the 'usually' studied Latin authors. The texts are encoded in Beta Code, but you should easily find tools to decode them.

In our digital library digilibLT we also have a lot of late-Latin texts, which are available encoded in TEI-XML but also as plain txt files.
You can get all of them as a single zip file at https://digiliblt.uniupo.it/g_bulk_opere.php

Best,
Maurizio

apriti cielo
sulla frontiera
sulla rotta nera
una vita intera
Mannarino, apriti cielo

------------------------------------------------------------------------
Maurizio Lana
Università del Piemonte Orientale
Dipartimento di Studi Umanistici
Piazza Roma 36 - 13100 Vercelli

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php
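digilibLT's late-Latin texts are offered both as TEI-XML and as plain txt files. For anyone who wants to start from the XML (for example, to drop header metadata before training a model), here is a minimal Python sketch using only the standard library. It assumes the bulk zip has already been unpacked into a local folder and that the files use the standard TEI P5 namespace. Neither assumption has been checked against the actual digilibLT files, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Union

# Standard TEI P5 namespace; assumed (not verified) to be used by the digilibLT files.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_text(xml_data: Union[str, bytes]) -> str:
    """Return the whitespace-normalised text of the TEI <text>/<body>.

    Falls back to the whole document if no body is found, so the
    <teiHeader> metadata is skipped only when a body exists.
    """
    root = ET.fromstring(xml_data)
    body = root.find(".//tei:text/tei:body", TEI_NS)
    target = body if body is not None else root
    # itertext() yields every text node (including tails) in document order.
    # Joining without a separator keeps words split by inline tags such as
    # <hi> intact; it assumes line-level elements like <l> are separated by
    # whitespace in the source file.
    raw = "".join(target.itertext())
    return " ".join(raw.split())


def load_corpus(folder: str) -> Dict[str, str]:
    """Map each *.xml file under `folder` (searched recursively) to its text."""
    base = Path(folder)
    corpus: Dict[str, str] = {}
    for path in sorted(base.rglob("*.xml")):
        # Pass bytes so the parser honours each file's declared encoding.
        corpus[path.relative_to(base).as_posix()] = tei_to_text(path.read_bytes())
    return corpus
```

Calling `load_corpus("digiliblt")` on a hypothetical folder holding the unpacked archive would return a dictionary keyed by relative file path. If the plain txt versions are sufficient, the XML step can be skipped entirely.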