Humanist Discussion Group, Vol. 34, No. 84.
Department of Digital Humanities, King's College London
Hosted by King's Digital Lab
www.dhhumanist.org
Submit to: email@example.com

Date: 2020-06-04 18:52:10+00:00
From: Ryan Dubnicek
Subject: Announcing HTRC Extracted Features v.2.0!

HathiTrust Research Center (HTRC) is excited to announce the release of the Extracted Features 2.0 dataset!

This new version of Extracted Features offers volume- and page-level data for 17+ million volumes in the HathiTrust Digital Library. The data include:

  * Bibliographic metadata
  * Computationally inferred metadata about the page, such as language and line counts
  * Tokens (words), parts of speech, and their per-page counts

Overall, the dataset represents more than 6 billion pages of text from the digital library and includes nearly 3 trillion tokens from the corpus.

This release not only increases the number of HathiTrust volumes available as Extracted Features but also incorporates linked data, so that names in the files are linked to external authorities where possible.

Learn more about the release and data schema: https://wiki.htrc.illinois.edu/x/kYC2B

Download Extracted Features 2.0 files: https://wiki.htrc.illinois.edu/x/_QGGAQ

Contact firstname.lastname@example.org with any questions.

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: email@example.com
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php

Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)