Humanist Discussion Group

Humanist Archives: Dec. 28, 2021, 8:30 a.m. Humanist 35.425 - research using very long documents

              Humanist Discussion Group, Vol. 35, No. 425.
        Department of Digital Humanities, University of Cologne
                      Hosted by DH-Cologne
                Submit to:

        Date: 2021-12-27 14:19:53+00:00
        From: Henry Schaffer <>
        Subject: Re: [Humanist] 35.421: research using very long documents?

This is interesting and got me thinking about what is a "long scholarly
document" and what facilities might be needed to work with "a large corpus
of long scholarly documents". As a geek, my first thought is - How much
storage? 1 MB or GB or TB or PB ...?

My campus provides HPC for all faculty and the students (primarily grad
students) working on research topics. Each account is automatically
provided a small home directory (1 GB) and working storage of 10 TB. A
permanent storage directory of 2 TB is provided on request. Additional
storage is provided on a charge basis. Next is the computational capability
- there are about 10,000 cores in the cluster, with both CPUs and GPUs in
the mix. While this is a very useful HPC setup for a campus, it isn't as
large as those provided at many campuses and clearly isn't in the category
of supercomputer centers.

Is this HPC cluster inadequate for research work with "long scholarly
documents"? If so, wouldn't it make sense to give some numbers as to what
might be needed? Or should attention be paid to researchers developing
familiarity with and capacity to use existing HPC resources?
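
To illustrate the kind of numbers I mean, here is a minimal back-of-the-envelope
sketch. The average document size (about 1 MB of plain text, roughly a long
book) and the corpus sizes tried are assumptions made up for illustration, not
figures from the survey; only the 10 TB working-storage allocation comes from
the cluster described above. Scanned page images or derived data would need far
more space than plain text, so treat this as a lower bound.

```python
# Back-of-the-envelope storage estimate for a corpus of long documents.
# ASSUMPTIONS (not from the survey): the average plain-text size per document
# and the corpus sizes tried below. Only the 10 TB working-storage quota is
# taken from the cluster description in this message.

UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def human_size(n_bytes: float) -> str:
    """Format a byte count using decimal (powers of 1000) units."""
    size = float(n_bytes)
    for unit in UNITS:
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


def corpus_bytes(n_docs: int, avg_doc_bytes: int) -> int:
    """Total size of a corpus of n_docs documents of the given average size."""
    return n_docs * avg_doc_bytes


AVG_DOC_BYTES = 1_000_000      # assumed: ~1 MB of plain text per long document
WORKING_QUOTA = 10 * 1000**4   # 10 TB working storage per account

if __name__ == "__main__":
    for n_docs in (10_000, 1_000_000, 10_000_000):
        total = corpus_bytes(n_docs, AVG_DOC_BYTES)
        verdict = "fits in" if total <= WORKING_QUOTA else "exceeds"
        print(f"{n_docs:>10,} docs -> {human_size(total)} "
              f"({verdict} 10 TB working storage)")
```

Under these assumptions, even ten million plain-text documents would just fill
the default working storage, which is why I would like to see actual numbers.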

I read the survey and wanted to respond positively - but since I couldn't
tell what would be included in the non-quantitative, and therefore vague,
term "infrastructure", I didn't submit it.

If I just didn't get it - I'd welcome an explanation.


On Mon, Dec 27, 2021 at 4:34 AM Humanist <> wrote:

>               Humanist Discussion Group, Vol. 35, No. 421.
>         Department of Digital Humanities, University of Cologne
>                       Hosted by DH-Cologne
>                 Submit to:
>         Date: 2021-12-23 19:26:56+00:00
>         From: Worthey, Glen Cameron <>
>         Subject: "long documents" research: a brief community survey
> Dear colleagues,
> Please take a moment to fill out this very short survey in support of
> research using very long documents:
> https://urldefens
> 3fjg
> The survey is truly short: 4 multiple-choice questions. But we believe its
> purpose is significant: to help us better understand the research
> community's needs (in the digital humanities, and the data, information,
> and computer sciences) for infrastructures to support research using long
> scholarly documents.
> Thanks for participating,
> Glen Worthey
> (on behalf of a multi-institutional effort at the HathiTrust Research Center,
> Virginia Tech, and the U. of Mary Washington)
> --
> Glen Layne-Worthey
> Associate Director for Research Support Services, HathiTrust Research Center
> School of Information Sciences, University of Illinois at Urbana-Champaign
> Executive Board Chair, Alliance of Digital Humanities Organizations (ADHO)
> 650-213-6759
