Humanist Discussion Group

Humanist Archives: March 27, 2021, 10 a.m. Humanist 34.304 - finding by concordancing

              Humanist Discussion Group, Vol. 34, No. 304.
        Department of Digital Humanities, University of Cologne
                         Hosted by DH-Cologne
                          www.dhhumanist.org
                  Submit to: humanist@dhhumanist.org


    [1]    From: Andrew Hawke 
           Subject: Re: [Humanist] 34.301: finding by concordancing? then what? (105)

    [2]    From: maurizio lana 
           Subject: Re: [Humanist] 34.301: finding by concordancing? then what? (50)


--[1]------------------------------------------------------------------------
        Date: 2021-03-25 10:41:06+00:00
        From: Andrew Hawke 
        Subject: Re: [Humanist] 34.301: finding by concordancing? then what?

We have a similar problem with a large collection of some 10,000 files
of all types: many PDFs, but also Word, text and Unicode text files,
and even databases and spreadsheets, all of which we need to search
regularly for our work in revising our historical dictionary of Welsh,
Geiriadur Prifysgol Cymru (http://gpc.cymru).

We use a long-established program called FileLocator Pro
(https://www.mythicsoft.com/filelocatorpro/) which was originally
designed for programmers, but works really well for our purposes and is
extremely configurable. It extracts the text from files of many
different formats (even if zipped) and searches that text. It can't
produce KWIC output, but it can display hits with word-highlighting and
extra context if needed, as well as showing the entire text on demand.
It can do Boolean and regular expression searches, and is both powerful
and quick. Results can be exported in various formats.
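
For readers who do want KWIC (keyword-in-context) output, the idea can
be sketched in a few lines of Python. This is only a minimal
illustration, not how FileLocator Pro works: it assumes the text has
already been extracted to plain .txt files, and the names Hit, kwic,
kwic_folder and format_hit are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterator, NamedTuple


class Hit(NamedTuple):
    """One concordance line: where it was found and the context around it."""
    filename: str
    left: str
    keyword: str
    right: str


def kwic(text: str, pattern: str, width: int = 30, filename: str = "") -> Iterator[Hit]:
    """Yield one Hit per regex match, with `width` characters of context per side."""
    # Collapse line breaks and runs of spaces so the context reads as one line.
    flat = " ".join(text.split())
    for m in re.finditer(pattern, flat, flags=re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        yield Hit(filename, left, m.group(0), right)


def kwic_folder(folder: str, pattern: str, width: int = 30) -> Iterator[Hit]:
    """Run kwic() over every .txt file below `folder` (assumed already extracted)."""
    for path in sorted(Path(folder).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        yield from kwic(text, pattern, width, filename=path.name)


def format_hit(hit: Hit, width: int = 30) -> str:
    """Right-align the left context so the keywords line up in one column."""
    return f"{hit.filename:<20} {hit.left:>{width}} [{hit.keyword}] {hit.right}"
```

Calling kwic_folder("texts", r"\bgeiriadur\w*") and printing each result
through format_hit would give one aligned line per hit, with the
filename in the first column. The folder name "texts" is a placeholder.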

A recently added feature is the ability to cache text to an internal
database, making searches extremely quick. Our 75GB of texts can be
searched for a regular expression in less than 30 seconds, and there is
a simpler index-based search which is effectively instantaneous, but a
little less flexible.
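
The trade-off between the two kinds of search can be illustrated with a
small Python sketch. It shows the general technique only and is an
assumption, not a description of FileLocator Pro's internals: an
inverted index answers whole-word queries with a dictionary lookup,
while a regular expression has to scan every cached text but can match
patterns the index cannot. All function names here are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def tokenize(text: str) -> list:
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of file names that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def index_search(index: Dict[str, Set[str]], *words: str) -> Set[str]:
    """Files containing ALL the given whole words: fast, but no patterns."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


def regex_search(docs: Dict[str, str], pattern: str) -> Set[str]:
    """Files where the regex matches anywhere: flexible, but scans every text."""
    rx = re.compile(pattern, re.IGNORECASE)
    return {name for name, text in docs.items() if rx.search(text)}
```

Here docs stands for the cached text, keyed by file name. A query such
as geiriadur\w* finds inflected or derived forms through regex_search,
whereas index_search only finds files containing the exact word.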

It is a Windows program, so depending on your Mac hardware, you may be
able to use it - it is available in 32-bit and 64-bit versions. I
strongly suggest you try it out if you can run it. Finally, I should say
that the developer, Dave Vest, who is based in Cambridge (UK), offers
great support and feedback, and has incorporated several features and
improvements that we have suggested over the years.

Best wishes,

Andrew Hawke

On 25/03/2021 08:00, Humanist wrote:
>          Date: 2021-03-24 06:41:57+00:00
>          From: Willard McCarty 
>          Subject: finding by concordancing?
>
> I collect a great many articles and books in the form of pdfs and try my
> best to organise them. But of course different research questions call
> for different organisations. Mostly, then, I use native (Spotlight)
> indexing on my Mac and Finder to locate items potentially of interest.
> The ensuing searches are, however, tedious to conduct. I find myself
> wanting a pdf concordancer to give me whatever keyword(s) in context
> together with corresponding filenames and context. The best I've found
> so far is Adobe Reader's 'advanced search' facility. Is there anything
> better?
>
> This leads me to a bigger question. Who has studied the effects on
> research of what Roy Rosenzweig called "the problem of abundance"?
> He asked of fellow historians, "what would it be like to write history
> when faced by an essentially complete historical record?" (2003) This,
> of course, is not a problem for historians alone but for most if not all
> of us who use online resources. When I was a doctoral student,
> working on Milton's Paradise Lost in its relation to biblical and
> classical traditions, it was assumed that I would read EVERYTHING.
> No one enforced it, because even then it was impossible to satisfy.
> I assume now that we're done with that assumption. But surely
> doing research has changed in some fundamental ways?
>
> Many thanks for suggestions and speculations.
>
> Yours,
> WM
> --
> Willard McCarty,
> Professor emeritus, King's College London;
> Editor, Interdisciplinary Science Reviews;  Humanist
> www.mccarty.org.uk

------------------------------------------------------------------------

Andrew Hawke - Golygydd Rheolaethol | Managing Editor

Geiriadur Prifysgol Cymru | University of Wales Dictionary of the Welsh
Language
Canolfan Uwchefrydiau Cymreig a Cheltaidd | Centre for Advanced Welsh &
Celtic Studies
Llyfrgell Genedlaethol Cymru | National Library of Wales
Aberystwyth
Ceredigion
SY23 3HH

Ffôn | Tel: +44 (0)1970 631012
Ebost | Email: ach@geiriadur.ac.uk
Gwefan | Website: www.geiriadur.ac.uk
GPC Ar Lein | GPC Online: http://gpc.cymru
Apiau GPC | GPC Apps: www.geiriadur.ac.uk/apiau-android-ac-ios/
Facebook: www.facebook.com/geiriadurGPC
Twitter: @geiriadur

Prifysgol Cymru Y Drindod Dewi Sant | University of Wales Trinity Saint
David
Rhif elusen cofrestredig | Registered charity number: 1149535
Rhif cwmni cofrestredig | Registered company number: RC000537


--[2]------------------------------------------------------------------------
        Date: 2021-03-25 10:04:23+00:00
        From: maurizio lana 
        Subject: Re: [Humanist] 34.301: finding by concordancing? then what?

Willard,

Given the way you open your message, "I collect a great many articles
and books in the form of pdfs and try my best to organise them. But
of course different research questions call for different
organisations" (and also because of the citation of Roy
Rosenzweig!), I cannot but praise Zotero (www.zotero.org), which is
free software, originally developed at the Roy Rosenzweig Center for
History and New Media (see https://rrchnm.org/zotero/).
Zotero in itself is not a concordancer, but it seems to me that
what you are searching for is a way to find a specific book by
searching the content of a collection of books, and Zotero does that.
With Zotero you build your personal digital library and get a wealth
of tools for managing your collections.

Overall, when you use it you don't get the usual feeling of "ok, to
do this, I must understand these steps in the app"; rather, you have
an app which fits your scholarly work like a glove.

It imports bibliographic data from a DOI, ISBN, arXiv ID or PubMed ID;
you attach the file of the source (usually a PDF) to the
bibliographic description, and the ZotFile plugin extracts as text
files the passages you highlighted in the source PDF, sorting the
passages by highlight color.

It lets you define collections (= folders inside Zotero); put the
same source into as many suitable collections as you like (a sort of
smart link: the file of the source remains a single copy in its
filesystem folder); describe any record with as many different
tags as you like; and "relate" one source to another (the review
and the reviewed book; two articles with similar content; two
disparate sources where you discover a common view of a subject, ...).
All of these are criteria for crafting quick or advanced searches
inside your library. Also, as Zotero indexes the content of the
PDFs of the sources, in the advanced search you can filter the
sources not by the usual bibliographic data (author, date, title
words, etc.) but by words contained in the full text.
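
To make the organising idea concrete, here is a minimal Python sketch of
such a library. It is not Zotero's data model or API, only an
illustration under my own assumptions: the Item class, its fields and
the search function are all hypothetical. One item can sit in several
collections, carry several tags, point to related items, and be
filtered by words in its full text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Item:
    """One source in a personal library (hypothetical structure)."""
    title: str
    author: str
    fulltext: str = ""                                    # indexed PDF content
    tags: Set[str] = field(default_factory=set)
    collections: Set[str] = field(default_factory=set)    # one item, many collections
    related: Set[str] = field(default_factory=set)        # titles of related items


def search(items: List[Item],
           tag: Optional[str] = None,
           collection: Optional[str] = None,
           word: Optional[str] = None) -> List[Item]:
    """Return the items that satisfy every criterion that was given."""
    results = []
    for item in items:
        if tag is not None and tag not in item.tags:
            continue
        if collection is not None and collection not in item.collections:
            continue
        # Case-insensitive substring match on the full text.
        if word is not None and word.lower() not in item.fulltext.lower():
            continue
        results.append(item)
    return results
```

Because collections and tags are sets on the item itself, adding a
source to another collection never copies the file; it only adds one
more label to the same record.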

best
Maurizio


μνάσασθαί τινά φαιμι †καὶ ἕτερον† ἀμμέων
I am certain that someone will remember us even when we are gone
Sappho, Lobel-Page 147

Maurizio Lana
Università del Piemonte Orientale
Dipartimento di Studi Umanistici
Piazza Roma 36 - 13100 Vercelli

