Humanist Discussion Group, Vol. 34, No. 304.
Department of Digital Humanities, University of Cologne
Hosted by DH-Cologne
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

  [1]    From: Andrew Hawke <ach@geiriadur.ac.uk>
         Subject: Re: [Humanist] 34.301: finding by concordancing? then what? (105)

  [2]    From: maurizio lana <maurizio.lana@uniupo.it>
         Subject: Re: [Humanist] 34.301: finding by concordancing? then what? (50)

--[1]------------------------------------------------------------------------
Date: 2021-03-25 10:41:06+00:00
From: Andrew Hawke <ach@geiriadur.ac.uk>
Subject: Re: [Humanist] 34.301: finding by concordancing? then what?

We have a similar problem with a large collection of some 10,000 files of all types, including many PDFs, but also Word, text and Unicode text files, and even databases and spreadsheets which we need to search regularly for our work in revising our historical dictionary of Welsh, Geiriadur Prifysgol Cymru (http://gpc.cymru).

We use a long-established program called FileLocator Pro (https://www.mythicsoft.com/filelocatorpro/) which was originally designed for programmers, but works really well for our purposes and is extremely configurable. It extracts the text from files of many different formats (even if zipped) and searches that text. It can't produce KWIC output, but it can display hits with word-highlighting and extra context if needed, as well as showing the entire text on demand. It can do Boolean and regular expression searches, and is both powerful and quick. Results can be exported in various formats.

A recently-added feature is the ability to cache text to an internal database making searches extremely quick. Our 75GB of texts can be searched for a regular expression in less than 30 seconds, and there is a simpler index-based search which is effectively instantaneous, but a little less flexible.

Depending on your Mac hardware, you may be able to use it - it is available in 32-bit and 64-bit versions.
I strongly suggest you try it out if you can run it.

Finally I should say that the developer - Dave Vest who is based in Cambridge (UK) - offers great support and feedback, and has incorporated several features and improvements that we have suggested over the years.

Best wishes,

Andrew Hawke

On 25/03/2021 08:00, Humanist wrote:
> Humanist Discussion Group, Vol. 34, No. 301.
> Department of Digital Humanities, University of Cologne
> Hosted by DH-Cologne
> www.dhhumanist.org
> Submit to: humanist@dhhumanist.org
>
> Date: 2021-03-24 06:41:57+00:00
> From: Willard McCarty <willard.mccarty@mccarty.org.uk>
> Subject: finding by concordancing?
>
> I collect a great many articles and books in the form of pdfs and try my
> best to organise them. But of course different research questions call
> for different organisations. Mostly, then, I use native (Spotlight)
> indexing on my Mac and Finder to locate items potentially of interest.
> The ensuing searches are, however, tedious to conduct. I find myself
> wanting a pdf concordancer to give me whatever keyword(s) in context
> together with corresponding filenames and context. The best I've found
> so far is Adobe Reader's 'advanced search' facility. Is there anything
> better?
>
> This leads me to a bigger question. Who has studied the effects on
> research of what Roy Rosenzweig called "the problem of abundance"?
> He asked of fellow historians, "what would it be like to write history
> when faced by an essentially complete historical record?" (2003) This,
> of course, is not a problem for historians alone but for most if not all
> of us who use online resources. When I was a doctoral student,
> working on Milton's Paradise Lost in its relation to biblical and
> classical traditions, it was assumed that I would read EVERYTHING.
> No one enforced it, because even then it was impossible to satisfy.
> I assume now that we're done with that assumption.
> But surely doing research has changed in some fundamental ways?
>
> Many thanks for suggestions and speculations.
>
> Yours,
> WM
> --
> Willard McCarty,
> Professor emeritus, King's College London;
> Editor, Interdisciplinary Science Reviews; Humanist
> www.mccarty.org.uk

------------------------------------------------------------------------
Andrew Hawke - Golygydd Rheolaethol | Managing Editor
Geiriadur Prifysgol Cymru | University of Wales Dictionary of the Welsh Language
Canolfan Uwchefrydiau Cymreig a Cheltaidd | Centre for Advanced Welsh & Celtic Studies
Llyfrgell Genedlaethol Cymru | National Library of Wales
Aberystwyth
Ceredigion SY23 3HH
Ffôn | Tel: +44 (0)1970 631012
Ebost | Email: ach@geiriadur.ac.uk
Gwefan | Website: www.geiriadur.ac.uk
GPC Ar Lein | GPC Online: http://gpc.cymru
Apiau GPC | GPC Apps: www.geiriadur.ac.uk/apiau-android-ac-ios/
Facebook: www.facebook.com/geiriadurGPC
Twitter: @geiriadur
Prifysgol Cymru Y Drindod Dewi Sant | University of Wales Trinity Saint David
Rhif elusen cofrestredig | Registered charity number: 1149535
Rhif cwmni cofrestredig | Registered company number: RC000537

--[2]------------------------------------------------------------------------
Date: 2021-03-25 10:04:23+00:00
From: maurizio lana <maurizio.lana@uniupo.it>
Subject: Re: [Humanist] 34.301: finding by concordancing? then what?

Willard,

given the way you open your message, "I collect a great many articles and books in the form of pdfs and try my best to organise them. But of course different research questions call for different organisations" (and also because of the citation of Roy Rosenzweig!), I cannot but praise Zotero (www.zotero.org), which is free software originally developed at the Roy Rosenzweig Center for History and New Media (see https://rrchnm.org/zotero/).

Zotero in itself is not a concordancer - but it seems to me that what you are searching for is a way to find a specific book by searching the content of a collection of books - and Zotero does it.
With Zotero you build your personal digital library and get a wealth of tools for managing your collections. Overall, when you use it you don't get the usual feeling of "OK, to do this I must first understand these steps in the app"; rather, you have an app which fits your scholarly work like a glove.

It imports bibliographic data from a DOI, ISBN, arXiv ID or PubMed ID; you attach the file of the source (usually a PDF) to the bibliographic description, and the Zotfile plugin extracts the passages you highlighted in the source PDF as text files, sorting the passages by highlight color.

It lets you define collections (= folders inside Zotero); put the same source into as many suitable collections as you like (a sort of smart link: the file of the source remains a single copy in its filesystem folder); describe any record with as many different tags as you like; and "relate" one source to another (the review and the reviewed book; two articles with similar content; two disparate sources where you discover a common view of a subject, ...). All of these are criteria for crafting quick or advanced searches inside your library.

Also, since Zotero indexes the content of the PDFs of the sources, in the advanced search you can filter the sources not by the usual bibliographic data (author, date, title words, etc.) but by words contained in the full text.

Best,
Maurizio

μνάσασθαί τινά φαιμι †καὶ ἕτερον† ἀμμέων
I am certain that someone will remember us even when we are gone
Sappho, Lobel-Page 147

Maurizio Lana
Università del Piemonte Orientale
Dipartimento di Studi Umanistici
Piazza Roma 36 - 13100 Vercelli

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php