In reply to Steve's question about Busa, let me
quote some passages from my forthcoming doctoral dissertation:

In his doctoral dissertation, which was published
in 1949 (Busa, 1949), Roberto Busa concentrated on
the concept of presence in the works of Thomas
Aquinas. To that end, he wrote out by hand 10,000
3" x 5" cards, each containing a sentence with the
word ‘in’ or a word connected with ‘in’ (Busa, 1980,
p. 83). In doing so, he started to think about
methods to automate linguistic analysis of texts.
In Busa’s own writings, the genesis of this idea
is placed in a period ranging from 1941 or
1942 to 1946.[1] The latter date marks the
transition from the defence of his Ph.D. thesis
to the plan for the Index Thomisticus, a
lemmatized concordance of all the words in the
complete works of Thomas Aquinas, ‘including
conjunctions, prepositions and pronouns, to serve
other scholars for analogous studies.’ (Busa,
1980, p. 83) It was not until 1950, however, that
he published his plans in an announcement in Speculum. (Busa, 1950)

The question is not so much 'did he ever return
to his investigation of the notion of "presence"?'
but rather 'how did he move on from this
research?' During this research, Busa gained two
major insights which together form the
theoretical model on which most of his scholarly
work up to now has been based.[2] Firstly, he
realized that a reader of a text cannot approach
that text with his own conceptual verbal system
but has to study the author’s. Therefore ‘a
philological and lexicographical inquiry into the
verbal system of an author has to precede and
prepare for a doctrinal interpretation of his
works.’ His second insight was that the basic
structures of human discourse are not generated
by the so-called “meaningful” words, but by all
functional or grammatical words ‘which in my mind
are not “empty” at all but philosophically rich.’
In these words, Busa sees the manifestation of
‘the deepest logic of being’ and it is ‘this
basic logic that allows the transfer from what
the words mean today to what they meant to the
writer.’ (Busa, 1980, p. 83)[3] From this
‘generative ontology’ (Busa, 2004a, p. 16) he
developed the concept and method of ermeneutica
computerizzata or computational hermeneutics
(Busa, 1998), computerized hermeneutics (Busa,
1998 and 2002), hermeneutical informatics (Busa,
1999), or hermeneutic informatics (Busa,
2004b).[4] Computerized hermeneutics he sees as
the only form of text processing which allows one
to discover the unknown.[5] In order to allow
other scholars in both the humanities and
artificial intelligence to perform research on
the texts he was working on, he had to prepare ‘a
map of a linguistic universe’, that is ‘a
document in which as many facts as possible are
given with as few as possible personal
interpretations’ (Busa, 1976, p. 114), thus
preferring an inductive approach over a
deductive one. This could be achieved by coding
the several categories of vocabulary redundancy
in the text during the input phase of a
project:[6] inflexions and conjugations should be
lemmatized and the literal quotations in texts,
or quotations ad sensum, for instance, should be
coded as such in order not to corrupt any
research which tries to draw inferences from the
author’s verbal system, for they are not his words.
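
The coding step described above can be illustrated with a minimal sketch. It is not Busa's actual procedure: the lemma table, the token format and the function name build_concordance are all hypothetical, chosen only to show how lemmatizing word forms and flagging quotations at input time keeps quoted words out of the census of the author's own vocabulary.

```python
from collections import defaultdict

# Hypothetical toy lemma table; a real project would need a full Latin lexicon.
LEMMAS = {
    "est": "sum", "sunt": "sum",
    "rebus": "res", "res": "res",
    "deus": "deus", "deo": "deus",
    "in": "in",
}


def build_concordance(tokens):
    """Group token positions by lemma, skipping tokens flagged as quotations.

    `tokens` is a list of (word_form, is_quotation) pairs (an assumed format).
    """
    concordance = defaultdict(list)
    for position, (form, is_quotation) in enumerate(tokens):
        if is_quotation:
            continue  # quoted words are not the author's own words
        lemma = LEMMAS.get(form.lower(), form.lower())
        concordance[lemma].append(position)
    return dict(concordance)


sample = [
    ("Deus", False), ("est", False), ("in", False), ("rebus", False),
    ("res", True), ("sunt", True),  # flagged as a literal quotation
    ("in", False), ("Deo", False),
]
concordance = build_concordance(sample)
```

Function words such as ‘in’ are deliberately kept in the sketch rather than filtered out as stop words, in line with the second insight described earlier.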

Eventually, 10,631,973 words (tokens) were processed.[7]


[1] Cp. ‘During World War II, between 1941 and
1946, I began to look for machines for the
automation of the linguistic analysis of written
texts.’ (Busa, 2004b, p. xvi) and ‘The idea of
linguistic analysis first came to me in 1942
(...)’ (Busa, 2002, p. 49). The former quote does
not give an exact date for the start of his
quest; instead, it gives the start and end dates of
his doctoral research, which cover the period of
WW II. Although the latter quote does
mention an exact year, it does not say that the
idea reported on here includes the use of
computing machines. Busa reaffirmed the earliest
date to me in a private email of 24 July 2005.

[2] An ‘essential’ bibliography of Busa’s
writings up to 2002 is published in Busa (2002).

[3] ‘Grammar is the foundation of philosophy.
Philosophy aims at unifying synthesis of the
whole cosmos. Examining those grammatical words
is the only possible path leading to and
documenting such a synthesis, when near to its goal.’ (Busa, 2004a, p. 17)

[4] ‘I insist on calling it hermeneutics, that is
interpretation. It is, in fact, one of our
cognitive activities which, by going backwards,
seeks to reconstruct from a text written by
others the structures, rules and choices of the
thinking which is there so expressed.
It does this by observing one at a time all the
aspects that constitute the text of another
author, first of all and above of all the reality
of things, rather than and before making a list
of the opinions and judgements of others who have spoken about it.
It does this knowing in this way a scholar
himself firstly puts into execution that same
logic which tries to recover, define and describe
in the work of another, that is the same logic
which is common to all, and secondly seeks to
define and describe also what message and different styles emerge from it.
(...) When I say that such hermeneutics is
computerized, I mean computer assisted: the
scholar makes the computer perform firstly all
the operations of assembling, ordering,
re-ordering, summarizing etc., and secondly all
the searches for single data or groups of data
which every heuristic strategy requires, one after the other.
In fact, the specific function of the electronic
organizer is that of carrying out censuses which
are exhaustive, quantized and classified of the
linguistic elementary micro-elements that form the framework of any text.
Such a service is all the more valuable in that
it really seems that every linguistic category is
fuzzy or approximate and not rigid.
Perhaps no linguistic category is absolute; perhaps all admit of exceptions.
Only with the computer can the probability curves
of such exceptions be specified in numbers and
percentages, in order, furthermore, to identify
what these are, and, finally, to check whether
they are merely a noise that can be ignored or
whether they carry a message, that is, are significant.’ (Busa, 2002, p. 56-57)

[5] In his model, Busa sees two other lines of
text processing, namely information retrieval
service infrastructures such as databanks,
hypertext techniques, the Internet, and the WWW,
and several ways of publishing new kinds of books
such as diskettes, CDs, multimedia, etc. These,
however, are aimed at retrieving and diffusing
what is already known. (Busa, 1998) In Busa
(1999), he calls these respectively ‘data bank’ and
‘publishing informatics’, which he considers
social services (p. 5): ‘the number of their
consumers is extremely large and expanding.
Consequently, the money invested in them has quick returns.’ (p. 6)

[6] In Busa (1976, p. 115), he mentions seven
different categories which can result in
vocabulary redundancy: authorship, content,
grouping of the inflexions into lemma units,
polysemy, polymorphy, concentrations of frequency, and correlations of words.

[7] Figure according to the project report Opera
quae in indicem thomisticum sunt redacta
concluded on February 2, 1975 and revised in 1980
(privately made available to me). The 118 works
of Thomas Aquinas contain 8,767,848 tokens; the
61 works by other authors connected with the
Thomistic works contain 1,864,125 tokens. Over
the course of time, these figures were adjusted
constantly, which explains why Busa (1976) reports
on ‘ten and a half million words’ (p. 114); Busa
(1980) quotes the figures ‘10.600.000’ (p. 85)
and ‘10.666.000’ (p. 86); Busa (2002) speaks of
‘11.000.000 words’ (p. 58, 59, 60, 61, and 62);
and Busa (2004a, p. 15-16, and 2004b, p. xvii) mention ‘11 million words’.


Busa, Roberto (1949). La terminologia Tomistica
dell'Interiorità: Saggi di metodo per una
interpretazione della metafisica della presenza. Milano: Bocca.
Busa, Roberto (1950). Complete Index Verborum of
Works of St Thomas. Speculum: a journal of
medieval studies, XXV/1 (January 1950): 424-425.
Busa, Roberto (1976). Computer Processing of over
Ten Million Words: Retrospective Criticism. In
Alan Jones and R.F. Churchhouse (eds.). The
Computer in Literary and Linguistic Studies.
(Proceedings of the Third International
Symposium). Cardiff: The University of Wales Press, p. 114-117.
Busa, Roberto (1980). The Annals of Humanities
Computing: The Index Thomisticus. Computers and the Humanities, 14: p. 83-90.
Busa, Roberto (1998). Concluding a Life’s Safari
from Punched Cards to World Wide Web. In L.
Burnard, M. Deegan and H. Short (eds.). The
Digital Demotic: Selected Papers from DRH97,
Digital Resources for the Humanities Conference,
St. Anne's College, Oxford, September 1997.
London: Office for Humanities Communication, p. 3-11.
Busa, Roberto (1999). Picture a Man... Literary
and Linguistic Computing, 14: 5-9.
Busa, Roberto (2002). Hermeneutika e
kompiuterizar. Pas gjashtëdhjetë
vjetësh—L'ermeneutica computerizzata.
Sessant’anni dopo—Computerized hermeneutics. Sixty years on. Tiranë: albin.
Busa, Roberto (2004a). Analysis of scientific and
philosophical texts. What differentiates them and
what they have in common. In Buzzetti, Dino,
Pancaldi, Giuliano, and Short, Harold. (eds.)
Augmenting Comprehension. Digital Tools and the
History of Ideas. Proceedings of a conference at
Bologna, 22-23 September 2002. London: Office for
Humanities Communication, p. 15-17.
Busa, Roberto (2004b). Foreword: Perspectives on
the Digital Humanities. In Schreibman, Susan,
Siemens, Ray, and Unsworth, John (eds.). A Companion to
Digital Humanities. Malden, MA/Oxford/Carlton,
Victoria: Blackwell Publishing. p. xvi-xxi.


Last time I met him, he gave me a full list of his publications (up
to 1991). I can't find anything on "presence." One of the volumes of
Index Thomisticus (1979/80), Sectio Secunda, Vol. 7 is entitled "Et =
In = Quam quod" XVI + 1270. The XVI pages of introduction might have
some discussion about the "In" concordances.
