Humanist Archives: Jan. 10, 2019, 6:01 a.m. Humanist 32.311 - toward a theory of the corpus

                  Humanist Discussion Group, Vol. 32, No. 311.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
    [1]    From: Francois Lachance 
           Subject: Re: [Humanist] 32.307: toward a theory of the corpus (166)

    [2]    From: Bill Benzon 
           Subject: Re: [Humanist] 32.307: toward a theory of the corpus (238)

        Date: 2019-01-10 01:07:38+00:00
        From: Francois Lachance 
        Subject: Re: [Humanist] 32.307: toward a theory of the corpus


I have been mulling over the call to provide a "concrete" example (see
Humanist 32.307) of close reading and matrices. In a sense I have been
asked to pay a price for my imaginative ramblings.

>> I am very much plugging 'close reading' into 'corpus as tool'.  I am
>> thinking in terms of the algebra of matrices. What is a list of pairs can be
>> read as a table of elements.  In reading are there not a number of
>> traversals that each gain a certain valence from a time series?

> So, what, concretely, happens when we 'plug' the procedure of close reading
> into an investigation where a corpus has been used to construct a tool which
> is then put to some use (such as Google's translation engine)? Give me
> examples where this has been done.

To the implicit injunction to show the money I have but loose, perhaps
short, change.

Before I do provide my slim example, I ask you to indulge me on a little
detour about size matters. I turn to the lay of the land offered by Thomas
Weitin:

The corpora of literary history, so is the claim, are simply too gigantic
to be read through individually. This argument is repeated often. It
emerges in tandem with the methodological cliché of the _close reader_ who
picks out of a text that which tastes best to his or her respective
theoretical predilection. The battlefront drawn here between small versus
big data, _close reading_ versus _distant reading_ supplies a very
imprecise atlas of the contemporary research landscape. Procedures taking
place on the middle level, mid-size co[r]pora processed as _smart data_,
are hardly appreciated. And if anything, big data is entirely
overestimated. Above all, this battlefront is unproductive for the
development of future projects, because the step from consolidation to
innovation can only succeed if it synergistically combines our
hermeneutical reading with the digital analyses of the computer in
concrete analyses.

Page 8
Thomas Weitin
Thinking slowly.
Reading Literature in the Aftermath of Big Data
LitLingLab Pamphlet #1
March 2015

Weitin points towards remarks by Martin Mueller about structured data and
scalable reading which contain this astute crystallization:

However closely we read individual texts, interpretation is always a form of
contextualizing or of putting a particular detail within a wider frame
that gives or receives meaning from an act of 'focalization'. This is a
fundamental and recursive procedure that operates from the lowest level of
a single sentence through the parts of a work to the level of author,
genre, and

Martin Mueller
Morgenstern's Spectacles or the Importance of Not-Reading
January 21, 2013

For me, Martin Mueller's remarks on focalization dovetail nicely with his
notion of a plurality of "surrogates" and that "Every surrogate has its
own query potential, which for some purposes may exceed that of the

I have an example of close reading of a matrix. And it is a reading not of
an entire poem but of two of its parts. As such these parts might be a
"corpus" of the size greater than zero (no document) but less than one (a
complete document). I, of course, am stretching the notion of a corpus but
my purpose is to expose to our thinking the general notion that the
methodical reading of sets of textual objects has similarities across
scales. Mueller and Weitin remind us that, regardless of corpora scale,
the nature and degree of markup present in structured data contribute
significantly to the useful analytical operations you can perform.

It is very sketchy. May have been suitable for a poster. And even then you
will see it is quite thin.

It is an HTML file that captures matrices transcribed from Robert Duncan's
"The Fire Passages 13" from _Bending the Bow_.


The poem begins and ends with matrices. They mirror each other. The words
switch positions except for those on a diagonal axis around which the
symmetry rotates. The words along this axis are coded with the value of
"red" on the _color_ attribute of the TD element.

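The mirroring described above can be sketched in a few lines of Python. This is only an illustration under assumptions: it treats the mirroring as a transposition across the main diagonal, and the letters in `opening` are placeholders, not words from Duncan's poem. The names `transpose`, `diagonal`, and `fixed` are hypothetical helpers of my own.

```python
# Hypothetical 3x3 word matrix; the entries are placeholders, not Duncan's text.
opening = [
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h", "i"],
]


def transpose(matrix):
    """Swap rows and columns, reflecting the matrix across its main diagonal."""
    return [list(column) for column in zip(*matrix)]


def diagonal(matrix):
    """Return the entries on the main diagonal."""
    return [matrix[i][i] for i in range(len(matrix))]


closing = transpose(opening)

# Positions whose entry is the same in both matrices.
fixed = [
    (i, j)
    for i in range(len(opening))
    for j in range(len(opening))
    if opening[i][j] == closing[i][j]
]
```

When every entry is distinct, the only positions that survive the swap are the diagonal ones, which are the cells that would carry the "red" marking in the HTML table.
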
As I said this is quite thin. It is from the 1990s and hand coding HTML
was my entry into more sophisticated text encoding. In the example the
HTML formatting is used to display data features (at a later stage in my
initiation in the joys of computer display and encoding, I would have
worked in XML (using TEI feature structures) and XSLT for

At this distance in time, I can report that the close reading of the
matrices was supported by the affordances offered by HTML and browser
display. The hand coding - rows and table data - and the addition of
attributes - meant that I was physically transcribing and rearranging the
text - a very productive form of reading.

In the background was my understanding from high school algebra that
matrices could be multiplied -- they are transformable objects. How a
matrix could be both a result and an operator mirrored a key lesson in my
doctoral thesis where I drew upon a passage in Hodges's bio of Turing that
machine configurations could be either instructions or states.
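
To make the point about a matrix being both a result and an operator concrete, here is a minimal sketch in Python. The quarter-turn rotation and the names `matmul`, `rotate`, `half_turn`, and `point` are my own illustrative choices, not anything drawn from the poem or from Hodges.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# A quarter-turn rotation of the plane, used here as an operator.
rotate = [[0, -1],
          [1, 0]]

# The product is a result of multiplication...
half_turn = matmul(rotate, rotate)

# ...and is itself an operator that can be applied to a point (a column vector).
point = [[1], [0]]
moved = matmul(half_turn, point)
```

The same object, `half_turn`, appears once as the output of an operation and once as the thing doing the operating.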

From the first point of view, it was natural to think of the configuration
as the machine's _internal state_ -- something to be inferred from its
different responses to different stimuli, rather as in behaviourist
psychology.  From the second point of view, however, it was natural to
think of the configuration as a _written instruction_, and the table as a
list of instructions, telling the machine what to do.  The machine could
be thought as obeying one instruction, and then moving to another
instruction.  The universal machine could then be pictured as reading and
decoding the instructions placed upon the tape.  Alan Turing himself did
not stick to his original abstract term "configuration", but later
described machines quite freely in terms of "states" and "instructions",
according to the interpretation he had in mind. [original emphasis]

Hodges, Andrew. _Alan Turing: The Enigma of Intelligence_. p. 107

It is perhaps ironic that the hyperlink ("jump") in my example now leads
to a 404 message for a file called "dance.htm".

I have taken the time to recount this tale of what I entertained and what
entertained me in the 1990s not to simply celebrate primitivism but to
reflect briefly on discipline. I for various biographical reasons
certainly did not have the discipline to pursue the construction of a
corpus of poems that contain matrices, the ways those matrices could be
encoded, and the methods that could be used to explore a set of such
textual objects. Neither was I anchored in a discipline where bold
transformation of an original text would count as scholarship.

But I have never lacked for community. And so when years and years later I
encountered a matrix in "Petroglyph" in _Discovery Passages_ by Garry
Thomas Morse, I was able to offer some comments:


And, with my own matrix with a fleck of red, bring Duncan once more into
the great conversation:


So thanks goes out to Bill Benzon whose own observations influenced the
trajectory of these.

Francois Lachance

        Date: 2019-01-08 16:16:51+00:00
        From: Bill Benzon 
        Subject: Re: [Humanist] 32.307: toward a theory of the corpus

Responses below


> --[1]------------------------------------------------------------------------
>        Date: 2019-01-08 00:45:01+00:00
>        From: Francois Lachance 
>        Subject: More metaphors inspired by the reception of "toward a theory
>        of the corpus"
> Willard
> Following my initial response to Bill Benzon's posting, I was contacted
> off-list with a bit of tangent on hidden meanings. I have permission to
> bring elements of that to the list.


> What do these esoteric modes of reading or translating have to do with
> humanities computing? They begin to offer a glimpse of labels for a set of
> attitudes towards the act of processing text. The tools of humanities
> computing are for hunting and gathering: the going out. They are also for
> fishing: waiting to see what shows up. In a sense the tools of humanities
> computing can be set like traps or weirs.
> The distinction between hunting and fishing breaks down when you consider
> the use of duck blinds. The distinction between gathering and fishing
> becomes moot when you consider harvesting a salmon run. Still there is
> some merit, I believe, in pondering whether one's orientation is towards
> aiming for a target, assembling a resource, or waiting to see what the
> network might bring.

I'm certainly in favor of intellectual fishing expeditions, especially if one
doesn't pre-judge what constitutes a worthwhile catch. That rubber boot might
not constitute a tasty meal or a nice trophy, but it might lead to a whole new

> I do like the idea of casting a net... gutting the fish not so much --
> though those guts do make good fertilizer for the garden.
> --
> Francois Lachance
> Scholar-at-large
> http://www.chass.utoronto.ca/~lachance
> https://berneval.blogspot.com 


> --[3]------------------------------------------------------------------------
>        Date: 2019-01-07 14:10:01+00:00
>        From: Jim Rovira 
>        Subject: Re: [Humanist] 32.306: toward a theory of the corpus
> Thanks for responding, Bill. I would say that my response about brain vs.
> mind was an expectation following your claim about your own work -- this
> seems to be an expectation embedded within it. We're either discussing some
> kind of disembodied, possibly fictional "mind" or we're mapping something
> measurable in the brain. I appreciate that you chased that rabbit down the
> hole some ways, and that you see the depth of the hole, but it's not my job
> to fill the gaps in your work. I'm not interested. I'm just observing one.
> In my opinion, this is really only a matter of how you phrase your claims.

Thanks for following the rabbit, Jim. First a question: Do you always reject
claims about the mind unless they are backed up with neural evidence? If so, you
pretty much have to reject a large swath of work, current as well as past.

> Here it is:
> "Recent corpus techniques ask literary analysts to bracket the
> interpretation of meaning so that we may trace the motions of mind."
> What are the "motions of mind"? Does that phrase actually mean anything? Is
> that what we're really describing? I don't think you're mapping the mind,
> but if you want to make that claim, you'll have to do more work.

One of the examples in the working paper comes from Andrew Piper's
computational work on conversion narratives (pp. 23-26). He uses Augustine's
Confessions as his primary text. He conducts a statistical analysis of the text
and discovers that each of the 13 books is located in a specific region of a
high dimensional space, which he projects onto two dimensions. I reproduce his
Figure 2 in the paper (on p. 23). Piper points out that the books seem to occupy
two distinct regions in that space, books 1-10 in one region and 11-13 in
another region. That distinction is quite evident in the graph. Since he's
numbered the points in his graph according to the books they represent, I
connect those points together into a path (pp. 25 & 26). I'm claiming that is
a path through mental space, through the mind. When we read Confessions our
attention is following that path.
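
Connecting the numbered points into a path can be sketched as follows. The coordinates are invented for illustration and are not Piper's data; I simply assume thirteen points in two dimensions, with books 1-10 clustered in one area and books 11-13 in another. The names `coords`, `steps`, and `longest` are hypothetical.

```python
import math

# Hypothetical 2-D coordinates for books 1-13 (NOT Piper's actual values).
coords = [
    (0.0, 0.0), (0.4, 0.3), (0.2, 0.8), (0.7, 0.6), (0.9, 0.1),
    (0.5, 0.5), (0.3, 0.2), (0.8, 0.9), (0.6, 0.4), (1.0, 0.7),
    (5.0, 5.2), (5.4, 4.9), (5.1, 5.6),
]

# The path visits the books in reading order; each step records
# (book_from, book_to, distance travelled in the projected space).
steps = [
    (book, book + 1, math.dist(coords[book - 1], coords[book]))
    for book in range(1, len(coords))
]

# The longest single step marks where the path crosses between regions.
longest = max(steps, key=lambda step: step[2])
```

With these made-up numbers the longest step falls between books 10 and 11, which is where the path leaves one region for the other.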

When I talk of motions of mind, that's the kind of thing I mean, attention
along a path. Those two regions are regions in the mind. What else could they be?

Sure, they're regions in Augustine's text. But where did that text come
from? The words didn't just crawl out of a dictionary and line up to
constitute the Confessions. Augustine wrote the text; the words came from his mind.
Therefore Piper's graph depicts regions in Augustine's mind (and the minds
of readers of the text). What alternatives do we have?

What I did mention in the working paper is that Piper then took that two region
analysis and asked whether or not that is typical of conversion narratives. So
he went looking for it and found it in 17 other texts.

> Now whenever someone talks this way,
> "Mapping the pathways of the mind -- Michael Gavin uses vector semantics to
> examine a passage from Paradise Lost. After arguing that a word-space model
> is, after all, a model of the mind, I suggest that vector semantics could
> be used to map paths through the mind."
> Sounds like we need to be doing work in conjunction with neuroscience to
> make these claims: "A model of the mind" and "map paths through the mind."
> Now you do define mind here:
> "Assuming that we can think of the mind as, in some aspect, a high-
> dimensional network of verbal meanings,"
> But I'm wondering why we should accept that assumption? I'm willing to
> consider it, but it seems too important to just be dropped there and left.
> This is philosophy of mind. Or is it Chomsky? I don't know. I think a
> rationale for accepting this definition is in order.

Note first of all that I don't claim this to be all of the mind, only 'some
aspect.' Are you saying that you don't think the mind contains a network of
verbal meanings? If verbal meanings aren't in the mind, where are they? The
high-dimensional part may be a bit strange, but do you seriously mean to doubt
that verbal meanings are in the mind? If verbal meanings aren't in the mind,
then where are they? Where is language?

Sure, I suppose you can say these meanings aren't in the mind, they're in
the text. But, as I've pointed out, that gets you nothing. The text is just a
string of symbols. Those symbols didn't assemble themselves. Someone put them
on the page and others read them. It's only in the minds of these people that
those symbols have any meaning at all, it's only in those minds that those
marks actually function as symbols.

Let me say a bit about the high-dimensional aspect. Back in the 19th century
physicists began using high-dimensional spaces to represent 'chunks' of
matter. Each molecule would be represented by a separate dimension in the space
(actually 6, but that's a technicality we needn't worry about). Given that
even a single drop of water contains billions and billions of molecules, these
are spaces of very high dimensionality indeed. With that kind of mathematical
description a single point in the space represents the whole physical system.
And you can describe changes in state as paths through this high dimensional
space. Of course, they didn't actually measure the position and momentum of
each and every molecule in the system. Rather, they used this formalism to
describe things they could measure, like pressure and temperature. And this
space would have a region where the system was in a solid state, another region
where it was in a liquid state, another region where it was a gas and still a
fourth region where it was a plasma. Physicists were particularly interested in
what happened as the system moved from one of these regions to another (from
liquid to gas, or vice versa, etc.).
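
A toy version of that formalism, as a sketch only: two particles, each contributing three position and three momentum coordinates, so that the whole system is a single point in a 12-dimensional space. I assume free particles with no forces, which is far simpler than anything a physicist would study, and the names `phase_point`, `step`, and `system` are my own.

```python
def phase_point(particles):
    """Flatten [(position, momentum), ...] into one point with 6 coordinates per particle."""
    point = []
    for position, momentum in particles:
        point.extend(position)
        point.extend(momentum)
    return point


def step(particles, dt, mass=1.0):
    """Advance free particles (no forces): each position moves by momentum / mass * dt."""
    return [
        (tuple(x + p / mass * dt for x, p in zip(position, momentum)), momentum)
        for position, momentum in particles
    ]


# Two hypothetical particles: ((x, y, z), (px, py, pz)).
system = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((1.0, 1.0, 1.0), (0.0, -1.0, 0.0)),
]

# A change of state over time is a path: a sequence of points in the 12-D space.
path = [phase_point(system)]
for _ in range(3):
    system = step(system, dt=0.5)
    path.append(phase_point(system))
```

Each entry in `path` stands for the whole system at one moment, which is the sense in which a single point represents everything at once.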

So, several decades ago some neuroscientists used the same mathematical
formalism to characterize brain activity, the late Walter Freeman among them. So
now each neuron is assigned a dimension in this space and the state of that
neuron becomes a position on that dimension. Each point in this space thus
represents the state of the whole brain at that moment. And a path through that
space represents changes in the state of the whole brain over time. Thus a path
in this space is not going to represent some path through the brain in any
simple and direct way. It's more abstract than that, but quite real.
Freeman's program consisted of 1) observations, often of the olfactory cortex
in rats, 2) computer simulation, and 3) mathematical analysis of 1 and 2.

When I talk about the mind as "a high-dimensional network of verbal meanings,"
that's the kind of thing I have in mind. Now, we can construct the high-
dimensional verbal space directly from a text or a corpus of texts. But how do
we relate this kind of space to the kind of neural state space Freeman and
others work with? That's a very tricky question, and one I've given a bit of
thought to. But we don't need to do that to get going. As I've said, we can
work with these high-dimensional word spaces right now; various investigators
have been doing it in one way or another for several decades.
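
For readers who have not seen one built, here is a minimal sketch of constructing a word space directly from a text. It uses plain co-occurrence counts within a small window and cosine similarity, which is one simple technique among many and not necessarily what Gavin or Piper used. The sample sentence and the names `word_space` and `cosine` are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def word_space(tokens, window=2):
    """Map each word to a vector of counts of the words appearing within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# A made-up toy text; every distinct word becomes one dimension of the space.
tokens = "the cat sat on the mat the dog sat on the rug".split()
space = word_space(tokens)
similarity = cosine(space["cat"], space["dog"])
```

Words that keep similar company end up pointing in similar directions, so `cat` and `dog` come out close here even though they never occur next to each other.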

> The issue isn't the
> validity of the work you're doing, which I think is important, but what we
> really learn from it. I'm not buying that it produces a mental map, and you
> haven't given me any reason to. Might it be better to say that you're
> providing descriptions of patterns of how the brain processes language?

But as you point out, I don't actually produce neural evidence, do I? So
what's the point of making that kind of claim? If I made it, would you accept
it, or just reframe your criticism to fit that claim?

> I also appreciate your background, and I think it's great that you were at
> Johns Hopkins in 1966. But, Structuralism continues to be taught in almost
> every literary theory class. Yes, I would agree that it continues
> piecemeal, but I don't think it's reduced to binary oppositions. But that's
> all off my main point, which is still that claims like these are painfully
> reductive descriptions of literary criticism:
> "Computational critics have an opportunity to map the human mind that is
> qualitatively different from what interpretive critics accomplish *by
> uncovering meanings 'hidden' in literary texts."*
> "Conventional literary criticism talks a lot about the text, but has no
> coherent conception of it. That is because it is focused on meaning and
> meaning doesn't exist in the marks on pages, the physical text."
> ". . . standard literary criticism oriented around the close reading of
> meanings hidden in texts." Yes, Structuralism's rejection of the idea of
> meaning "hidden in texts" does indeed continue to the present in a lot of
> literary criticism.
> "The intellectual world of literary study need not revolve around meaning.
> There are other ways to think about language and mind."
> There's no question that you define the field of literary criticism as a
> search for hidden meanings in texts, and I think given your background you
> should know better.

Well, from my point of view that the meaning be conceived specifically as
'hidden' isn't so important. Some critics talk that way, some don't. But
they're looking for meaning. And if they're not looking for meaning in the
text, where are they looking? In society, in social systems, semiotic systems?
How do literary critics have access to those things unless they somehow pass
through literary texts? Do we want to introduce a standard distinction between
meaning and reference?

> But I also think your last point is the key. I would
> hope most literary critics would agree, but I'm sure some would disagree.
> But, either way, I completely agree: there are other ways to think about
> language and mind. Other people should carry out that work, some literary
> critics will pick it up, but that doesn't detract from the work of literary
> criticism.
> Jim R

Bill Benzon





