12.0004 text for analysis; content-analysis software

Humanist Discussion Group
Fri, 8 May 1998

Humanist Discussion Group, Vol. 12, No. 4.
Centre for Computing in the Humanities, King's College London

[1] From: Ron Zweig <ron@rambam.tau.ac.il> (17)
Subject: text for analysis

[2] From: Ken Litkowski <ken@clres.com> (51)
Subject: Re: 11.0728 text-analysis

Date: Fri, 08 May 1998
From: Ron Zweig <ron@rambam.tau.ac.il>
Subject: text for analysis


I can think of other categories of text that were designed to convey
meaning to limited or defined audiences, and might therefore confound
"textual analysis":

(i) telegrams. A genre that has died, but fills the archives. Brevity
and confidentiality were more important than comprehensibility.

(ii) notes taken (e.g., lecture notes) for one's own use.

Category (i) is interesting, because it is entirely probable that
text analysis skills have been developed outside of academia to
monitor correspondence that is important for reasons of security or
financial regulation (tax and foreign currency control).

Is there anyone working in the field of text analysis who has access
to, memory of, or working relations with those murky organizations of
the state that have huge budgets to spend on developing analytical
tools for message understanding?

Ron Zweig
Humanities Computing Project
Tel Aviv University

Date: Tue, 05 May 1998
From: Ken Litkowski <ken@clres.com>
Subject: Re: 11.0728 text-analysis

As the purveyor of what I think is a fairly decent content analysis
package, about to be extended to easily handle multiperson dialogs,
automatically separating the speech of each speaker, I have over the
past several months considered issues like those you are raising.

What my program picks up very well is stylistic differences that capture
genre quite well. Going beyond these surface aspects, the program then
picks up quite subtle gradations of meaning. When I recently analyzed
an 8-person discussion (that looked like a bunch of college students), I
was amazed to have interpretations leap off the page: who was being
bossy, who emotional, who analytical, the kinds of terms and semantics
each was using, and swings into the use of, say, abstract terms or
cusswords.

The results from my program stemmed from a reasonably decent category
system. This in turn (and this is my main point) stems from being able
to create category systems based on syntactic, semantic, and stylistic
features associated with the lexical items. With such features, you can, for
example, create categories that cover only a middle ground in a
hierarchy (neither too general nor too specific). So, I think it is
possible to do the kind of analysis which you dispute. Send me a file
of love letters and I'll run it through.
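
As a rough illustration of the idea only (not the program described
above, whose internals are not given here), a category system built on
features attached to lexical items might be sketched in Python as
follows. The lexicon entries, category names, and function names are
all invented for this sketch; the `level` parameter picks how deep in
the hierarchy the categories are taken from, so that `level=1` gives
the "middle ground" between the most general and the most specific.

```python
import re
from collections import Counter

# Hypothetical lexicon: each word maps to a path through a category
# hierarchy, ordered from most general to most specific.
LEXICON = {
    "love": ("state", "emotion", "affection"),
    "adore": ("state", "emotion", "affection"),
    "fear": ("state", "emotion", "anxiety"),
    "think": ("state", "cognition", "reasoning"),
    "because": ("state", "cognition", "reasoning"),
    "must": ("act", "directive", "obligation"),
    "should": ("act", "directive", "obligation"),
}


def categorize(text, level=1):
    """Count category hits at one depth of the hierarchy (0 = most general)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        path = LEXICON.get(word)
        if path is not None:
            # Clamp so that short paths still contribute their deepest category.
            counts[path[min(level, len(path) - 1)]] += 1
    return counts


def profile_by_speaker(turns, level=1):
    """Accumulate category counts per speaker from (speaker, utterance) pairs."""
    profiles = {}
    for speaker, utterance in turns:
        profiles.setdefault(speaker, Counter()).update(categorize(utterance, level))
    return profiles


if __name__ == "__main__":
    turns = [
        ("A", "You must go, you should listen."),
        ("B", "I love it, but I fear the result."),
        ("A", "I think so because it works."),
    ]
    for speaker, profile in profile_by_speaker(turns).items():
        print(speaker, dict(profile))
```

With a real dictionary in place of the toy lexicon, comparing the
per-speaker profiles at different levels would show whether a
distinction (say, "bossy" versus "analytical") is visible at the
middle of the hierarchy or only at its extremes.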

Ken Litkowski                         TEL.: 301-926-5904
CL Research                           EMAIL: ken@clres.com
20239 Lea Pond Place                    
Gaithersburg, MD 20879-1270 USA       Home Page: http://www.clres.com

