17.427 gender-testing

From: Humanist Discussion Group (by way of Willard McCarty willard.mccarty@kcl.ac.uk)
Date: Thu Dec 04 2003 - 04:07:44 EST


               Humanist Discussion Group, Vol. 17, No. 427.
       Centre for Computing in the Humanities, King's College London
                   www.kcl.ac.uk/humanities/cch/humanist/
                        www.princeton.edu/humanist/
                     Submit to: humanist@princeton.edu

         Date: Thu, 04 Dec 2003 09:03:03 +0000
         From: "Prof. Shlomo Argamon" <argamon@iit.edu>
         Subject: RE: 17.423 gender-testing

Hello, all.

I must say it's rather gratifying that our study is receiving so much
(mostly positive) attention. I would be very interested also, Malcolm, to
read your _Poetics_ article - could you send me a copy?

In our author-gender study, as in other authorship studies, there are a
number of factors that can complicate matters considerably. One, which Dr.
Hayward alluded to, is the issue of dialogue, whose lexicogrammatical
properties differ from those of narrative text; this can produce a
confounding effect similar to that of genre (about which more below). For
example, in preparation for a radio piece on our work, NPR gave us a couple
of "anonymous" short documents to analyze. One of them had a very
high percentage of dialogue (it was a man and woman discussing their
relationship). Initial results showed the piece as slightly female. When
we used instead a model that ignored pronouns (a simple corrective for the
dialogue), the piece came up as strongly male, and indeed it was an excerpt
from Hemingway's "Hills Like White Elephants". Another issue that Dr.
Hayward raised is the length of the passages - the mean length
of passages in our study was about 30,000 words, much longer than the texts
that he studied. Clearly, the longer the text, the more feasible it is to
tease out slight statistical differences in lexicogrammar (if such exist).
Something we are starting to look at in our lab, actually, is how to
effectively deal (automatically) with stylistic issues in short documents.
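
As a rough illustration of the pronoun-free corrective described above, the
following Python sketch computes function-word rates with and without
pronouns. It is not the model used in the study: the word lists, the
function name function_word_rates, and the per-1,000-token normalisation are
all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only; they are NOT the
# feature set used in the study discussed in this message.
PRONOUNS = {
    "i", "me", "my", "you", "your", "he", "him", "his", "she", "her",
    "hers", "we", "us", "our", "they", "them", "their", "it", "its",
}
FUNCTION_WORDS = PRONOUNS | {
    "the", "a", "an", "of", "in", "on", "and", "but", "with", "for",
    "to", "that", "this",
}


def function_word_rates(text, ignore_pronouns=False):
    """Return the rate per 1,000 tokens of each tracked function word.

    With ignore_pronouns=True, pronouns are dropped from the token
    stream before counting, so heavy pronoun use (e.g. in dialogue)
    affects neither the features nor the denominator.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if ignore_pronouns:
        tokens = [t for t in tokens if t not in PRONOUNS]
        tracked = FUNCTION_WORDS - PRONOUNS
    else:
        tracked = FUNCTION_WORDS
    if not tokens:
        return {}
    counts = Counter(t for t in tokens if t in tracked)
    return {w: 1000.0 * c / len(tokens) for w, c in sorted(counts.items())}
```

Calling function_word_rates(passage) and function_word_rates(passage,
ignore_pronouns=True) on the same passage gives two feature sets that could
be fed to separate classifiers, which is one plausible reading of "a model
that ignored pronouns".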

Regarding the on-line implementation(s) of our system mentioned by Ms.
Morrison, I should make a few comments as well. First, the system that is
available on line uses a simplified model, so its accuracy is not expected
to be as great as that of our research system. More significantly, the
model that it uses is based on fiction writing, so genre differences will be
a confounding effect. In fact, we found in our study a strong correlation
between the maleness (femaleness) of textual features and their
nonfiction-ness (fiction-ness). Thus I would have expected male-skewed
results from a test of academic writing even by women. The link between
such results and "how people perceive gender in text" is not at all clear,
and more precise results relating the lexicogrammar of a text to how it is
perceived will be needed.

I am very interested in discussing this topic more - the links between
lexicogrammar, gender, genre, and how text is perceived are very relevant, I
believe, to developing an "information age criticism", bridging the gap
between the "Two Cultures".

-Shlomo-

Shlomo Argamon, Associate Professor
Department of Computer Science, Illinois Institute of Technology
Chicago, IL 60616
Phone: 312-567-5289 Fax: 312-567-5067

This archive was generated by hypermail 2b30 : Thu Dec 04 2003 - 04:13:51 EST