17.427 gender-testing

From: Humanist Discussion Group (by way of Willard McCarty willard.mccarty@kcl.ac.uk)
Date: Thu Dec 04 2003 - 04:07:44 EST


                   Humanist Discussion Group, Vol. 17, No. 427.
           Centre for Computing in the Humanities, King's College London
                       www.kcl.ac.uk/humanities/cch/humanist/
                            www.princeton.edu/humanist/
                         Submit to: humanist@princeton.edu

             Date: Thu, 04 Dec 2003 09:03:03 +0000
             From: "Prof. Shlomo Argamon" <argamon@iit.edu>
             Subject: RE: 17.423 gender-testing

    Hello, all.

    I must say it is rather gratifying that our study is receiving so much
    (mostly positive) attention. Malcolm, I would also be very interested in
    reading your _Poetics_ article; could you send me a copy?

    In our author-gender study, as in other authorship studies, a number of
    factors can complicate matters considerably. One, which Dr. Hayward
    alluded to, is dialogue, whose lexicogrammatical properties differ from
    those of narrative text; this can produce a confounding effect similar to
    that of genre (about which more below). For example, in preparation for a
    piece on our work, NPR gave us a couple of "anonymous" short documents to
    analyze. One of them had a very high percentage of dialogue (a man and a
    woman discussing their relationship). Our initial results classified the
    piece as slightly female. When we instead used a model that ignored
    pronouns (a simple corrective for the dialogue), the piece came out as
    strongly male, and indeed it was an excerpt from Hemingway's "Hills Like
    White Elephants". Dr. Hayward also raised the issue of passage length:
    the mean length of passages in our study was about 30,000 words, much
    longer than the texts he studied. Clearly, the longer the text, the more
    feasible it is to tease out slight statistical differences in
    lexicogrammar (if such differences exist). In fact, something we are
    starting to look at in our lab is how to deal effectively (and
    automatically) with stylistic issues in short documents.
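A minimal Python sketch of the two points above, under loudly stated assumptions: the word lists are a hypothetical toy feature set (the study's actual features and classifier are not described in this message), and the code only computes relative frequencies, not a gender classification. Passing ignore_pronouns=True drops pronoun features, mirroring the "model that ignored pronouns" corrective. The standard error of a relative frequency p estimated from n tokens, sqrt(p(1-p)/n), shrinks as n grows, which is one way to see why longer passages make slight differences easier to detect.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy feature set, for illustration only; not the study's features.
FUNCTION_WORDS = ["the", "a", "of", "and", "in", "to", "it", "that",
                  "he", "she", "i", "you", "we", "they", "him", "her", "me"]
PRONOUNS = {"he", "she", "i", "you", "we", "they", "him", "her", "me", "it"}


def feature_profile(text, ignore_pronouns=False):
    """Relative frequency (per token) of each function-word feature in text.

    With ignore_pronouns=True, pronoun features are simply omitted, a crude
    stand-in for a dialogue corrective.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    features = [w for w in FUNCTION_WORDS
                if not (ignore_pronouns and w in PRONOUNS)]
    counts = Counter(t for t in tokens if t in features)
    # Normalize by all tokens, so dropping pronouns does not inflate the rest.
    n = len(tokens) or 1
    return {w: counts[w] / n for w in features}


def std_error(p, n):
    """Standard error of a relative frequency p estimated from n tokens."""
    return math.sqrt(p * (1 - p) / n)
```

For example, feature_profile(passage, ignore_pronouns=True) returns the same non-pronoun frequencies as the full profile, just without the pronoun entries. Comparing std_error(0.01, 30000) with std_error(0.01, 2000) shows the estimate from the longer text is about sqrt(15), or roughly 3.9 times, less noisy.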

    Regarding the on-line implementation(s) of our system mentioned by Ms.
    Morrison, I should make a few comments as well. First, the system that is
    available on-line uses a simplified model, so its accuracy is not expected
    to be as high as that of our research system. More significantly, the
    model it uses is based on fiction writing, so genre differences will act
    as a confounding factor. In fact, we found in our study a strong
    correlation between the maleness (femaleness) of textual features and
    their nonfiction-ness (fiction-ness). Thus I would have expected a test on
    academic writing to produce male-skewed results, even for writing by
    women. The link between such results and "how people perceive gender in
    text" is not at all clear; more precise results relating the
    lexicogrammar of a text to how it is perceived will be needed.

    I am very interested in discussing this topic further: the links between
    lexicogrammar, gender, genre, and how text is perceived are, I believe,
    very relevant to developing an "information age criticism" that bridges
    the gap between the "Two Cultures".

              -Shlomo-

    Shlomo Argamon, Associate Professor
    Department of Computer Science, Illinois Institute of Technology
    Chicago, IL 60616
    Phone: 312-567-5289 Fax: 312-567-5067


