Humanist Discussion Group

Humanist Archives: March 25, 2021, 7:50 a.m. Humanist 34.300 - looking closely at smart quotes

              Humanist Discussion Group, Vol. 34, No. 300.
        Department of Digital Humanities, University of Cologne
                      Hosted by DH-Cologne
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org




        Date: 2021-03-24 06:59:09+00:00
        From: Jan Rybicki <jkrybicki@gmail.com>
        Subject: Re: [Humanist] 34.297: looking closely at smart quotes

Unsurprisingly, David's old itch is also mine. Things would be so easy if
"quote" always meant just "quote", preferably with another sign for "unquote":
this would make dialogue recognition (in English) a piece of cake, and we would
not have to use all that machine learning that still doesn't do a good job. I
really think the stylometric mafia should try to influence the global government
to take steps in that direction. Provided both bodies actually exist...
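A minimal sketch of why separate "quote" and "unquote" signs help, assuming a
text that consistently uses curly double quotes, U+201C to open and U+201D to
close. With two distinct signs a single pattern finds each spoken span; the
function name extract_dialogue is hypothetical, not part of any tool.

```python
import re

# Assumption: speech is opened by U+201C and closed by U+201D. Because the
# two signs differ, each quoted span matches one pattern that excludes both.
DIALOGUE = re.compile("\u201c([^\u201c\u201d]*)\u201d")


def extract_dialogue(text: str) -> list[str]:
    """Return the contents of every span opened by U+201C and closed by U+201D."""
    return DIALOGUE.findall(text)


sample = "\u201cHello,\u201d she said. \u201cGoodbye.\u201d"
print(extract_dialogue(sample))  # ['Hello,', 'Goodbye.']
```

Even this breaks on English multi-paragraph speech, where each paragraph
reopens with U+201C but only the last one closes: the pattern silently skips
the unclosed paragraphs. With a single ASCII " for both roles, marks can only be
paired by alternation, and one stray mark flips every pair after it.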

Jan Rybicki

-----Original Message-----
From: Humanist <humanist@dhhumanist.org>
Sent: Wednesday, 24 March 2021 07:29
To: jkrybicki@gmail.com
Subject: [Humanist] 34.297: looking closely at smart quotes

                  Humanist Discussion Group, Vol. 34, No. 297.




        Date: 2021-03-23 22:31:28+00:00
        From: David Hoover <david.hoover@nyu.edu>
        Subject: Re: [Humanist] 34.295: looking closely at smart quotes

Henry Schaffer's post scratched an old itch of mine. For me, as an inhabitant
(or at least a neighbor) of literary studies, the fact that the apostrophe and
the single (ASCII) quotation mark are the same character has really annoying
consequences in doing computational analysis, and the problem of "smart"
quotes that Henry points out exacerbates it further. Still worse, students
working on Macs have a whole different set of potential difficulties from those
working on PCs because saving a text as "plain" text may not produce the same
results cross-platform.
Add Unicode, shake, and get a stiff drink.
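One way to reduce the cross-platform surprises is to decode and normalize every
file the same way before analysis. This is a minimal sketch under stated
assumptions: files are UTF-8 (with or without a byte-order mark) or else
Windows-1252, and other legacy encodings such as Mac Roman are not handled. The
names decode_plain_text, fold_quotes and SMART_TO_ASCII are hypothetical.

```python
# Assumptions: each file is UTF-8 or, failing that, Windows-1252 (Mac Roman
# and other legacy encodings are not handled). Line endings may be CRLF
# (Windows), LF (macOS, Unix) or a bare CR (classic Mac OS).
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark, also the typographic apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def decode_plain_text(data: bytes) -> str:
    """Decode raw file bytes and normalize all line endings to LF."""
    try:
        text = data.decode("utf-8-sig")  # also strips a leading BOM if present
    except UnicodeDecodeError:
        text = data.decode("cp1252", errors="replace")
    # Convert CRLF first so that it does not turn into two line breaks.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def fold_quotes(text: str) -> str:
    """Map curly quotes to ASCII; this merges apostrophes with single quotes."""
    return text.translate(SMART_TO_ASCII)


print(fold_quotes("\u2018I can\u2019t,\u2019 he said."))  # 'I can't,' he said.
```

To use it on a file, pass the bytes read in binary mode, for example
decode_plain_text(pathlib.Path(name).read_bytes()). The last line shows the
cost described above: U+2019 already serves as both apostrophe and closing
quote, and after folding the opening quote joins them as the same ASCII
character, so any step that needs to tell them apart should run first.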

Because a surprising proportion of canonical writers use a good deal of dialect,
the apostrophe can be tedious to correct. My own "solution" is a Python program
that temporarily replaces various classes of single ASCII quotes with different
characters to make checking them easier. Anyone interested can try it out at
https://wp.nyu.edu/exceltextanalysis/python_tools/
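The program at that URL is David's own; what follows is not that program, only
a minimal sketch of the same general idea under stated assumptions. Each ASCII
single quote is classified by whether letters stand on either side of it and is
swapped for a hypothetical placeholder character, assumed not to occur
elsewhere in the text, so each class can be searched and checked separately.
The names classify_quotes, restore_quotes and MARKERS are hypothetical.

```python
import re

# Hypothetical placeholder characters, one per class of ASCII single quote.
# They are assumed not to occur anywhere else in the text being checked.
MARKERS = {
    "internal": "\u02bc",  # letters on both sides: can't, o'clock
    "initial": "\u2039",   # letter only after: 'em, 'tis, or an opening quote
    "final": "\u203a",     # letter only before: goin', dogs', or a closing quote
    "other": "\u00a6",     # no adjacent letter, e.g. next to punctuation
}


def classify_quotes(text: str) -> str:
    """Replace each ASCII ' with a marker for its position relative to letters."""
    def mark(match: re.Match) -> str:
        i = match.start()
        before = i > 0 and text[i - 1].isalpha()
        after = i + 1 < len(text) and text[i + 1].isalpha()
        if before and after:
            return MARKERS["internal"]
        if after:
            return MARKERS["initial"]
        if before:
            return MARKERS["final"]
        return MARKERS["other"]

    return re.sub("'", mark, text)


def restore_quotes(text: str) -> str:
    """Undo classify_quotes by turning every marker back into an ASCII '."""
    return text.translate({ord(m): "'" for m in MARKERS.values()})


sample = "'Tis John's, I'm goin' home,' she said."
print(classify_quotes(sample))
```

The internal class is almost always a true apostrophe, so the checking effort
goes into the initial and final classes, where dialect elisions and quotation
marks look alike; marks next to punctuation land in "other" and are most often
quotation marks.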

David Hoover
--
            David L. Hoover, Professor of English, NYU
         212-998-8832       244 Greene Street, Room 409
               http://wp.nyu.edu/davidlhoover

"They had the Nos. of the rain bow and the Power of the air all workit out with
counting which is how they got boats in the air and picters on the wind.
Counting clevverness is what it wer."
-- Russell Hoban, Riddley Walker



_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php