Humanist Discussion Group, Vol. 34, No. 300.
Department of Digital Humanities, University of Cologne
Hosted by DH-Cologne
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

Date: 2021-03-24 06:59:09+00:00
From: Jan Rybicki <jkrybicki@gmail.com>
Subject: RE: [Humanist] 34.297: looking closely at smart quotes

Unsurprisingly, David's old itch is also mine. Things would be so easy if "quote" always meant just "quote", preferably with another sign for "unquote": this would make dialogue recognition (in English) a piece of cake, and we would not have to use all that machine learning that still doesn't do a good job. I really think the stylometric mafia should try to influence the global government to take steps in that direction. Provided both bodies actually exist...

Jan Rybicki

-----Original Message-----
From: Humanist <humanist@dhhumanist.org>
Sent: Wednesday, 24 March 2021 07:29
To: jkrybicki@gmail.com
Subject: [Humanist] 34.297: looking closely at smart quotes

Humanist Discussion Group, Vol. 34, No. 297.

Date: 2021-03-23 22:31:28+00:00
From: David Hoover <david.hoover@nyu.edu>
Subject: Re: [Humanist] 34.295: looking closely at smart quotes

Henry Schaffer's post scratched an old itch of mine. For me, as an inhabitant (or at least a neighbor) of literary studies, the fact that the apostrophe and the single (ASCII) quotation mark are the same character has really annoying consequences for computational analysis, and the problem of "smart" quotes that Henry points out exacerbates it. Still worse, students working on Macs face a whole different set of potential difficulties from those working on PCs, because saving a text as "plain" text may not produce the same results cross-platform. Add Unicode, shake, and get a stiff drink. Because a surprising proportion of canonical writers use a good deal of dialect, the apostrophe can be tedious to correct.
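
To make the difficulty concrete, here is a minimal sketch of one way to separate the different roles of the ASCII single quote by looking at the characters on either side of it. It is an illustration only, not any program referred to in this thread: the function names, the marker characters, the folding of curly single quotes into the ASCII one, and the three context rules are all assumptions chosen for the example.

```python
import re

# Hypothetical marker characters, picked only because they are easy to spot.
# They must not already occur in the text being checked.
INTERNAL = "\u00a7"  # section sign: quote between two letters (don't)
LEADING = "\u00ab"   # left guillemet: quote at the start of a word
TRAILING = "\u00bb"  # right guillemet: quote at the end of a word


def mark_single_quotes(text):
    """Replace each class of single quote with its own marker character."""
    # Assumption: curly single quotes are folded into the ASCII quote first.
    # This step is not undone by restore_single_quotes().
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    # Letter on both sides: almost certainly an apostrophe.
    text = re.sub(r"(?<=[^\W\d_])'(?=[^\W\d_])", INTERNAL, text)
    # Start of a word: an opening quote or a leading elision ('tis).
    text = re.sub(r"(?<!\w)'(?=\w)", LEADING, text)
    # End of a word or after punctuation: a closing quote or a
    # trailing elision (goin'). A human still has to decide which.
    text = re.sub(r"(?<=[\w.,;:!?])'(?!\w)", TRAILING, text)
    return text


def restore_single_quotes(text):
    """Turn every marker back into the ASCII single quote."""
    for marker in (INTERNAL, LEADING, TRAILING):
        text = text.replace(marker, "'")
    return text


if __name__ == "__main__":
    sample = "'I don't know,' said Tom. 'Tis goin' to rain.'"
    marked = mark_single_quotes(sample)
    print(marked)
    print(restore_single_quotes(marked) == sample)
```

The sketch only sorts the quotes into classes that are easier to scan by eye. In dialect-heavy text the leading and trailing classes still mix real quotation marks with elisions, which is exactly the part that remains tedious to correct by hand.
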
My own "solution" is a Python program that temporarily replaces various classes of single ASCII quotes with different characters to make checking them easier. Anyone interested can try it out at https://wp.nyu.edu/exceltextanalysis/python_tools/

David Hoover

--
David L. Hoover, Professor of English, NYU
212-998-8832
244 Greene Street, Room 409
http://wp.nyu.edu/davidlhoover

"They had the Nos. of the rain bow and the Power of the air all workit out with counting which is how they got boats in the air and picters on the wind. Counting clevverness is what it wer." -- Russell Hoban, Riddley Walker

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php