15.103 accuracy rates for proofreading

From: by way of Willard McCarty (willard@lists.village.Virginia.EDU)
Date: Tue Jun 19 2001 - 13:53:20 EDT

                   Humanist Discussion Group, Vol. 15, No. 103.
           Centre for Computing in the Humanities, King's College London

             Date: Tue, 19 Jun 2001 18:39:11 +0100
             From: "Tim Reuter" <T.Reuter@soton.ac.uk>
             Subject: Re: 15.093 accuracy rates for proof-reading

    I suspect that the real problem lies with the unit being counted. Error
    rates can often look impressive until you realise that the error is not
    per word but per character - a typical printed page has about 1700-2500
    characters on it, so even an error rate of 0.005 (0.5%) would imply about 10
    errors per page. I'd be sceptical about a claimed error rate of 0.005% (=
    one error in 10 pages?) because I doubt whether human proof-reading could
    verify it. Double keying followed by automated byte-for-byte comparison of
    the two versions and conscientious correction ought in theory to produce
    virtually error-free results: if accuracy is say 99% for each version,
    only 1 character in 10000 in the original will be miskeyed in both
    versions, and if the miskeying is completely random the likelihood that in
    such cases the error will be identical and so not flagged by automated
    comparison will be, with a normal size character set, about 1:100, giving
    an overall 1 in 1000000 chance of an error going undetected (to which of
    course one would have to add the chances of a detected error's going
    uncorrected and of fresh errors being introduced at the correction stage
    in order to estimate the overall error rate).
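
    The arithmetic in the paragraph above can be checked with a short
    sketch. The function names, the 2000-character page and the
    100-symbol character set are illustrative assumptions, not figures
    from any keying service; the model also assumes the two keyers err
    independently and pick a wrong character uniformly at random.

```python
# Rough check of the per-page and double-keying estimates.
# All constants are assumptions chosen to match the round numbers in the text.

CHARS_PER_PAGE = 2000   # assumed; the text gives a range of 1700-2500
CHARSET_SIZE = 100      # assumed "normal size" character set


def errors_per_page(error_rate, chars_per_page=CHARS_PER_PAGE):
    """Expected number of wrong characters on one page."""
    return error_rate * chars_per_page


def undetected_rate(accuracy, charset_size=CHARSET_SIZE):
    """Chance per character that both keyers make the SAME mistake.

    Assumes independent keyers and a uniformly random wrong character.
    """
    miss = 1.0 - accuracy
    both_wrong = miss * miss             # miskeyed in both versions
    same_wrong_char = 1.0 / charset_size  # the two wrong characters coincide
    return both_wrong * same_wrong_char


if __name__ == "__main__":
    print("errors/page at 0.5%:  ", errors_per_page(0.005))
    print("errors/page at 0.005%:", errors_per_page(0.00005))
    print("undetected per char:  ", undetected_rate(0.99))
```

    Under these assumptions a 0.5% character error rate gives about 10
    errors per page, 0.005% gives about one error in 10 pages, and double
    keying at 99% accuracy leaves roughly one undetected error per
    million characters, before correction-stage errors are added.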

    But miskeying is not random: it is determined by keyboard layout, and also
    by leaps of the eye and the influence of familiar words and letter
    sequences in the keyboarder's own language on those in the text being
    keyed, which mean that the chances of the same error being made at the
    same point (and so going undetected) are a good deal higher than the
    guesstimates above.
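
    A minimal sketch of why non-random miskeying raises the risk: if
    both keyers draw their wrong character from the same distribution,
    the chance that the two wrong characters coincide is the sum of the
    squared probabilities, which is smallest when every wrong character
    is equally likely. The weights below are hypothetical, not measured
    keying data, and the sketch does not model the further effect of
    both keyers erring at the same position.

```python
# Probability that two independent draws from one distribution coincide.
# The "skewed" weights are invented purely for illustration.


def collision_probability(weights):
    """Sum of squared normalised weights: P(two draws pick the same item)."""
    total = sum(weights)
    return sum((w / total) ** 2 for w in weights)


# Every one of 100 wrong characters equally likely (the random case).
uniform = [1.0] * 100

# Hypothetical skew: a few adjacent-key or familiar-word slips dominate.
skewed = [50.0, 25.0, 10.0] + [1.0] * 97

if __name__ == "__main__":
    print("uniform:", collision_probability(uniform))
    print("skewed: ", collision_probability(skewed))
```

    With these made-up weights the skewed case comes out about ten
    times more likely to produce an identical, and so unflagged, error
    than the uniform case.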

    Tim Reuter
