Humanist Discussion Group, Vol. 15, No. 103.
Centre for Computing in the Humanities, King's College London
<http://www.princeton.edu/~mccarty/humanist/>
<http://www.kcl.ac.uk/humanities/cch/humanist/>
Date: Tue, 19 Jun 2001 18:39:11 +0100
From: "Tim Reuter" <T.Reuter@soton.ac.uk>
Subject: Re: 15.093 accuracy rates for proof-reading
I suspect that the real problem lies with the unit being counted. Error
rates can look impressive until you realise that the error is measured
not per word but per character: a typical printed page has about
1700-2500 characters on it, so even an error rate of 0.005 per
character would imply about 10 errors per page. I'd be sceptical about
a claimed error rate of 0.005% (= one error in roughly 10 pages?)
because I doubt whether human proof-reading could verify it.
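To make the arithmetic concrete, here is a rough back-of-the-envelope
sketch (in Python, assuming a page of about 2,000 characters, the
mid-point of the range above):

    CHARS_PER_PAGE = 2000  # mid-point of the 1700-2500 range above

    def errors_per_page(rate_per_char, chars_per_page=CHARS_PER_PAGE):
        """Expected number of keying errors on one printed page."""
        return rate_per_char * chars_per_page

    print(errors_per_page(0.005))    # 0.005 per character  -> ~10 errors per page
    print(errors_per_page(0.00005))  # 0.005% per character -> ~0.1, one error in ~10 pages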
Double keying followed by automated byte-for-byte comparison of the two
versions and conscientious correction ought in theory to produce
virtually error-free results. If accuracy is, say, 99% for each
version, only 1 character in 10,000 in the original will be miskeyed in
both versions; and if the miskeying is completely random, the
likelihood that in such cases the two errors will be identical, and so
not flagged by the automated comparison, will be, with a normal-size
character set, about 1:1000. That gives an overall 1 in 10,000,000
chance of an error going undetected (to which one would of course have
to add the chances of a detected error going uncorrected and of fresh
errors being introduced at the correction stage in order to estimate
the overall error rate).
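The multiplication itself, as a minimal sketch (the 1:1000 figure is
only a guess for two random errors coinciding):

    def undetected_rate(p, q):
        """Per-character probability that both keyings err identically."""
        return p * p * q

    p = 0.01      # 99% per-character accuracy in each keying pass
    q = 1 / 1000  # guessed chance that two random errors coincide
    print(undetected_rate(p, q))  # 1e-07: about 1 undetected error per 10,000,000 characters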
But miskeying is not random: it is shaped by keyboard layout, by leaps
of the eye, and by the influence of familiar words and letter sequences
in the keyboarder's own language on those in the text being keyed, all
of which mean that the chances of the same error being made at the same
point (and so going undetected) are a good deal higher than the
guesstimates above.
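As a toy illustration (the five-character "plausible" pool is purely
invented): if two independent miskeyings are drawn uniformly from the
whole character set they rarely coincide, but if both keyboarders
favour the same small set of plausible substitutions, coincidences
become far more common.

    import random

    random.seed(0)
    ALPHABET = [chr(c) for c in range(32, 127)]  # ~95 printable ASCII characters
    PLAUSIBLE = list("aeiou")                    # hypothetical shared substitution habits

    def coincidence_rate(pool, trials=100_000):
        """Fraction of trials in which two independent miskeyings pick the same character."""
        return sum(random.choice(pool) == random.choice(pool) for _ in range(trials)) / trials

    print(coincidence_rate(ALPHABET))   # roughly 1/95, about 0.01
    print(coincidence_rate(PLAUSIBLE))  # roughly 1/5, about 0.2 - some twenty times higher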
Tim Reuter
This archive was generated by hypermail 2b30 : Tue Jun 19 2001 - 14:06:33 EDT