3.403 optical scanners (77)

Mon, 28 Aug 89 19:01:44 EDT

Humanist Discussion Group, Vol. 3, No. 403. Monday, 28 Aug 1989.

(1) Date: 27 Aug 89 00:08:59 EST (36 lines)
From: James O'Donnell <JODONNEL@PENNSAS>
Subject: Scanning the future

(2) Date: Mon, 28 Aug 89 10:37:51 CST (22 lines)
From: "Robin C. Cover" <ZRCC1001@SMUVM1>

(1) --------------------------------------------------------------------
Date: 27 Aug 89 00:08:59 EST
From: James O'Donnell <JODONNEL@PENNSAS>
Subject: Scanning the future

Recent HUMANIST notes about Kurzweil developments suggest non-classicists may
be interested in something in the APA Newsletter for August 1989 (and
classicists will want to read it carefully, on the first page). Our standard
bibliographical tool, L'Annee Philologique (hefty annual volumes indexing more
or less everything a classicist might want to read [though not all s/he SHOULD
want to read]), is going to be made machine-readable. This project will be
many years in the making, with the first CD-ROM containing only the most
recent 13 years of bibliography not due out until 1993.

Of most interest and concern to me was this sentence: `The feasibility of
input by optical scanning was tested and rejected in a pilot project funded by
the David and Lucile Packard Foundation which demonstrated that scanners could
not accurately interpret a multilingual text printed in multiple European
typefaces like the APh.' This is gloomy news, for it means that they will be
doing double manual entry -- 1965 technology, really. Gloomier because I
assume that a Packard-funded pilot project would be pretty competent, pretty
alert to state-of-the-art possibilities, etc. If THEY say no, it means NO.

Do I understand, though, that the real problem is the MULTILINGUAL aspect? I had
thought we were approaching the point where some standard reference works
(e.g., the Thesaurus Linguae Latinae) could be fed to a scanner and made
accessible at a price less than an ayatollah's ransom.

In fact, we must reach the point someday where it can be done for a cost more
or less commensurate with what the (admittedly limited) clientele is willing to pay
for the result, or we risk having large bodies of important (but not VITAL)
material fail to make the transition from print to digitization, whereupon it
will eventually find the same fate that met those ancient texts that didn't
make it from papyrus roll onto the high-tech codex.

Short form of my query: how gloomy should we be?
(2) --------------------------------------------------------------------
Date: Mon, 28 Aug 89 10:37:51 CST
From: "Robin C. Cover" <ZRCC1001@SMUVM1>

In light of conversations with Kurzweil technical support in
Cambridge, I must further qualify the "cautious optimism" of HUMANIST
posting 3.397 on the Kurzweil 5100. For some multilingual applications
(*perhaps* accented Greek), the software may indeed hold some promise;
that depends upon the ability of the user to overrule the software's
intelligence in assigning arbitrary ASCII values to regular characters and
non-ASCII characters [not clarified to me]. For pathological cases like
pointed Hebrew, the software is much more limited: mappings of special
characters [I would call "graphs"] to the encoded "four-character
sequence" are apparently limited to "a few," not hundreds or thousands of
instances. Having awakened from this dream "too good to be true," I set
out upon a renewed quest for intelligent, accurate, trainable OCR
software. It may be asking too much of industry to support the humanities
outright, but is it too much to ask for a FLEXIBLE, generalized solution
to optical character recognition?

Robin Cover