3.408 scanners and digitized images, cont. (213)

Willard McCarty (MCCARTY@VM.EPAS.UTORONTO.CA)
Tue, 29 Aug 89 20:06:35 EDT


Humanist Discussion Group, Vol. 3, No. 408. Tuesday, 29 Aug 1989.


(1) Date: Mon, 28 Aug 89 23:38 EDT (21 lines)
From: GUEST4@YUSol
Subject: RE: 3.403 optical scanners (77)

(2) Date: Tue, 29 Aug 89 00:56 EDT (25 lines)
From: GUEST4@YUSol
Subject: RE: 3.405 digitizing pictures, cont. (85)

(3) Date: Tuesday, 29 August 1989 0059-EST (39 lines)
From: TREAT@PENNDRLS (Jay Treat, Religious Studies, Penn)
Subject: Why would anyone want to digitize manuscripts?

(4) Date: Tue, 29 Aug 89 02:45:00 EDT (24 lines)
From: Espen Ore <espeno@navf-edb-h.uib.uninett>
Subject: Digitizing photos

(5) Date: Tue, 29 Aug 89 10:32:58 CDT (8 lines)
From: Richard Goerwitz <goer@sophist.uchicago.edu>
Subject: frustrated?

(6) Date: Tue, 29 Aug 89 13:50:30 -0400 (56 lines)
From: choueka@thunder.bellcore.com (Yaacov Choueka)
Subject: Digitized images of text and a hypertext for the Talmud

(1) --------------------------------------------------------------------
Date: Mon, 28 Aug 89 23:38 EDT
From: GUEST4@YUSol
Subject: RE: 3.403 optical scanners (77)

In reply to Robin Cover ("It may be asking too much of industry to support
the humanities outright, but is it asking too much to ask for a FLEXIBLE,
generalized solution to optical character recognition?")
may I venture an opinion from the sidelines:

Yes, it is asking too much to expect a solution to descend ex machina
from the heavens, "industry", or even HUMANIST, without expending any of
one's own elbow grease.

It may be asking too much of a certain kind of humanities colleague to
admit music as part of the common intellectual enterprise. But if the kind
of solution our colleague is seeking can be expected to emerge from anywhere,
it will be from the research on the design of an optical MUSIC recognition
system now under way, at McGill here in Canada, and elsewhere. Ever try
singing any of that pointed Hebrew, my friend?


(2) --------------------------------------------------------------33----
Date: Tue, 29 Aug 89 00:56 EDT
From: GUEST4@YUSol
Subject: RE: 3.405 digitizing pictures, cont. (85)

Now is obviously the time for all good humanists to write to their
congressperson, and/or their friendly neighborhood Japanese businessman. To
wit:

Dear Military-Industrial Complex:

Never mind if digital photography is still prohibitively expensive and unlikely
to yield sufficiently high resolution for most purposes anyway; never mind that
most of us aren't ever likely to need digital pictures of codices, and that
some aren't even too sure how a plain (or infrared) photograph differs from a
digitized image.

Just support us by making this technology available for us all to play with --
absolutely free, on Bitnet, of course. We'll comb the foothills to find
someone somewhere who is able to think up a use for it, however arcane. And
just think of the savings in airfare to Cairo or Benediktbeuern, not to mention
wear and tear on rare book librarians, when that priceless crumb of new
textological information finally does emerge on HUMANIST, for all our fellow
technophiles to peruse!

..."Then where is the problem, morally or ethically?" Where indeed.
(3) --------------------------------------------------------------42----
Date: Tuesday, 29 August 1989 0059-EST
From: TREAT@PENNDRLS (Jay Treat, Religious Studies, Penn)
Subject: Why would anyone want to digitize manuscripts?

It seems clear that if you want to publish one image of a manuscript on
the page opposite its transcription, a well-done photograph (printed at
exactly the same scale as the original, if you please) is vastly
preferable to a computer image digitized at an inferior resolution of
300 or 400 dots per linear inch. But there are definite advantages to
digitization...

For example, it is easy to count the dots that make up a digitized
image, in case you're interested in computing the average width of
various letter-forms or comparing the patterns of papyrus fibres. A
digitized image is also very easy to "cut up" in creative ways that
would be more difficult to accomplish with photographs.
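
As a rough illustration of the dot-counting idea above, here is a minimal Python sketch. It assumes each letter has already been cut out as a small binary bitmap (1 for ink, 0 for blank); the function names and the sample bitmaps are hypothetical and are not taken from any software mentioned in this discussion.

```python
def glyph_width_px(bitmap):
    """Return the width, in pixels, of the inked region of a binary bitmap.

    bitmap is a list of rows; each row is a list of 0 (blank) or 1 (ink).
    """
    # Collect the column index of every inked pixel.
    inked_cols = [x for row in bitmap for x, v in enumerate(row) if v]
    if not inked_cols:
        return 0
    return max(inked_cols) - min(inked_cols) + 1


def average_width_inches(bitmaps, dpi=300):
    """Average inked width of several glyph bitmaps, converted to inches."""
    widths = [glyph_width_px(b) for b in bitmaps]
    return sum(widths) / len(widths) / dpi


# Two tiny made-up "letters" (real glyphs would be far larger).
alpha_a = [[0, 1, 1, 0, 0],
           [0, 1, 0, 1, 0]]
alpha_b = [[1, 0, 0, 0, 1]]

print(glyph_width_px(alpha_a))                      # inked width of the first glyph
print(average_width_inches([alpha_a, alpha_b], 300))
```

The division by dpi is what turns a dot count into a physical measurement, so the scan resolution has to be recorded along with the image.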

We've started to play with this second approach here at Penn. If Bob
Kraft were here he could explain it much better than I. But since he is
in Cairo explaining it to fellow papyrologists, I'll try to present some
highlights of a case in point.

We digitized the photographs of two small papyrus fragments, both of
which an editor had identified as belonging to the same Greek
manuscript. We digitized both fragments at 300 dpi (dots per inch).
Then we digitally "cut out" images of all the alphas in both fragments
(alpha was the most common letter in these fragments) and "pasted" them
into the same screen in order to make comparisons. Using ordinary
graphics software, we could freely move each of these alphas (each about
an inch wide, displayed at 72 dpi) and superimpose it over the remaining
alphas. This procedure allowed us to compare the dimensions and the
curvature of the pen-strokes. The alphas on each fragment resembled one
another but differed noticeably from the alphas on the other fragment.
(A comparison of the only other common letter produced similar results.)
In a short time we were able to examine significant data to evaluate
the claim that both fragments were penned by the same hand. And we
didn't use up a grain of silver.
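
The comparison described above was done by eye, by dragging one alpha over another in graphics software. A numerical stand-in for that superimposition might look like the sketch below. It is not the procedure used at Penn; it swaps in a simple overlap ratio (the Jaccard index), and it assumes the two letters have been cut out as binary bitmaps of identical size and already aligned. The name overlap_score is hypothetical.

```python
def overlap_score(a, b):
    """Jaccard overlap of ink between two equal-sized binary bitmaps.

    Returns 1.0 for identical ink patterns and 0.0 when no inked
    pixel is shared.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    both = 0    # pixels inked in both bitmaps
    either = 0  # pixels inked in at least one bitmap
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if pa and pb:
                both += 1
            if pa or pb:
                either += 1
    # Two completely blank bitmaps are treated as identical.
    return both / either if either else 1.0


# Made-up 2x3 bitmaps standing in for two cut-out letters.
first = [[1, 1, 0],
         [0, 1, 0]]
second = [[1, 0, 0],
          [0, 1, 1]]

print(overlap_score(first, first))
print(overlap_score(first, second))
```

On scale: if one scanned dot is shown as one screen dot, a letter that appears about an inch wide on a 72 dpi screen is about 72 dots wide, which corresponds to 72/300 = 0.24 inch on the photograph scanned at 300 dpi.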

Regards, Jay Treat
(4) --------------------------------------------------------------31----
Date: Tue, 29 Aug 89 02:45:00 EDT
From: Espen Ore <espeno@navf-edb-h.uib.uninett>
Subject: Digitizing photos

The Norwegian Computing Centre for the Humanities and the Norwegian
Secretariate for Registration of Photos are collaborating on a
pilot project for a database containing digitized photos AND
their reference data.

We use Mac IIs with color screens (Apple's) and the Apple scanner.
One preliminary finding is the rather obvious one that the
resolution required depends upon the medium one uses for presenting
the images. For an Apple monitor with 72 dpi resolution, scanning
at 75 dpi is more than adequate as long as the monitor's capacity
for showing shades of gray is utilized. (E.g. television has a
rather poor resolution, but the enormous amount of possible
colors a given pixel may have makes it possible to present very
much information in one screen.) For printing we have used
LaserWriter II NTs, and since this is a strictly monochrome
output device it is important to scan at the printer's full
resolution (300 dpi), and to use a set of halftone patterns
suitable for the separate photos.
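
The trade-off between resolution and gray levels can be made concrete with a little arithmetic. The Python sketch below computes pixel dimensions and uncompressed size for a scan. The 4 x 6 inch photo and the 8 bits per pixel for grayscale are assumptions chosen only for illustration; they are not figures from the pilot project.

```python
def scan_size(width_in, height_in, dpi, bits_per_pixel):
    """Return (pixels_wide, pixels_high, uncompressed_bytes) for a scan."""
    w = round(width_in * dpi)
    h = round(height_in * dpi)
    total_bits = w * h * bits_per_pixel
    # Round up to whole bytes using integer arithmetic.
    return w, h, (total_bits + 7) // 8


# Hypothetical 4 x 6 inch photograph.
screen = scan_size(4, 6, dpi=75, bits_per_pixel=8)    # grayscale, for the monitor
printer = scan_size(4, 6, dpi=300, bits_per_pixel=1)  # monochrome, for the printer

print(screen)   # (300, 450, 135000)
print(printer)  # (1200, 1800, 270000)
```

Under these assumptions the 300 dpi monochrome scan has sixteen times as many pixels as the 75 dpi grayscale scan but is only twice as large in bytes, since each of its pixels carries one bit instead of eight.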

Espen S. Ore
(5) --------------------------------------------------------------12----
Date: Tue, 29 Aug 89 10:32:58 CDT
From: Richard Goerwitz <goer@sophist.uchicago.edu>
Subject: frustrated?

Your note bespeaks frustration with optical character recognition
systems. Do you see any solutions on the horizon?


(6) --------------------------------------------------------------60----
Date: Tue, 29 Aug 89 13:50:30 -0400
From: choueka@thunder.bellcore.com (Yaacov Choueka)
Subject: Digitized images of text and a hypertext for the Talmud

I would like to present yet another argument in strong support
of Ian Lancashire's position for distributing digitized images of
texts together with their electronic version, at least in
certain cases and for some specific texts.

I am now working with a graduate student at Bar-Ilan University
(Ramat-Gan, Israel) on developing
a hypertext system (on a SUN workstation)
for the Talmud, THE text par excellence that really
needs such a medium. We do have the Talmud on electronic media
as part of the Global Jewish Database. We decided a long time ago
however that besides displaying the text in "computer fonts"
for searching, browsing, linking, and other hypertext functions,
we are also going to give the user the option of looking at
an image of the Talmud page he is interested in, rather than
at its computer-generated counterpart. A printed page of the Talmud
has indeed a very typical layout and graphical format,
immediately recognizable by anyone who ever saw it even once, and
the same basic format is kept in the myriads of different
editions through which the Talmud went since it was first printed more
than 400 years ago. The basic talmudic text occupies a rather
small rectangle (sometimes with a left or right "leg") and it is
literally surrounded by 10 to 20 different commentaries in
typically different and characteristic fonts. A talmudic scholar
would be much more comfortable looking at such a page
(rather than at its computer-generated counterpart)
since he is mentally tuned to its graphical layout and fonts.
Of course there will be automatic links between the "Ascii" pages
and the digitized images, so that clicking on a button will
enable him to switch from one form to another immediately.
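
The linking described above can be pictured as one record per page that carries both forms, plus a viewer that flips between them. The following Python sketch is only a guess at the general shape of such a structure and says nothing about how the Bar-Ilan system is actually built. Every name in it (Page, PageViewer, the page identifier, the file path) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    page_id: str      # hypothetical identifier shared by both forms
    text: str         # the searchable "computer font" version
    image_path: str   # where the digitized page image is stored


class PageViewer:
    """Shows a page as text or as an image and switches on request."""

    def __init__(self, pages):
        self._pages = {p.page_id: p for p in pages}
        self.mode = "text"

    def toggle(self):
        """Switch between the text form and the image form."""
        self.mode = "image" if self.mode == "text" else "text"
        return self.mode

    def show(self, page_id):
        """Return what would be displayed for this page in the current mode."""
        page = self._pages[page_id]
        return page.text if self.mode == "text" else page.image_path


viewer = PageViewer([
    Page("page-0001", "(transcribed text of page 1)", "images/page-0001.img"),
])
print(viewer.show("page-0001"))  # the text form
viewer.toggle()                  # the "button click"
print(viewer.show("page-0001"))  # the image form of the same page
```

Keying both forms on the same page identifier is what makes the switch immediate: nothing needs to be searched for, only looked up.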

The same type of solution will be adopted for additional
hypertext systems that we are planning for other Rabbinical
texts (e.g. Maimonides' Code, and similar codes of
Jewish Law from the 13th-16th centuries) that share the same basic
format.

By the way, there are about 5400 pages in the Talmud.

I am not a classicist, but I am sure that there are some classic
texts (Aristotle? Shakespeare?) with classical editions printed with
a lot of annotations and commentaries, whose layout is familiar
to the scholar studying these texts, and it is not far-fetched to
speculate that the scholar may be more comfortable with the
digitized image of the page (obviously, as mentioned before,
the computer-generated page should also be available for
clicking on words, etc.). Needless to say, if the text is put on the
computer as a direct transcription from a handwritten manuscript, then
it is simple intellectual honesty (especially since the technology is
already here) to put a digitized image of the manuscript in the
database, and to let the user decide for himself whether he agrees
with the given interpretation.