Humanist Discussion Group, Vol. 17, No. 436.
Centre for Computing in the Humanities, King's College London
www.kcl.ac.uk/humanities/cch/humanist/
www.princeton.edu/humanist/
Submit to: humanist@princeton.edu
Date: Sun, 07 Dec 2003 09:16:15 +0000
From: Willard McCarty <willard.mccarty@kcl.ac.uk>
Subject: data not doomed
Humanists will be interested in a recent newspaper article: Simson
Garfinkel, "The Myth of Doomed Data", MIT Technology Review, The Wall
Street Journal, 5 December 2003. The article is reprinted from
Technology Review and can be read online at
http://www.technologyreview.com/articles/wo_garfinkel120303.asp?p=1.
Citing the famous Domesday Project, Garfinkel quotes an article in The
Observer to the effect that the data assembled in 1986 to report the state
of the U.K. is now unreadable, whereas the original Domesday Book, compiled
in 1086, remains readable. He notes, "This ironic death of
Domesday has been taken as a rallying cry for an increasingly vocal group
of computer scientists and archivists who argue that we are in danger of
losing our cultural heritage -- or at least that part of our cultural
heritage that we have been foolish enough to commit to electronic storage
devices." He then comments, "There's just one problem with this reasoning:
it's wrong."
The Domesday data has been copied from the original videodiscs and now runs
on an emulator. "The real lesson of the Domesday Project", he argues, "is
that nonstandard file formats carry a huge hidden cost. Because
high-quality image and video compression hadn't been invented yet in 1986,
the BBC saved a tremendous amount of money by putting the Domesday Project
on a pair of videodiscs rather than stamping the data onto perhaps a
hundred CD-ROMs. But those savings must now be cast against the real cost
borne by those who must migrate the data into a modern format.
"Indeed, for every Domesday Project that has lost its data to proprietary
equipment and file formats, it is easy to point to another project for
which information created decades ago is still available. The Internet
'Request For Comment' (RFC) series, started back in the 1970s, is readable
on practically every computer on the planet today because the RFCs were
stored in plain ASCII text. Similarly, you can download images sent back
from the Voyager space probes 30 years ago and view them on your PC because
NASA stored those pictures as bitmaps -- pixel-by-pixel copies of the
images without any compression whatsoever. Some argue that it's impossible
to look into the future and determine which of today's formats will survive
and which will go the way of the VP 415. Poppycock! As a society we have a
very good understanding of what will make one file format endure while
another one is likely to perish. The key to survival is openness and
documentation." He cites PDF and JPEG as examples.
"What about the physical media itself?" he asks. "Although there are many
examples of tapes and floppy disks being unreadable five or 10 years after
they are created, there are many counterexamples as well. Generally
speaking, people who make an effort to preserve digital documents have no
problem doing so.... Electronic archivists do have a significant challenge
facing them: computer systems make it easy to put a tremendous amount of
information in a single place. If you aren't careful, it's easy to lose all
of this information at once. And today's computer systems are so
tremendously reliable that fewer and fewer users are properly backing up
their data; people just don't remember the bad old days when a computer
might fail at a moment's notice.
"But on the whole, I think that electronic records are far more stable,
more durable, and more likely to last than their paper equivalents. The
technical problems are largely solved.... What's needed now is a plan to
make long-term electronic archival services available to the masses."
Dr Willard McCarty | Senior Lecturer | Centre for Computing in the
Humanities | King's College London | Strand | London WC2R 2LS || +44 (0)20
7848-2784 fax: -2980 || willard.mccarty@kcl.ac.uk
www.kcl.ac.uk/humanities/cch/wlm/