17.436 digital preservation

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty@kcl.ac.uk>)
Date: Sun Dec 07 2003 - 04:50:49 EST

                   Humanist Discussion Group, Vol. 17, No. 436.
           Centre for Computing in the Humanities, King's College London
                       www.kcl.ac.uk/humanities/cch/humanist/
                            www.princeton.edu/humanist/
                         Submit to: humanist@princeton.edu

             Date: Sun, 07 Dec 2003 09:16:15 +0000
             From: Willard McCarty <willard.mccarty@kcl.ac.uk>
             Subject: data not doomed

    Humanists will be interested in a recent newspaper article, Simson
    Garfinkel, "The Myth of Doomed Data", MIT Technology Review, The Wall
    Street Journal, 5 December 2003. This article has been reprinted from
    Technology Review and can be read online, at
    http://www.technologyreview.com/articles/wo_garfinkel120303.asp?p=1.

    Citing the famous Domesday Project, Garfinkel quotes an article in The
    Observer to the effect that although the data assembled in 1986 to record
    the state of the U.K. had become unreadable, the original Domesday Book,
    compiled in 1086, remained legible. He notes that, "This ironic death of
    Domesday has been taken as a rallying cry for an increasingly vocal group
    of computer scientists and archivists who argue that we are in danger of
    losing our cultural heritage -- or at least that part of our cultural
    heritage that we have been foolish enough to commit to electronic storage
    devices." He then comments, "There's just one problem with this reasoning:
    it's wrong."

    The Domesday data has been copied from the original videodiscs and now runs
    on an emulator. "The real lesson of the Domesday Project", he argues, "is
    that nonstandard file formats carry a huge hidden cost. Because
    high-quality image and video compression hadn't been invented yet in 1986,
    the BBC saved a tremendous amount of money by putting the Domesday Project
    on a pair of videodiscs rather than stamping the data onto perhaps a
    hundred CD-ROMs. But those savings must now be cast against the real cost
    borne by those who must migrate the data into a modern format.

    "Indeed, for every Domesday Project that has lost its data to proprietary
    equipment and file formats, it is easy to point to another project for
    which information created decades ago is still available. The Internet
    'Request For Comment' (RFC) series, started back in the 1970s, is readable
    on practically every computer on the planet today because the RFCs were
    stored in plain ASCII text. Similarly, you can download images sent back
    from the Voyager space probes 30 years ago and view them on your PC because
    NASA stored those pictures as bitmaps -- pixel-by-pixel copies of the
    images without any compression whatsoever. Some argue that it's impossible
    to look into the future and determine which of today's formats will survive
    and which will go the way of the VP 415. Poppycock! As a society we have a
    very good understanding of what will make one file format endure while
    another one is likely to perish. The key to survival is openness and
    documentation." He cites PDF and JPEG as examples.

    "What about the physical media itself?" he asks. "Although there are many
    examples of tapes and floppy disks being unreadable five or 10 years after
    they are created, there are many counterexamples as well. Generally
    speaking, people who make an effort to preserve digital documents have no
    problem doing so.... Electronic archivists do have a significant challenge
    facing them: computer systems make it easy to put a tremendous amount of
    information in a single place. If you aren't careful, it's easy to lose all
    of this information at once. And today's computer systems are so
    tremendously reliable that fewer and fewer users are properly backing up
    their data; people just don't remember the bad old days when a computer
    might fail at a moment's notice.

    "But on the whole, I think that electronic records are far more stable,
    more durable, and more likely to last than their paper equivalents. The
    technical problems are largely solved.... What's needed now is a plan to
    make long-term electronic archival services available to the masses."

    Dr Willard McCarty | Senior Lecturer | Centre for Computing in the
    Humanities | King's College London | Strand | London WC2R 2LS || +44 (0)20
    7848-2784 fax: -2980 || willard.mccarty@kcl.ac.uk
    www.kcl.ac.uk/humanities/cch/wlm/


