9.774 citing Web documents

Humanist (mccarty@phoenix.Princeton.EDU)
Mon, 6 May 1996 19:14:32 -0400 (EDT)

Humanist Discussion Group, Vol. 9, No. 774.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
Information at http://www.princeton.edu/~mccarty/humanist/

[1] From: Robert A Amsler <amsler@bellcore.com> (40)
Subject: How to cite the Web

[2] From: John Unsworth <jmu2m@virginia.edu> (11)
Subject: Re: 9.767 citing URLs

[3] From: "Peter Graham, RUL" <psgraham@gandalf.rutgers.edu> (24)
Subject: Re: 9.767 citing URLs

Date: Mon, 6 May 1996 17:38:33 -0400
From: Robert A Amsler <amsler@bellcore.com>
Subject: How to cite the Web

The basic philosophical problem of citing the Web is that it is a fundamentally
transitory reference. What good will Web cites (not to be confused with Web
sites!) do in ten years when someone is trying to track down the references
in a paper? They will be little more than "personal correspondence" given the
turnover in Web pages and their contents.

This is a serious problem for serious literature use and probably something
more familiar to the Middle Ages than the modern age. We're burning old Web
pages to light the candles of new computer storage on a daily basis. Compared
to the longevity of acid-based paper, Web pages deteriorate in the blink
of an eye. There won't be any traces left of them at all. Worse still, the
original address could later hold something with different content.

There thus seem to be two problems with citing Web pages. (1) If we take the
purpose of citing things to be to allow others to track down the sources,
then Web citations are very time-limited. There may be various technological
patches to this IF the original documents were kept electronically linked,
but putting them on paper severs the ability to confirm the links are still
there. (2) The citation needs to supply the information necessary to track down
whomever CREATED the Web page, at least as much as the transitory pointer
to where the page is currently located. Indeed, giving the location is
akin to describing the works in a library by their positions on the shelves
(2nd floor, 3rd shelf near the fire extinguisher, top row, 4th book... well, it
WAS there yesterday.). Thus, if you can't find the Web page itself, you ought
to be able at least to find the organization that sponsored it or the people
who put the data there.
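
Point (2) can be sketched as a small citation record that keeps the creator
and sponsor next to the (possibly short-lived) location. This is only an
illustration, not an established citation format: the class name, the field
names, and the access date are assumptions made for the example; the title,
sponsor, and URL are taken from the header of this digest.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WebCitation:
    """A citation that records who made a page, not only where it was."""
    title: str
    creator: str    # person or group that wrote the page
    sponsor: str    # organization hosting or sponsoring it
    url: str        # transitory pointer to the current location
    accessed: date  # when the URL was last known to work

    def render(self) -> str:
        """Format the citation with the location last, as the least stable part."""
        return (f'{self.creator}. "{self.title}." {self.sponsor}. '
                f"{self.url} (accessed {self.accessed.isoformat()}).")

    def fallback_contacts(self) -> list:
        """Whom to ask if the URL no longer leads to the page."""
        return [self.creator, self.sponsor]


# Illustrative values; the access date is an assumption for the example.
citation = WebCitation(
    title="Humanist Discussion Group, Vol. 9, No. 774",
    creator="Humanist Discussion Group",
    sponsor="Center for Electronic Texts in the Humanities (Princeton/Rutgers)",
    url="http://www.princeton.edu/~mccarty/humanist/",
    accessed=date(1996, 5, 6),
)
print(citation.render())
```

If the URL goes stale, fallback_contacts() still gives a reader somewhere to
start looking.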

I think the problem that needs to be addressed is: who keeps the Web page
alive over the long term? Where are Web pages being archived?

You'd think we would have learned something from the history of film, radio
and television: sooner or later someone will want to find the first
products of the technology--want to track down the works produced in the early
years, before everything was saved for commercial reuse.

Unlike our era's mass-produced literature, we're back to unique products.
We've separated mass-production and mass-distribution, such that now we can
have unique objects mass-distributed. But if the distribution process fails,
for example, because the unique object is erased, then the pointers cease to
be meaningful everywhere at once.

There is limited redundancy in the Web. Mirror sites ease congestion on access
(and an interesting citation problem is thereby posed--which site is the
"original"... Is citing both akin to listing every city that a publisher
gives on its title page as a place of publication?). Perhaps what SHOULD
be on the Web is an archive of past Web pages? What will the historians
look at to reconstruct the history of the Web?

Date: Sun, 5 May 1996 16:16:05 -0400 (EDT)
From: John Unsworth <jmu2m@virginia.edu>
Subject: Re: 9.767 citing URLs

May I suggest that whatever one does with URLs in citation, putting them
inside angle brackets (e.g. <http://www.virginia.edu/>) is a bad idea,
since those angle brackets indicate, to any sgml- or html-aware program,
that the contents of the brackets constitute a tag. An unrecognized
"tag" such as a URL inside angle brackets will either be ignored (and not
displayed) or rejected by almost any of these programs. Since the URL
string begins with one of a few predictable (and probably not randomly
occurring) strings (http:// or ftp:// or gopher:// etc.), I suggest that
no additional representation is necessary.
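
As a rough illustration of that last point, a program can pick URLs out of
running text from the scheme prefix alone. The sketch below is an assumption
about how one might do it, not a description of any existing tool: the
function name, the pattern, and the list of schemes are chosen for the
example, and the second address in the sample text is hypothetical.

```python
import re

# URLs are recognized purely by their predictable opening strings.
# The match stops at whitespace, angle brackets, or a double quote.
URL_PATTERN = re.compile(r'(?:http|ftp|gopher)://[^\s<>"]+')


def find_urls(text):
    """Return the URLs in text, minus any trailing sentence punctuation."""
    return [match.rstrip(".,;:)") for match in URL_PATTERN.findall(text)]


sample = ("See http://www.virginia.edu/ for details, or fetch the file "
          "from ftp://ftp.example.org/pub/file.txt.")
print(find_urls(sample))
```

No delimiters are needed around the URLs; the only cleanup is trimming
punctuation that belongs to the sentence rather than to the address.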

John Unsworth
http://www.village.virginia.edu/~jmu2m/ jmu2m@virginia.edu

Date: Mon, 6 May 96 16:53:22 EDT
From: "Peter Graham, RUL" <psgraham@gandalf.rutgers.edu>
Subject: Re: 9.767 citing URLs

From: Peter Graham, Rutgers University Libraries
Andrew Burday says,
>Incidentally, I am doubtful about the ultimate utility of citations based
on URLs.<

He is quite correct in this sentence and in his analysis (comparing it to a
shelf location). The development we are awaiting of course is the URN, the
Uniform Resource Name, a first example of which has been created at CNRI by
William Arms and his colleagues and is known as a "handle". (An interim
suggestion, the PURL, or Persistent URL, is being implemented by OCLC and will
have some use.)

The URN idea is that a name will be associated with a software object that
will always and only be identified with that object, no matter where it
temporarily resides (at one URL or another). The name will point to the
object in rather the way that our domain names point to specific network
addresses: the domain name server (DNS) resolves a name like
phoenix.princeton.edu into the appropriate numeric node and we don't need to

A related idea is the URC, or Uniform Resource Characteristics, which, rather
like cataloging information, provides attributes of an object, allowing it to
be searched for.
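
The URN and URC ideas can be sketched together as a toy in-memory registry:
a citation holds a stable name, a resolver maps that name to wherever the
object currently lives, and attributes attached to the name allow searching.
This is not the handle system, PURL, or any proposed URN syntax; the class,
the method names, the sample name, and the second URL are all hypothetical.

```python
class NameResolver:
    """Toy registry mapping stable names to current locations and attributes."""

    def __init__(self):
        self._locations = {}   # name -> current URL
        self._attributes = {}  # name -> descriptive attributes (URC-like)

    def register(self, name, url, **attributes):
        """Bind a name to its first location and describe the object."""
        self._locations[name] = url
        self._attributes[name] = dict(attributes)

    def move(self, name, new_url):
        """Record that the object now lives elsewhere; the name is unchanged."""
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_url

    def resolve(self, name):
        """Return the current location for a name, much as DNS returns an address."""
        return self._locations[name]

    def search(self, **criteria):
        """Return the names whose attributes match every given criterion."""
        return [name for name, attrs in self._attributes.items()
                if all(attrs.get(key) == value for key, value in criteria.items())]


resolver = NameResolver()
resolver.register("humanist-info",  # hypothetical name
                  "http://www.princeton.edu/~mccarty/humanist/",
                  subject="humanities computing")
first_location = resolver.resolve("humanist-info")

# The object moves (hypothetical new address); citations of the name still work.
resolver.move("humanist-info", "http://archive.example.org/humanist/")
second_location = resolver.resolve("humanist-info")
matches = resolver.search(subject="humanities computing")
```

A printed citation would carry only the name; keeping the registry up to date
is the job of whoever maintains the resolver.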

There's more to be said on all this, and Andrew Burday has helpfully pointed
to the RFC addresses where it is discussed (and proposed). --pg

Peter Graham psgraham@gandalf.rutgers.edu Rutgers University Libraries
169 College Ave., New Brunswick, NJ 08903 (908)445-5908; fax (908)445-5888