18.282 new projects: text-mining and visualization; preservation

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Wed, 13 Oct 2004 06:32:57 +0100

               Humanist Discussion Group, Vol. 18, No. 282.
       Centre for Computing in the Humanities, King's College London
                     Submit to: humanist_at_princeton.edu

   [1] From: John Unsworth <unsworth_at_uiuc.edu> (57)
         Subject: web-based text-mining and visualization

   [2] From: John Unsworth <unsworth_at_uiuc.edu> (97)
         Subject: NDIIPP at UIUC

         Date: Wed, 13 Oct 2004 06:21:56 +0100
         From: John Unsworth <unsworth_at_uiuc.edu>
         Subject: web-based text-mining and visualization

The Andrew W. Mellon Foundation has granted nearly $600,000 over two years
to a multi-institutional project directed by John Unsworth, Dean of the
Graduate School of Library and Information Science at the University of
Illinois, Urbana-Champaign. The project builds on the D2K (Data to
Knowledge) software developed by Michael Welge's Automated Learning Group
at the National Center for Supercomputing Applications, and it will include
partners in humanities research computing at the University of Georgia, the
University of Maryland, and the University of Virginia. The project will
produce software for discovering, visualizing, and exploring significant
patterns across large collections of full-text humanities resources in
existing digital libraries and collections at Tufts University, the
University of Illinois, Indiana University, the University of Michigan, the
University of North Carolina, the University of Virginia, and other

"In search-and-retrieval," Unsworth says, "we pose specific queries and get
back answers to those queries; by contrast, the goal of data-mining is to
produce new knowledge by exposing unanticipated patterns. Over the last
decade, many millions of dollars have been invested in creating digital
library collections: the software tools we'll produce in this project will
make those collections significantly more useful for research and teaching."

Stephen Ramsay, the University of Georgia's representative on the project
and a member of the UGA English Department, agrees: "literary criticism and
data mining share an important common ground: both are concerned with the
isolation of patterns in data. Students of literature are often trying to
detect patterns of change in the language or structure of literary works.
Sometimes, this search for pattern is ordered toward the demonstration of
some interpretive insight, but this order is just as often reversed--we
notice patterns in texts and those patterns inspire interpretive insight."

Matthew Kirschenbaum, faculty member in the University of Maryland's
English department and Fellow at the Maryland Institute for Technology in
the Humanities (MITH), says that "information visualization will be the
essential scholarly genre of the 21st century. It is already commonplace in
astronomy, biology, chemistry, economics, engineering, environmental
sciences and geology, geography, meteorology, physics, and mathematics. The
basic intellectual and imaginative leap for information visualization in
the humanities will be the leap from documentary to algorithmic forms of
evidence. At the same time, we must understand the 'iconology' of these
visual displays, their roots in long-standing traditions of image-making,
cognitive design, and knowledge representation."

Martha Nell Smith, Director of MITH, observes that "the cross-institutional
collaboration in this initiative will help ensure that we build tools that
are widely usable, that are standards-based, and that will advance the
production and preservation of digital scholarship in the humanities, in
all its diversity." Bernard Frischer, Director of the University of
Virginia's Institute for Advanced Technology in the Humanities (IATH)
points out that "digital scholarship in the humanities requires extensive
multimedia collections, and it seeks to explore and document the complex
relationships among items in such collections. This, in turn, requires a
close collaboration between humanists and computing specialists." Tom
Horton, of the University of Virginia's Computer Science Department, will
oversee a distributed software development process for this project. He
notes that "developing successful software tools to work effectively in
such complex situations is always a challenge, so we'll follow principles
of user-centered software design in order to create data mining and
visualization tools that will give scholars what they need to be effective,
efficient and creative as they work with digital library materials."

The Mellon Foundation provided a $56,000 planning grant for this project
in 2003.

         Date: Wed, 13 Oct 2004 06:22:56 +0100
         From: John Unsworth <unsworth_at_uiuc.edu>
         Subject: NDIIPP at UIUC

U. of I. to play lead role in project to preserve digital information
News Bureau, University of Illinois, Urbana-Champaign
Andrea Lynn, Humanities Editor
217-333-2177; andreal_at_uiuc.edu


CHAMPAIGN, Ill. — The University of Illinois at Urbana-Champaign has been
chosen as one of the lead institutions in a massive new Library of Congress
project to save at-risk digital materials nationwide.

The U. of I. Library and the U. of I. Graduate School of Library and
Information Science will receive nearly $3 million over three years for
their role in the Library of Congress preservation project, called the
National Digital Information Infrastructure and Preservation Program.

“Together with the Library of Congress, we’ll address a problem that grows
more pressing every day: How do we collect, manage, preserve, and make
useful the enormous amount of digital information our culture is now
producing?” said John Unsworth, the dean of the U. of I. Graduate School of
Library and Information Science, and co-principal investigator of Illinois’
grant. Beth Sandore, associate university librarian for information
technology planning and policy at Illinois, is the other co-principal
investigator.

Sandore sees the grant as a unique opportunity and challenge.

“The public has entrusted libraries, museums and archives with the
stewardship of collections and resources so that they can be used by future
generations,” Sandore said. “Collecting, selecting and preserving digital
information requires approaches and resources that are substantively
different from those we have used traditionally.

“This partnership presents a unique opportunity for us to work with a
network of institutions, including our partners, other NDIIPP grantees and
the Library of Congress, to develop both the methods and the technologies
that will help the library community better understand how to preserve and
make accessible significant digital resources for future generations.”

According to Sandore, the project also provides an opportunity for
information professionals with traditional library backgrounds and those
with digital library expertise to work together to address these challenges.

Illinois’ nationwide partners are the OCLC Online Computer Library Center
in Dublin, Ohio; Tufts University’s Perseus Project; the Michigan State
University Library; and an alliance of state libraries from Arizona,
Connecticut, Illinois, North Carolina and Wisconsin. Partners on the
Illinois campus include WILL-AM, -FM and -TV (public radio and television
stations), the Division of Management Information and the National Center
for Supercomputing Applications (NCSA).

More specifically, Illinois’ project will develop criteria for selecting
digital material for capture and preservation, with OCLC taking the lead to
build software to help automate the process. Illinois, OCLC and NCSA will
jointly provide storage for the digital content collected in the project in
databases called “repositories” and will test real-world problems that are
encountered in the process of digital archiving.

Illinois also will explore ways for libraries and repositories to share and
preserve digital information existing in a wide variety of formats
including Web-based government publications, historical documents and
photos, sound and video recordings, Web sites and other varied digital
resources that will be of historical interest to future generations.

Because most digitally created materials have no physical version, these
“so-called born-digital materials are at a much greater risk of either
being lost and no longer available as historical resources, or of being
altered, preventing future researchers from studying them in their original
form,” a Library of Congress news release said, adding that “Millions of
digital materials, such as Web sites mounted in the early days of the
Internet, are already lost either completely or in their original versions.”

Illinois, along with the other partners in NDIIPP, seeks to identify
methods and technologies that will help avoid losing information that is of
significant historical value.

The project is expected to involve a great many players and have a wide
ripple effect, both within and outside the state.

“In the best tradition of land-grant schools,” Unsworth said, “this project
puts research and teaching to work in the service of the state and the nation.”

According to Unsworth, the infrastructure that will be funded by this grant
at Illinois will constitute “a unique environment for the comparative
testing and published evaluation of digital library software and
techniques. That environment also will be used for faculty research and for
teaching students in a new advanced degree program in digital libraries.”

Unsworth became dean of Illinois’ library school a year ago after serving
as the founding director of the Institute for Advanced Technology in the
Humanities at the University of Virginia. He is a frequent speaker on
topics related to digital scholarship, digital libraries and scholarly
publishing. Sandore has served as associate university librarian for
information technology planning and policy since 2001. Her professional
experience and research focus on developing and evaluating digital
libraries of cultural heritage information.

Illinois is one of eight institutions leading projects under this round of
NDIIPP funding. The others are the University of California, the University
of California at Santa Barbara, Educational Broadcasting Corporation
(Thirteen/WNET New York), Emory University, University of Maryland,
University of Michigan and North Carolina State University Libraries.
Illinois’ grant is the third highest of the participating institutions.

Laura E. Campbell, who is leading the NDIIPP initiative for the Library of
Congress, said that “These formal partnerships mark the beginning of a new
phase of this program to raise awareness of the need for digital
preservation and to take steps to capture and preserve at-risk digital
content that is vital to our nation’s history.”

The Library of Congress called for applications a year ago. All
applications were subjected to a peer-review process administered by the
National Endowment for the Humanities. Librarian of Congress James H.
Billington made the final selection.

The Library of Congress is the largest library in the world. The U. of I.
Library is the largest public university library in the world, and the U.
of I. Graduate School of Library and Information Science is consistently
rated among the best in the world.
