Scholarly Communication and the Digital Library:
Problems and Issues (1)
ABSTRACT: This paper considers a range of definitions for a digital library from the
perspective of scholarly communication and the properties of a traditional research
library. It then explores some of the problems and issues involved in creating and
maintaining a digital library, depending on the characteristics one wants it to have.
The paper stresses the need to consider the requirements of scholarship and research
as we build the digital libraries of the future.
As evidenced by the creation of this journal, there is much interest today in
digital libraries. We see many research and development projects, a plethora of
international conferences, high activity in the computer science, human/computer
interaction, library and information science and other research and development
communities, and a great deal of development activity on the Internet. An advanced
Alta Vista search conducted in early July 1996 on "digital library" OR "digital
libraries" retrieved about 20,000 entries. Six months later, the same search retrieved
30,000 hits, a significant proportion of which were relevant to the subject.
In spite of all this activity, it is not at all clear what one means by the term
"digital library." The term is rarely defined, or even characterized. It has been
applied to an extraordinary range of applications -- from digital collaboratories to
collections of electronic journals, software agents that support inquiry-based
education (3), collections of email and similar objects (4), electronic versions of a
public library (5), personal information collections (6), and the entire Internet (7),
among others. It is not easy to see what these have in common except for their
digitization. This property (which Daniel Atkins calls digital coherence) allows all
the objects in a digital library -- sounds, images, texts, and everything else -- to be
treated in essentially the same way, for the first time in the history of libraries.
If we know what "digital" means, what is meant by the library half of the term?
In what ways are digital libraries indeed libraries in some meaningful sense? How are
they not? More to the point -- what values, properties, and characteristics of the
traditional library do we want to retain as we build the digital libraries of the
future? Our new digital libraries will clearly have much added functionality --
capabilities that have never been present in traditional libraries. At the same time,
however, we are in danger of losing important properties of the traditional library. In
our efforts to build new systems we rarely ask what aspects of traditional libraries
are important to retain.
This paper will address these issues in the context of scholarship and research.
It will consider the nature of a digital library in terms of a range of definitions
that are
based on properties of a traditional research library. It will also explore some of
the
problems and issues involved in creating and maintaining a digital library, depending
on the properties one wants it to have. There is insufficient space here to consider
these issues in detail. Here I will provide only an overview that ignores questions
of
cost, implementation, and technical detail.
Let me emphasize that I am not claiming that the traditional library is a
panacea; this is not a Luddite cry to take sledgehammers to the machines or to
construct digital libraries in the images of our venerable local institutions. But
what I
do assert is that the traditional library encompasses many values and
properties that
are considered important for scholarly communication. I am calling for a clear
understanding and consideration of these characteristics as we design and develop the
digital libraries of the future.
The term "digital library" is simply the most recent in a long series of names
for
a concept that was written about long before the development of the first computer.
The
idea of a "computerized library" that would supplement, add functionality to, and even
replace traditional libraries was first conceived by H.G. Wells and other authors, who
caught the imagination of millions with speculative writings about "world brains" and
similar fanciful devices.
There is general agreement that much of the early actual application of
computers to information retrieval was stimulated by the prominent scientist
Vannevar Bush, who wrote about the "memex," a mechanical device based on
microfilm technology that anticipated the ideas of both hypertext and personal
information retrieval systems (8). The first real-world applications of computers to
libraries began in the early 1950s with IBM and punched card applications to library
technical services operations, and continued in the 1960s with the development of the
MARC (machine-readable cataloging) standard for digitizing and communicating library
catalog information. In 1965, J. C. R. Licklider coined the phrase "library of the
future" to refer to his vision of a fully computer-based library (9), and about a decade
later, F.W. Lancaster (10) wrote of the soon-to-come "paperless library."
About the same time, Ted Nelson (11) invented and named hypertext and hyperspace. He
also analyzed in some detail several of the problems identified later in this paper, but
never built an operational system. Many other terms have been coined to refer to the concept of a
digitized library, including "electronic library," "virtual library," "library
without
walls," "bionic library," and others (12).
The relatively recent use of the term "digital library" can be traced to the
Digital Libraries Initiative funded by the National Science Foundation, the Advanced
Research Projects Agency, and the National Aeronautics and Space Administration in
the United States. In 1994 these agencies granted 24.4 million dollars to six U.S.
universities for digital library research, impelled by the sudden explosive growth of
the
Internet and the development of graphical Web browsers (13).
The term was quickly
adopted by computer scientists, librarians, and others. Thus, while the term
"digital
library" is relatively new, work in bringing digitized information resources to
libraries
(or thinking of digitized information resources as libraries) has a history spanning
several decades.
There is little discussion and less agreement in the literature about what
constitutes a digital library. One may insist on a relatively narrow definition --
based
explicitly on the properties of the traditional print library -- or consider a much
broader continuum of possibilities. The most inclusive view takes as its starting point
that a digital library is essentially what the Internet is today. Seen from this extreme
perspective, however, the metaphor of the traditional library fails in several respects.
Properties of a Digital Library
Table 1 describes essential properties of a digital library ranging from quite
traditional to extremely broad views. A digital library contains digital
representations
of the objects found in it. Most understandings of "digital library" probably also
assume that it will be accessible via the Internet, though not necessarily to
everyone.
But the idea of digitization is perhaps the only characteristic of a digital
library on
which there is universal agreement.
Beyond the idea of digitization, a digital library is a library. Or is
it? What
makes a library a library? In what senses do we really want the digital libraries we
are
building to be libraries? What are the essential features of a "library"?
The first column of Table 1 summarizes essential characteristics of a traditional
research library. The second and third columns consider successively broader views
of
these properties from the point of view of what constitutes (or should constitute) a
digital
library. For example, a digital library may be organized and represented in the form
of
object surrogates created by human specialists (indexed, classified, cataloged) or it
may
be entirely unorganized, with no "added value" whatever, using free text searching of
the objects themselves -- rather than object surrogates -- to gain access to the
objects
in the library.
Of course, the digital libraries we are building will have properties not present
in the traditional research library, with many of these innovations yet to be
invented.
The digital coherence of the objects, the near elimination of distance or physical
location as an important consideration, and the existing computer and
communications infrastructure (and that yet to be built) will give rise to myriad
new possibilities for enriching and redefining what we think of as a library. But
there
may be a tradeoff. We may be asked to give up some important properties to gain new
ones. Table 1 summarizes what I consider to be the essential features of a digital
library, viewed from the perspective of scholarship and research.
The traditional research library has a physical location, embodied in its physical
building. Most of the objects in the library are information resources of some kind. The
works are also selected. Criteria for the selection process are defined, and these
criteria typically include measures of quality. The objects (information sources) in the
library are organized -- classified, cataloged, and indexed by human beings, in what are
called value-added processes (14). Authority control is a key feature, in which names of
authors, variants of works (editions), and subject headings or descriptors are all
controlled. The concepts of authorship and ownership are extremely important in a
traditional research library, in which various forms of an author's name are brought
together in a name authority file. Surrogates of the objects in the library -- called
index records, or in digital library terminology, metadata -- are created for purposes of
representing the value added by catalogers and indexers. Data are recorded in dozens of
specific fields and subfields of these records, and are "finely searchable." That is,
highly specific searches can be conducted on particular combinations of fields or
subfields of the index records. Retrieved records are linked to the objects themselves,
which can then be obtained and used (15).
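The difference between searching surrogates and searching the objects themselves can be made concrete with a small sketch. What follows is only an illustration under stated assumptions, not a description of any actual library system: the record layout, the function names (authorized_name, fielded_search, free_text_search), the subject headings, and the sample full texts are all hypothetical, and a real MARC record carries far more fields than the handful shown here.

```python
from dataclasses import dataclass

# Hypothetical name authority file: variant forms of a name -> one authorized form.
NAME_AUTHORITY = {
    "bush, v.": "Bush, Vannevar",
    "vannevar bush": "Bush, Vannevar",
    "bush, vannevar": "Bush, Vannevar",
}


def authorized_name(name: str) -> str:
    """Return the authorized form of a name, or the name unchanged if unknown."""
    return NAME_AUTHORITY.get(name.strip().lower(), name.strip())


@dataclass(frozen=True)
class Surrogate:
    """An index record (metadata) standing in for one object in the collection."""
    author: str
    title: str
    year: int
    subjects: tuple
    object_id: str  # link from the surrogate back to the object itself


def fielded_search(records, author=None, year=None, subject=None):
    """Return the records that match every field criterion that was supplied."""
    hits = []
    for rec in records:
        if author is not None and rec.author != authorized_name(author):
            continue
        if year is not None and rec.year != year:
            continue
        if subject is not None and subject not in rec.subjects:
            continue
        hits.append(rec)
    return hits


def free_text_search(texts, term):
    """Return ids of objects whose raw text contains the term, ignoring case."""
    return [oid for oid, text in texts.items() if term.lower() in text.lower()]


# Two surrogates; the subject headings are assumptions made for this sketch.
CATALOG = [
    Surrogate(authorized_name("Vannevar Bush"), "As we may think", 1945,
              ("information retrieval",), "obj-1"),
    Surrogate("Licklider, J. C. R.", "Libraries of the Future", 1965,
              ("libraries", "information retrieval"), "obj-2"),
]

# Full texts invented for this sketch; they are not quotations from the works.
FULL_TEXTS = {
    "obj-1": "Consider a future device, a memex, for individual use.",
    "obj-2": "A bush fire destroyed the card catalog, so a new one was built.",
}
```

In this toy collection, fielded_search(CATALOG, author="Bush, V.") finds the record for obj-1 because the variant form of the name is resolved through the authority file, whereas free_text_search(FULL_TEXTS, "bush") matches only obj-2, whose text merely happens to contain the word. The sketch thus suggests, in miniature, why free-text searching of objects is not a simple substitute for controlled, fielded surrogates.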
The treatment of authorship and ownership in the traditional research library
reflects the importance of these ideas in traditional scholarly communication, in
which
scholars and scientists cite in reference lists the authors and works from whom they
have borrowed ideas, words, or facts, thus paying intellectual debts and
acknowledging
original authorship. Ownership of intellectual property is central to publishing and
scholarship. Plagiarism -- stealing the words of others without attribution -- is
considered unethical in scholarly writing and science. Formal legal rights of
ownership
are also defined by national and international copyright law.
The objects in a traditional research library have certain properties as well.
First, they are fixed -- they do not normally change, or if they do, various
editions are
identified and considered to be different from one another. Objects are also
permanent
-- they do not normally disappear from a collection. In addition, a variety of services
to users are offered by librarians who work in the traditional library. These include
assistance with searching for information resources, reference and research services,
readers' advisory services, and others. A traditional research library typically
offers
only limited access to materials and services; access to certain services
may be restricted
to certain classes of potential users.
Finally, use of basic services in many traditional research libraries is free for
defined user populations. Some of these libraries are large, tax-supported research
institutions. My own university library, for example, offers free access to basic
services
to all the citizens of the state of Indiana.
One can take a narrow or a broad view of digital libraries according to these
properties. It seems clear that among all of the properties listed, physical
location is
the least likely to survive in a digital library. Resources in future digital
libraries will
be more likely to be distributed than not. But all of the other properties listed in
Table 1 are also in jeopardy in at least some of the digital libraries being built or
conceptualized. Writers have taken a variety of positions as they contemplate what a
digital library should be.
Miksa and Doty (16) take a traditional perspective, defining
a digital library as a
collection of information sources in a place (if not a physical place, then at least
a
logical one). They argue that a broader definition would lead to something different
from what is normally understood to be a library. Graham (17)
stresses the support of
research as he describes the "digital research library," which looks much like the
research library of today in many of its essential features (see Table 1). Atkinson
(18)
calls for a "control zone" in which the traditional research library can continue to
function in a digital environment.
Further along the continuum, Wellman et al. see a digital library of the future
in which software agents use principles of artificial intelligence (AI) to perform
"monitoring, management, and allocation of services and resources" (19). Indeed, they
define a digital library as a "community of information agents" that would retain most
of the properties of the traditional library listed in Table 1, but would perform them
using intelligent software rather than human beings. However, the extent to which
techniques of AI can actually perform the functions envisioned by Wellman et al. is
not at all clear. Most of what the authors describe is presently no more than
speculation.
Having changed his position significantly in two years, Miksa now views the
traditional library as evolving into a "personal space library" that is configured for a
single individual or small group and that excludes many of the characteristics and
values of the traditional library (20).
At the far extreme is the Internet itself as it exists today, which has essentially
none of the properties of the traditional library listed in Table 1. (See, for example,
Wallace (21).) The Internet is anarchic and individualistic. It is not a collection of
information resources selected on the basis of their quality, organized by subject, etc.
The vast majority of objects on the Internet have no surrogates -- or metadata --
associated with them. Fine-grained searching -- searching limited to specific fields
such as subject, editor, year of publication, version number, language, author, etc. --
is not possible. In general only the objects themselves are searchable, in a full-text,
free-text mode that is presently extremely crude and inexact. However, some believe that
the near future holds highly significant improvements in searching, through concept
searching and vocabulary switching (22). If this prediction is accurate, perhaps many
kinds of metadata -- but not all -- can be eliminated without great loss in future
digital libraries.
There are real problems with the concept of "author" on the Internet. The
concept of "control" is almost entirely absent. Many of the objects on the Internet
will
one day vanish without a trace. Those that remain are in a constant state of change.
There are very few services, and few of these are offered by human beings as opposed
to computer software (the Internet Public Library is a welcome exception). The metaphor
of the traditional library simply does not apply to the Internet; most of the values and
properties of the traditional research library are absent. Of course there are certain
spots on the Internet that do have some of these properties. These are much more like
traditional libraries, if one can manage to find them and enjoys access privileges to them.
The metaphor of the traditional research library is powerful, useful, and
compelling. Further, there are good reasons for the properties enumerated in Table
1.
There is insufficient space here to go into these reasons in detail. However, it
seems
clear to me that science, scholarship, learning and teaching could not have evolved
as
we know them without the existence of the great and small "traditional" libraries of
the
world. Scholarship and learning imply the need to check and evaluate sources, to
conduct careful, fine-grained comprehensive searches, to select, to be able to think
about evidence critically, to more or less freely examine resources, to consider
provenance. How well will the reader in future digital libraries be able to carry
out
these functions? Interestingly, as measured by references in published papers,
electronic publications of all kinds have thus far made very little impact on
scholarship
and science (23), including electronic journals (24). This may be due in part to the difficulty
of conducting scholarly work on today's Internet; for example, the problem of access
to
electronic journals is not trivial (25).
Readers in traditional research libraries are also able to consult with
librarians
as they attempt to accomplish their work. Who will scholars and researchers consult
in future digital libraries? An extremely strong case can be made for including
librarians in the digital library (26).
Finally, many public, tax-supported research libraries are open to the public and
are free for basic services. Use is not limited to those wealthy enough to afford
the
equipment, telecommunications charges, and fees for services such as access to the
collection and permission to use materials. Access to the objects in the traditional
library is recognized as a public good and is supported with public tax monies. What
kinds of access will the digital libraries of the future provide? What classes of
users
will be permitted free access to objects and services?
Table 2 summarizes the problems and issues that I have identified. Ignored are
the many managerial questions that might be raised, as well as how solutions can be
paid for. The traditional library attempts to deal with these problems and issues in
a
number of ways. Those who are building digital libraries must ask themselves whether
these issues should be considered. Perhaps the most thorough study of these
questions
has been conducted by Ross Atkinson, who calls for librarians to lay claim to the
"control zone" -- demarcating a single, distributed digital library created by the
academic library community and based on principles of the traditional research
library (27).
An alternative to establishing a control zone is to take a broad view, and build
digital libraries in which some or all of the properties of the traditional library
have
largely disappeared. Questions then immediately arise concerning science,
scholarship,
teaching, and learning. Will students take what they find on the Internet as "truth"?
This is already happening today. What kind of scholars and researchers will such
students become? How can they (or anyone, for that matter) evaluate what they find
on the Internet? The problems of quality, integrity and authorship are legion. What
is the source of the information that one finds? Who actually wrote it? How old is
it?
How accurate? Is it really what it claims to be? What "edition" is it? What is its
authority? Its provenance? What will happen to the concept of authorship and the
notion of fixed, permanent documents? To the concept of evaluation of sources?
How will these changes affect scholarship and research? These are crucial social
questions that are extremely important to contemplate.
Consider a personal example. I recently conducted an Internet search for
information on South Korea, using Alta Vista's advanced search mode. Among the
materials retrieved were an entry in the CIA Factbook (published by the Central
Intelligence Agency, an agency of the U.S. government), the home pages of private
individuals, pages that had no clear source, pages of commercial firms, digitized
newspaper articles, and several links whose referents had already disappeared. When I
conducted
this search, could it be said that I was searching the contents of a library? To
what
extent could I trust the accuracy of what I read? Were the documents purporting to
be
from the CIA Factbook or published by the Associated Press actually from
these
sources? If so, how current was the information in them? Or, were they forgeries or
slightly modified originals with small, subtle but significant changes? There is
simply
no easy way to tell. To what extent can information from private individuals be
considered "factual"? What are the highest quality (most accurate, complete,
error-free, current, etc.) sources of information on the Internet about South Korea's history
and culture? Of course, these same questions can be asked of print materials in
traditional libraries, but the problems are greatly exacerbated on the Internet.
Only by
stretching the metaphor of the library far beyond its traditional sense can the
Internet
be construed as a library.
Nearly two decades ago former U.S. Librarian of Congress Daniel Boorstin
observed that Gresham's Law was at work in the information field; that information
was driving knowledge out of circulation (28). In a recent
study published by Reuters
Business Information, empirical evidence was found to support this thesis (29). One in
four managers in the UK, US, Australia, Hong Kong, and Singapore admitted to
suffering ill effects -- including tension, stress, illness, and the breakdown of
personal
relationships, among others -- as a result of trying to deal with the amount of
information they now handle, and fully half expect the problem to get worse with the
continued growth and development of the Internet.
In a recent piece that is reminiscent of some of Daniel Boorstin's ideas, Mary
Biggs lamented the disappearance of books and serious reading from our discussions of
the virtual library (30). Why are we building digital
libraries, anyway? What is our broad
social purpose? What properties of our digital libraries are implied by these
purposes?
Will our digital libraries be part of the problem or part of the solution?
Perhaps the best of all possible worlds would be a broad, inclusive digital
"library" filled with a multitude of interesting and informative objects and
software
agents of all kinds -- as well as a large amount of material that is worthless to
almost
everyone. Such a place would be built from the bottom up, and would consist of
whatever materials and objects and libraries anyone wanted to build (and could afford
to maintain). It would be an evolved version of what the Internet is like today.
But I would argue that one important aspect of such a place must be special
spaces: digital libraries that have the properties of a traditional research library,
forming a control zone or, perhaps more realistically, a collection of control zones.
Here would be found high quality material, selected by specialists. True intellectual access
would
be provided in the form of fine-grained search tools and object surrogates
constructed
using the value-added processes of indexing, cataloging and classification. Such
digital
libraries would concern themselves with the currency, accuracy, and integrity of the
information sources found within them, and would address the other concerns
identified here as well. They would offer actual services to their user populations.
Where these cannot be accomplished by computer software, they would be performed
by human beings -- the librarians of the digital library. Finally, I hope that we
will
have digital libraries that are supported by tax monies and that will offer free
basic
services to defined constituent groups, not just to those who can afford to pay for
them.
1. An earlier version of this paper was delivered at KOLISS DL '96: International
Conference on Digital Libraries and Information Services for the 21st Century,
September 10-13, 1996, Seoul, Korea.
2. Email address:
harter@indiana.edu
3. Atkins, Daniel E., William P. Birmingham, Edmund H. Durfee,
Eric J. Glover, Tracy Mullen, Elke A. Rundensteiner, Elliot Soloway, José M.
Vidal, Raven Wallace, and Michael P. Wellman. 1996. Toward
inquiry-based education through interacting software agents.
4. Winograd, Terry. 1995. Digital vs. libraries: Bridging
the two cultures. SIGIR '95:
Proceedings of the 18th Annual International ACM SIGIR Conference on Research and
Development
in Information Retrieval 18:2.
5. 1997. Internet public library: Same metaphors, new
service. American Libraries: 56-59.
6. Miksa, Francis. 1996. The Cultural Legacy of the "Modern
Library" for the Future.
Journal of Education for Library and Information Science 37:100-119.
7. Wallace, Jonathan. "The Internet is a library."
Sex,
Laws, and Cyberspace Bulletin 1.
8. Bush, Vannevar. 1945. As we may think. Atlantic Monthly 176:101-108.
9. Licklider, J. C. R. 1965. Libraries of the Future. Cambridge, Mass.: M.I.T. Press.
10. Lancaster, F. Wilfrid. 1978. Toward paperless information systems. New York:
Academic Press.
11. Nelson, Theodor H. 1974. Computer Lib. Chicago: Nelson.
12. Drabenstott, Karen. Analytical Review of the Library of the Future. Washington,
D.C.: Council on Library Resources.
13. Pool, Robert. 1994. Turning an info-glut into a library. Science 266:20-22.
14. Taylor, Robert S. 1986. Value-added processes in information systems. Norwood, NJ:
Ablex.
15. Although Alta Vista and a few other search engines
permit field searching, most objects
on the Internet have only a few identifiable fields. There is nothing remotely
approaching
the MARC communications format in common use.
16. Miksa, Francis L. and Philip Doty. 1994. Intellectual Realities and the Digital
Library. Proceedings of the First Annual Conference on the Theory and Practice of Digital
Libraries. June 19-21, 1994, College Station, Texas.
17. Graham, Peter S. 1995. The digital research
library: Tasks and Commitments. Digital Libraries
'95: The Second Annual Conference on the Theory and Practice of Digital
Libraries, June 11-13,
1995, Austin, Texas, USA.
18. Atkinson, Ross. 1996. Library functions, scholarly
communication, and the foundation
of the digital library: Laying claim to the control zone. Library Quarterly
66:239-65.
19. Wellman, Michael P., Edmund H. Durfee and William P.
Birmingham. The
digital
library as community of information agents. A position statement, to appear in
IEEE Expert, June,
1996.
20. Miksa, 1996. "The cultural legacy of the 'modern
library' for the future."
21. Wallace, 1996. "The internet is a library."
22. Schatz, Bruce R. 1997. Information retrieval in digital
libraries: Bringing search to the
net. Science 275:327-33.
23. Harter, Stephen P. and Hak Joon Kim. 1996. Electronic
journals and scholarly
communication: A citation and reference study. Proceedings of the ASIS
Midyear Meeting (San
Diego, CA: May, 1996). pp. 299-315.
24. Harter, Stephen P. 1996. The Impact of Electronic Journals
on Scholarly
Communication: A Citation Analysis. Public-Access Computer Systems
Review 7(5).
25. Harter, Stephen P. and Hak Joon Kim. 1996. Accessing
electronic journals and other
e-publications: An empirical study. College & Research Libraries
57:440-56.
26. Arnold, Kenneth. 1995. The electronic
librarian is a verb/ The electronic library is not a
sentence.
Miksa, 1996. "The cultural legacy of the 'modern library' for the future."
27. Atkinson, Ross. 1996. "Library functions, scholarly communication, and the
foundation of the digital library: Laying claim to the control zone."
28. Boorstin, Daniel. Gresham's Law: Knowledge or
Information. Remarks at the White House
Conference on Library and Information Services. Washington, D.C., November 19,
1979.
29. Reuters Business Information. 1996. New
independent research reveals cost of the
information revolution.
30. Biggs, Mary. 1995. "Virtual libraries & actual
readers." The Seventh Nasser Sharify Lecture (Sunday, May 14, 1995, Pratt Manhattan
Center). Pratt School of Information and Library Science.
Stephen P. Harter, Professor (2)
School of Library and Information Science
Indiana University
Bloomington, Indiana 47405
NARROW VIEW (based on traditional library) | BROADER VIEW (a middle position between the extremes) | BROADEST VIEW (loosely based on current Internet)
----------------------------------------------------------------------------------------------------
objects are located in a physical place | objects are located in a logical place (may be distributed) | objects are not located in a physical or logical place
objects are information resources | most of the objects are information resources | objects can be anything at all
objects are selected on the basis of quality | some of the objects are selected on the basis of quality | no quality control; no entry barriers
objects are organized | | no organization
objects are subjected to authority control | some aspects of authority control are present | no authority control
surrogates of objects are created | surrogates are created for some objects | no surrogates of objects are created
surrogates are "finely searchable" | surrogates and objects are finely searchable | only objects are searchable
authorship is an important concept | concept of author is weakened | no concept of author
objects are fixed (do not change) | objects change in a standardized way | objects are fluid (can change and mutate at any time)
objects are permanent (do not disappear) | disappearance of objects is controlled | objects are transient (can disappear at any time)
access to objects is limited to specific classes of users | access to some objects is limited to specific classes of users | access to everything by everyone
services such as reference assistance are offered | | the only services are those performed by computer software (AI)
human specialists (called librarians, etc.) can be found | | there are no librarians
there exist well-defined user groups | some classes of objects have associated user groups | there are no defined user groups (or, alternatively, infinitely many of them)
use of library is free for specified user groups | use of library requires payment for some services and/or user groups | use of library requires payment
Table 2. Questions and Issues Related to Information Resources (IRs) in the Digital Library