DIGITISATION: FLAT OR STRUCTURED?
list: lis-elib



28 Oct. 1996     
From Mr C A Rusbridge, Programme Director, Electronic Libraries Programme

At the Digitisation and Images concertation day last week, a chance remark
by Prof. David Brailsford set me thinking. At the start of his talk (about
Adobe Acrobat Capture), he mentioned he was not involved in an eLib digitisation
project, but was involved with two others (OJF and PPT) using PDF in
electronic journal contexts... "but these are not really relevant here".

As he spoke, I began to wonder why it is we spend a lot of time in those
two projects trying to get hypertext structures into (or onto) PDF
versions of journal articles, but we seem content when we digitise back
issues to produce quite flat page representations. We generally work hard
to make them searchable, but we don't expect to embed links in the pages.

Clearly we could do this, either as image maps, or indeed as PDF links
(whatever they are called). And if we were to digitise back issues of
journals now becoming available electronically, especially if in PDF form
with links, I would have thought our audiences would not wish to see a
distinction, nor would they understand why links were available for issues
after a date but not before it.

I suspect part of this stems from our wish to deal with this material in a
bulk way; we don't want to get involved in processing the content to
identify a citation or reference or some other reason to link, as these
would put the costs up. It would certainly be good if there was a way to
handle this at low cost, and if so I would hope that digitisation projects
would at least ask themselves the question posed in the subject of this
email. Comments?

-- 
Chris Rusbridge

Programme Director, Electronic Libraries Programme
The Library, University of Warwick, Coventry CV4 7AL, UK
C.A.Rusbridge@Warwick.ac.uk



29 Oct. 1996 
From P.Sykes, Liverpool John Moores University

Chris raises an interesting question when he asks why we
do not do more in the way of adding hypertext links to
material we are retrospectively converting to electronic
form. In our "On Demand" project at Liverpool JMU we have
created online course materials in a group of humanities
modules. These combine copyright texts with material
written by our own lecturers. It would have been
extremely useful to enrich these course resources with
links - between copyright works, from lecturers'
materials to copyright works, and from copyright works to
lecturers' materials. It would have encouraged students
to use the materials in a more open-ended and imaginative
way. It would have enabled a kind of use which would not
have been possible with the original printed materials.

So why didn't we do this if it's such a good idea? Well,
partly because we were concerned to expedite the process
of digitisation, as Chris suggests but, more importantly,
because we felt it was important to respect the integrity
of the copyright texts with which we had been supplied.
By adding links which could not have been contemplated by
the original author you do, it could be argued, subtly
alter the meaning of the text. You may think you are only
adding value, but you could be adding meaning too - a
meaning not intended by the author or even contrary to
his wishes.

So the only links we added were "mechanical" links - back
from a copyright work to a general list, or from
references in a text to footnotes in the same text. This
may seem a bit over-scrupulous, but we felt that adopting
any other policy would have introduced yet another layer
of difficulty into our negotiations with publishers. It
would also have introduced an additional complication
into publishers' relations with their authors. We're not
the only ones who have a complicated life!

P. Sykes
P.SYKES@livjm.ac.uk



29 Oct. 1996   
From Jon Knight, Dept. Computer Studies, Loughborough University of Technology

LEAPSYKE wrote:
> So why didn't we do this if it's such a good idea? 

Now this discussion is just screaming out "Open Journal Project" in my
head!  If you used the distributed linkbase concept that those guys have
come up with you'd be able to overlay the original copyrighted works with
different, multiple sets of links.  The basic copyrighted document would
be the same as the original but students could opt to see the lecturer's
"spin" on the topics contained within it.  You could get really flash and
let the students choose between the linkbases of one or more competing
lecturers (maybe at different institutions) to see different points
of view on subjects.  And you might have a "standard" linkbase that
linked specific keywords or phrases to factual dictionary definitions.
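
[Editorial illustration] The overlay idea can be sketched roughly as follows.
This is not the Open Journal Project's implementation, only a minimal sketch
under assumptions: the Link class, the overlay function, the linkbase names and
the targets are all invented here. The point it shows is that the text is only
read, never altered, and the reader picks which linkbases to apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    phrase: str  # words in the document that the link hangs on
    target: str  # hypothetical address the link points to


def overlay(text, linkbases, chosen):
    """Find anchors for the chosen linkbases; the text itself is never edited."""
    anchors = []
    for name in chosen:
        for link in linkbases.get(name, []):
            start = text.find(link.phrase)
            while start != -1:
                end = start + len(link.phrase)
                anchors.append((start, end, name, link.target))
                start = text.find(link.phrase, end)
    return sorted(anchors)


# Illustrative data only: the sentence, linkbase names and targets are invented.
text = "The essay treats money as a store of value."
linkbases = {
    "lecturer_a": [Link("money", "notes-a#money")],
    "lecturer_b": [Link("store of value", "notes-b#value")],
    "dictionary": [Link("money", "dictionary#money")],
}

plain = overlay(text, linkbases, [])  # the work as originally written
with_spin = overlay(text, linkbases, ["lecturer_a", "dictionary"])
```

Choosing no linkbases gives the unadorned work; choosing several gives every
anchor from each, tagged with the linkbase it came from.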

That way the students can opt to read the copyrighted work as it was
originally written or with any one of a number of combinations of
additional sets of links added to it.  Also, as the links are held in the
external DLS, you aren't going to be adding lots of possibly shortlived
links straight into the copyrighted documents; the linkbase can be
regularly pumped through a link checker and dead links could be quietly
dropped (and maybe flagged to the lecturer responsible so that they could
locate replacements).
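
[Editorial illustration] A rough sketch of that housekeeping step, assuming a
linkbase is simply a list of (owner, target) pairs. The function name
prune_linkbase is hypothetical, and the is_alive check is a stand-in: a real
link checker would test each target over the network.

```python
def prune_linkbase(links, is_alive):
    """Split (owner, target) pairs into live links and dead ones grouped by owner."""
    live = []
    dead_by_owner = {}
    for owner, target in links:
        if is_alive(target):
            live.append((owner, target))
        else:
            # Keep a record so the responsible lecturer can find a replacement.
            dead_by_owner.setdefault(owner, []).append(target)
    return live, dead_by_owner


# Stand-in for a real network check: pretend only these targets still resolve.
reachable = {"notes-a#money", "dictionary#money"}
links = [
    ("lecturer_a", "notes-a#money"),
    ("lecturer_b", "notes-b#gone"),
    ("dictionary", "dictionary#money"),
]
live, to_flag = prune_linkbase(links, lambda target: target in reachable)
```

Because the links live outside the documents, dropping the dead ones touches
only the linkbase; the copyrighted text is left alone.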

Anyway, just a thought.  The Open Journal Project Web pages are at
<http://journals.ecs.soton.ac.uk/> with more info on them.
Incidentally I'm not connected with the Open Journal Project other than
having the benefit of a trip to Southampton to talk to the guys working on
it and being impressed with the technology.  Joe Bob says check it out. 

Tatty bye,

Jim'll

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Jon "Jim'll" Knight, Researcher, 
Dept. Computer Studies, 
Loughborough University of Technology, 
Leics., ENGLAND.  LE11 3TU.
<jon@net.lut.ac.uk>



30 Oct. 1996 
From Lorcan Dempsey, UKOLN: the UK Office for Library & Info Networking

My first reaction on reading Chris's original message was the same as
Jon's -- OJP.

I was at a recent presentation of the OJP by Steve Hitchcock and thought
much that was said about the role of link management as
publisher-added-value was intriguing. At that particular meeting there was
some opposition from one or two members of the audience to the idea of a
publisher determining which should be the links, the links being seen as
part of the intellectual content of an article. It seems to me that this
is something which can be resolved within the particular practices and
framework of responsibilities of any particular 'publishing' venture. 

The discussion did prompt me to think that the facilities provided could
be used to provide some forms of commentary or explication of a text -
which chimes with some of what Jon is saying below. 
A slightly strained example: a paper by J.M. Keynes might be put in a
short loan collection by a history lecturer or by an economics lecturer
- each could add a set of links which reflects their particular interests 
or the place of the paper in the course they are teaching.
Links could be added to resources which were sources for or influences on
the paper. Links could be added which point to areas which were influenced
by the paper. And so on.

Lorcan

----------------------  Lorcan Dempsey
UKOLN: the UK Office for Library & Info Networking
University of Bath, Bath BA2 7AY, UK
<lisld@ukoln.ac.uk>



29 Oct. 1996  
From Stuart Peters, University of Surrey

LEAPSYKE wrote:
>By adding links which could not have been contemplated by
>the original author you do, it could be argued, subtly
>alter the meaning of the text. You may think you are only
>adding value, but you could be adding meaning too - a
>meaning not intended by the author or even contrary to
>his wishes.

>So the only links we added were "mechanical" links - back
>from a copyright work to a general list, or from
>references in a text to footnotes in the same text.

These points highlight a very useful distinction between types of hyperlink
- and raise the point that to add contextual links to a work may alter the
author's original intent.

Further to this, links will be dynamic in the same way as texts are - as
texts grow older, so their meaning changes and the surrounding literature
alters their context.  Gulliver's Travels is a book often referred to in
this argument - it is rarely read today with the same political cynicism in
mind as when it was originally written.  Hyperlinks made in documents today
may not be the same links that would be added in years to come, or those
that would have been applied to documents written in the past.  Whilst
mechanical links will remain constant, contextual links will not.  Because
of this dynamic constraint, surely it must be an author's responsibility
alone to add contextual links?

Stuart
____________________________________________________________________________
                    SOCIOLOGICAL RESEARCH ONLINE
               Editorial and IT Officer: Stuart Peters
Department of Sociology         
University of Surrey            
Guildford, Surrey GU2 5XH       
United Kingdom                                    
Stuart.Peters@soc.surrey.ac.uk



29 Oct. 1996   
From Jon Knight, Dept. Computer Studies, Loughborough University of Technology

Stuart Peters wrote:
> Whilst
> mechanical links will remain constant, contextual links will not.  Because
> of this dynamic constraint, surely it must be an author's responsibility
> alone to add contextual links?

Not necessarily, if one thinks of the contextual links as annotations to
the document.  These annotations could be made by anyone to allow them to
provide their comments and thoughts on the document.  Public annotations
are something I really miss from the earlier days of the Web (they were in
the early NCSA X Mosaic releases but the architecture that they had in
place then wouldn't scale and so they've disabled everything but private
annotations in recent releases).

When you think about it, much of the academic literature is based on
annotations to existing works which are used to show the work that went
before your contribution to knowledge, except we call them "papers with
references". The difference in the traditional literature is that the
links are unidirectional, they go FROM the new work TO the old work, and
are quite disconnected from the old documents (modulo the citation
services available).  What we can do with electronic versions of existing
documents is make those links bidirectional so that you can go FROM an old
document TO new one(s).  Handy if I get referred to an old paper (say one
from 1995 in this game!) and want quickly to see what else has been based
on its concepts.  And a new, added value feature of the electronic library
over the paper one.
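
[Editorial illustration] Making the links bidirectional amounts to inverting the
reference lists. A minimal sketch, assuming each paper's references are already
available as a list of identifiers; the function name cited_by and the paper
identifiers are made up for the example.

```python
from collections import defaultdict


def cited_by(references):
    """Invert 'new paper -> old papers it cites' into 'old paper -> newer papers'."""
    reverse = defaultdict(list)
    for paper, cited in references.items():
        for old in cited:
            reverse[old].append(paper)
    return {old: sorted(new) for old, new in reverse.items()}


# Invented identifiers: each key is a newer paper, each value its reference list.
references = {
    "paper-1996a": ["paper-1995", "paper-1990"],
    "paper-1996b": ["paper-1995"],
}
backlinks = cited_by(references)
```

Looking up an old paper in backlinks then gives the newer work built on it,
which is the FROM old TO new direction described above.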

Oooh, I can feel a paper coming on... :-)

-
Jon "Jim'll" Knight, Researcher, 
Dept. Computer Studies, 
Loughborough University of Technology, 
Leics., ENGLAND.  LE11 3TU.
<jon@net.lut.ac.uk>



30 Oct. 1996
From Steve Hitchcock, Open Journal Project, University of Southampton

Stuart Peters wrote:
>Surely it must be an author's responsibility
>alone to add contextual links?

The legitimacy of contextual links has been challenged elsewhere as well as
on this list, and in the Open Journal project we are bound to acknowledge
these concerns, but this is too simplistic. The project is using a tool - a
link service - which potentially makes it easy to superimpose such links on
third-party authored works. The point about a link service, however, is that
it should be flexible enough to be used to produce useful links, not simply
indiscriminate links.

Jon Knight helpfully filled in some of the background. Ideally it will be
possible to use the link service to control the links that are created, also
the environment or type of documents to which links should be applied, the
documents on which the links are superimposed and who sees the links. It is
how all of these variables are applied that determines the value to the user.

Jon also pointed to some examples in which contextual links could be
beneficial. Since this thread is discussing historical materials, there are
some good examples in the hypertext literature of using links in a scholarly
context to bring new perspectives to a body of work. One of the best known
is Landow's Dickens Web. This was developed at Brown University in the USA
with an open hypertext system not dissimilar in principle to our link
service, but that work preceded the Web and so the distribution of these
documents and links was limited. There is no doubt, though, about the impact
of that work locally on the study of Dickens.

LEAPSYKE wrote:
>By adding links which could not have been contemplated by
>the original author you do, it could be argued, subtly
>alter the meaning of the text.

True, but it should not be disdained for this reason alone. The Web is not a
technology but a transforming culture. As far as authoring is concerned,
hypertext, and the ubiquity of the Web, are leading to what Landow, a
professor of English, calls the 'de-centering' of the text, that is, giving
readers 'unprecedented control' and 'overthrowing the author's usual
preeminence'. This is clearly a long-term and complex area, but there is a
case for researchers to explore this potential responsibly.

Steve Hitchcock                                                          
Open Journal Project
Multimedia Research Group, Department of Electronics and Computer Science  
University of Southampton SO17 1BJ,  UK                     
sh94r@ecs.soton.ac.uk



30 Oct. 1996 
From Tony Barry, Head, Center for Networked Access to Scholarly Information,
Australian National University Library

Jon Knight wrote:
> What we can do with electronic versions of existing
> documents is make those links bidirectional so that you can go FROM an old
> document TO new one(s).  Handy if I get referred to an old paper (say one
> from 1995 in this game!) and want to quickly see what else has been based
> on its concepts.  And a new, added value feature of the electronic library
> over the paper one.

Ted Nelson's original concept of hypertext had bidirectional links and
these have been implemented in the Hyper-G system.  For material published
on a Hyper-G server other authors can add links into arbitrary locations
inside the document.  Conversely it is always possible to link backwards
from new links coming into your documents.  It's far more powerful than
HTTP/HTML - although it can be read by Web browsers.

Tony

______________________________________________
Head, Center for Networked Access to Scholarly Information,
Australian National University Library, A.C.T. 0200, AUSTRALIA.
Tony.Barry@anu.edu.au



31 Oct. 1996  
From David Brailsford, Dunford Professor of Computer Science,
University of Nottingham

Chris (and other respondents),

Thanks for the e-mail (and the replies). Yes -- as Southampton's academic 
partner in the OJF project I'm delighted to find that, in Loughborough at 
least, they've seen clearly the virtues of separable hyperstructure
and separate linkbases.

	There are two sorts of linking here. The OJF type of hyperlink is
good (as Jon Knight points out) for cross-document links to other corpora
where a particular set of cross-links might put a particular "spin" or
commentary on some topic or other. The *intra* doct. links tend to be more
specific (citation to actual reference; "see Figure 2" to Figure 2 itself
and so on).
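
[Editorial illustration] The intra-document case can be sketched with plain
pattern matching. This is not the Nottingham software, only a hedged example
working on extracted text rather than PDF; it assumes captions begin a line
with "Figure N:" or "Table N." and the function name intra_document_links is
hypothetical.

```python
import re

# A mention such as "see Figure 2" anywhere in the running text.
REFERENCE = re.compile(r"\b(Figure|Table)\s+(\d+)\b")
# A caption: the same label at the start of a line, followed by "." or ":".
CAPTION = re.compile(r"^(Figure|Table)\s+(\d+)[.:]", re.MULTILINE)


def intra_document_links(text):
    """Return (source offset, target offset) pairs from mentions to captions."""
    targets = {(m.group(1), m.group(2)): m.start() for m in CAPTION.finditer(text)}
    links = []
    for m in REFERENCE.finditer(text):
        key = (m.group(1), m.group(2))
        # Skip the caption itself and mentions with no matching caption.
        if key in targets and m.start() != targets[key]:
            links.append((m.start(), targets[key]))
    return links


sample = "As shown (see Figure 2), growth slows.\n\nFigure 2: Growth over time.\n"
links = intra_document_links(sample)
```

Each pair says where a link should start and where it should land; writing
those back as PDF links or image-map regions is a separate step not shown here.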
 
	The things we've been working on here at Nottingham enable us
to do both of these things on PDF files (even those that have been 
acquired by OCR e.g. with Acrobat Capture). Admittedly the technology 
needs some further development but we'd be happy to do some test examples
if people have suitable material.

	In the longer run doing all of this properly relies on yet more research
(that we've been doing outside of eLib) in inferring document structure 
"bottom up" from PDF, i.e. detecting headings, tables, paras, captions,
footnotes automatically and then producing an SGML tagged doct. Once one
has inferred some context then the detection of objects to be linked 
becomes very much easier. This is not of "industrial strength" yet
but if you have a spare bob or two in the eLib kitty Chris, I'm happy to 
submit an extra proposal :-)
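
[Editorial illustration] To give a flavour of "bottom up" inference, here is a
deliberately naive sketch. It is not the research described above: it assumes
the text blocks and their font sizes have already been pulled out of the PDF as
(text, size) pairs, and it guesses structure from font size alone. The function
name tag_blocks, the tag names and the thresholds are all invented.

```python
from collections import Counter


def tag_blocks(blocks):
    """Wrap each (text, font size) block in a guessed SGML-style tag."""
    # Treat the most common font size as body text.
    body_size = Counter(size for _, size in blocks).most_common(1)[0][0]
    tagged = []
    for text, size in blocks:
        if size >= body_size * 1.2:
            tag = "HEADING"
        elif size <= body_size * 0.85:
            tag = "FOOTNOTE"
        else:
            tag = "PARA"
        tagged.append(f"<{tag}>{text}</{tag}>")
    return tagged


# Invented input: block text paired with its font size in points.
blocks = [
    ("1 Introduction", 14.0),
    ("Links matter.", 10.0),
    ("They change how a text is read.", 10.0),
    ("1 See Nelson.", 8.0),
]
tagged = tag_blocks(blocks)
```

Once blocks carry tags like these, a linker can restrict itself to, say,
footnotes or captions, which is the sense in which inferred context makes
detection easier.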

David B.
 

---------------------------------------------------------------------
David F. Brailsford               Dunford Professor of Computer Science
e-mail: dfb@cs.nott.ac.uk	  Dept. of Computer Science
                                  University of Nottingham 
 				  NOTTINGHAM NG7 2RD, UK.
---------------------------------------------------------------------