Navigation issues for the WWW: A hypermedia research point of view

Short scientific article, which I basically wrote as an excuse to get invited to the legendary First International Conference on the World-Wide Web, held on 25, 26 and 27 May 1994 at the Web's birthplace, CERN in Switzerland. In this article I already argued for the use of meta-information ("meta-data") to increase the manageability and usefulness of the Web, an approach which is only now being more widely adopted as part of the Semantic Web initiative of the W3C.

In the field of hypertext and hypermedia research, navigation issues such as the "navigational disorientation" problem and the "cognitive overhead" problem have been long-standing classics [Gygi90][Dillon90]. Especially in the context of so-called volatile hypertexts (dynamic document spaces whose information content and link structure are subject to rapid change) these navigation issues are seen as a direct corollary of the more fundamental "linking" issue: How do we support links which are persistent across time and space? How do we support links which are consistent in appearance and behaviour? Addressing these and similar questions is crucial for the future maintainability and usability of the World-Wide Web in the face of its present exponential rate of growth. Fortunately, the Web community can learn a lot from some of the ideas and concepts which have been developed in hypertext and hypermedia research. Although the "linking" issue is far from resolved [Meyrowitz89][DeYoung90], a number of sound principles have emerged from years of discussion and research:

  1. Separate link structure from information contents
  2. Provide meta-information for both nodes and links
  3. Turn the meta-information itself into a navigable space

I briefly discuss each of these principles below, investigating how and to what extent they may be usefully adopted in the present WWW (or future versions of it). I do not claim that these principles are the only relevant principles to be derived from hypermedia research, nor that they are the most appropriate ones. However, they can serve as a starting point for investigations into the development of link mechanisms and navigation techniques that will scale up for use in a truly massive, million-node and million-link WWW.

1. Separate link structure from information contents

The important conceptual breakthrough behind generic markup languages such as SGML (Standard Generalized Markup Language) was the realization of the need to separate the logical structure of a document from the layout structure. This allows us to represent and manipulate the logical structure of a document (the semantic role of the different elements of the document), and enables us to attach different layout presentations to the same document (a layout for optimal presentation on paper, a layout for optimal presentation on screen, etc.). This separation of logical structure and layout structure allows the author of a document to concentrate on the correct structure and contents of the document, while it becomes the responsibility of the 'publisher' to come up with the most appropriate (paper or electronic) presentation.

As HTML is based on SGML, WWW documents maintain this same separation between logical structure and layout structure. However, link-related data, i.e. the way anchors or begin/end points of links are specified (URLs), is embedded right in the middle of the information contents themselves. This is typical of first-generation hypertext systems, which make no provision for the collaborative authoring of documents (where links can be accidentally deleted because they are part of some content which is deleted), nor for the automatic maintenance of links (which requires storing auxiliary information about a link, such as its creation date, access rights, etc.).

Supporting these higher functionalities requires separating the link structure from the information contents, so that both can be changed or updated without indirectly affecting the other [Furuta90]. One way this separation is achieved in second-generation hypertext systems is by storing all link-related data in a separate header file associated with the document, and referring from this header file to the appropriate system-generated anchor identifiers inside the document. This indirection at the document level allows the hypertext system to locate and change the characteristics of the links without having to go through the contents of the document, or to locate and change the contents of the documents while still maintaining the presence and consistency of the relevant link anchors.
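To make the idea concrete, the following rough sketch (in present-day Python, with made-up document identifiers, URLs and field names) shows one possible shape of such a separation: the document contents carry only system-generated anchor identifiers, while all link-related data lives in a separate header structure that can be queried and updated on its own. It illustrates the principle, not any existing WWW mechanism.

    # Hypothetical layout: link-related data kept apart from the contents,
    # referring to a system-generated anchor identifier inside the document.
    document = {
        "id": "doc42",
        "contents": 'Hypermedia research has a long history <anchor id="a1">in Europe</anchor>.',
    }

    # Separate "header": everything the system needs to know about the links.
    link_header = {
        "a1": {
            "destination": "http://info.cern.ch/hypertext/WWW/TheProject.html",
            "created": "1994-05-23",
            "owner": "author@site.example",   # made-up auxiliary information
            "access": "public",
        },
    }

    def resolve(anchor_id):
        """Look up a link's destination without touching the document contents."""
        return link_header[anchor_id]["destination"]

    def retarget(anchor_id, new_url):
        """Repair or update a link; the document contents stay exactly as they were."""
        link_header[anchor_id]["destination"] = new_url

    retarget("a1", "http://www.w3.org/")
    print(resolve("a1"))

Because the indirection goes through the anchor identifier, either side can change (the link data or the document contents) without the other having to be rewritten.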

This separation of the link-related data from the actual tag referring to the anchor of the link may seem to run counter to the notion of embedding all relevant information using tags inside the contents of the document itself. However, it is easy to see that we can only go so far with tags inside a document: the more additional information we want to store together with a tag (using the appropriate attributes of the corresponding element), the more cumbersome it becomes to manipulate that tag. This is true for anchor tags (where we want to store more "link behaviour" information such as ROLE) as well as for other tags (where we want to store more "visual presentation" information such as ALIGN). The only way to do this elegantly is by attaching "visual presentation styles" to elements which have to be displayed and to attach "link behaviour styles" to anchor elements, and by keeping these stylesheets separate from the logical structure and contents of the document.
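As a rough sketch of what such separate stylesheets could look like (again in Python, with invented element types, anchor roles and style names), the tables below map element types to presentation styles and anchor roles to link behaviours, so the tags themselves need carry nothing more than a type or a role:

    # Hypothetical style tables, kept apart from the document's structure and contents.
    presentation_styles = {
        "h1": {"align": "center", "font": "large bold"},
        "p":  {"align": "justify", "font": "roman"},
    }

    link_behaviour_styles = {
        "annotation": {"open_in": "popup", "connectivity": "one-to-one"},
        "reference":  {"open_in": "new-view", "connectivity": "one-to-many"},
    }

    def render(element_type):
        # The document only says "this is an h1"; the layout comes from the table.
        return presentation_styles.get(element_type, {})

    def traverse(anchor_role):
        # The anchor only says "this is an annotation link"; the behaviour comes from the table.
        return link_behaviour_styles.get(anchor_role, {"open_in": "same-view"})

    print(render("h1"), traverse("annotation"))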

2. Provide meta-information for both nodes and links

It is generally accepted within the hypermedia research community that to be able to provide some kind of intelligent support for navigation it is necessary to make a distinction between the document space (the collection of documents within the hypernetwork) and the index space (the collection of terms used to characterize these documents) [Bruza90][Marmann92]. In order to talk meaningfully about the contents of the documents in the hypernetwork, it has to be possible to describe those contents in a uniform, standardized way, e.g. by using some means of (keyword-based) indexing. Once the contents of documents can be characterized in this way, it becomes possible to manipulate documents regardless of the link structures defined between them (e.g. collect all documents dealing with a certain subject at a certain site or world-wide, deactivate links to nodes whose content description does not match my personal profile of interests, etc.). The use of meta-information about the nodes may also provide significant benefits with regard to Internet load and responsiveness: consulting a node's descriptive information first avoids having to download the whole document at once.
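A minimal sketch of such an index space (the URLs and index terms below are invented) shows how little machinery is needed once the descriptions exist: documents can be collected by subject, and links can be deactivated when their destination does not match a reader's profile, all without consulting the documents themselves.

    # Hypothetical index space: standardized index terms per node, kept separately.
    node_index = {
        "http://site-a.example/hypertext.html": {"hypertext", "navigation"},
        "http://site-b.example/markup.html":    {"sgml", "markup"},
        "http://site-c.example/agents.html":    {"agents", "retrieval"},
    }

    def collect(subject):
        """Gather all nodes dealing with a given subject, regardless of link structure."""
        return [url for url, terms in node_index.items() if subject in terms]

    def active_links(links, interest_profile):
        """Keep only links whose destination matches the reader's profile of interests."""
        return [url for url in links if node_index.get(url, set()) & interest_profile]

    print(collect("hypertext"))
    print(active_links(list(node_index), {"markup", "navigation"}))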

Nodes and links are complementary to one another: if we want to support better navigation between nodes using links, we need more information about the contents of those nodes [Nanard91], and vice versa, if we want to create links between nodes which behave more consistently, we need more information about the purpose of those links [Nanard93]. So in addition to an index space with meta-information about the nodes, we need an index space with meta-information about the links, defining what types of links are allowable for what kinds of documents, what their navigational behaviour has to be, what kind of document interconnectivity they support (one-to-one, one-to-many, many-to-many), etc. Once the behaviour of links can be characterized in this way, it becomes possible to avoid dangling links (by associating some default behaviour with a given link type), to check the referential consistency of links (by checking whether their begin/end nodes are of the correct type), etc.
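The sketch below (with invented link and node types) shows how such link meta-information might be recorded and used: each link type declares what kinds of nodes it may connect, what interconnectivity it supports, and what to do when its destination cannot be resolved.

    # Hypothetical link-type schema in the index space.
    link_types = {
        "citation":   {"from": "article", "to": "article", "connectivity": "one-to-one",
                       "on_dangling": "show bibliographic record"},
        "term-index": {"from": "article", "to": "glossary", "connectivity": "many-to-one",
                       "on_dangling": "offer a search instead"},
    }

    node_types = {"doc1": "article", "doc2": "glossary"}

    def consistent(link_type, src, dst):
        """Referential consistency: do the begin and end nodes have the required types?"""
        schema = link_types[link_type]
        return node_types.get(src) == schema["from"] and node_types.get(dst) == schema["to"]

    def follow(link_type, destination_resolvable):
        """Fall back on the type's default behaviour instead of presenting a dangling link."""
        return "traverse" if destination_resolvable else link_types[link_type]["on_dangling"]

    print(consistent("term-index", "doc1", "doc2"))          # True
    print(follow("citation", destination_resolvable=False))  # show bibliographic record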

Both for nodes and links, a careful balance will have to be found between the representational richness of their meta-information and the added functionality which the availability of such meta-information brings to the readers and the maintainers of the WWW. The present growth of the WWW is largely due to its slightly anarchic approach to node creation and link maintenance, and users of the WWW will only be willing to give up that freedom if they see some real and tangible benefits. This is a fundamental issue for any multi-user, large-scale hypermedia system: as the size of the hypernetwork reaches a critical mass, it becomes necessary to impose restrictions on its structure and enforce discipline amongst its users if the hypernetwork is not to turn into an amorphous mass of pointless links and meaningless nodes.

3. Turn the meta-information itself into a navigable space

The principal goal of a separate index space with meta-information on nodes and links is to provide the basic management information needed to support automatic Web maintenance, navigation and retrieval tools. Web spiders can use this meta-information when looking for dangling links and repairing them, e.g. by trying to resolve the appropriate link destinations or by notifying the owner of the link [Palaniappan90]. Intelligent software agents can locate documents corresponding to a given set of search criteria (using only the relevant links when performing a ripple search, and thus reducing the required network load) [Shibata93]. And virtual information clearing houses can be constructed, where the descriptions of the content of nodes and the function of links are checked for conformance with the index space before those nodes and links are accepted for inclusion in the Web.
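A toy version of such a maintenance spider (the link records and node addresses are invented) might look as follows: it sweeps over the registered links, repairs those for which the meta-information contains a forwarding hint, and reports the rest to their owners.

    # Hypothetical link records carrying auxiliary meta-information.
    known_nodes = {"http://a.example/x.html", "http://a.example/y.html"}

    link_records = [
        {"id": "l1", "to": "http://a.example/x.html",    "owner": "alice", "moved_to": None},
        {"id": "l2", "to": "http://a.example/gone.html", "owner": "bob",
         "moved_to": "http://a.example/y.html"},  # forwarding hint kept as meta-information
        {"id": "l3", "to": "http://a.example/lost.html", "owner": "carol", "moved_to": None},
    ]

    def sweep():
        for link in link_records:
            if link["to"] in known_nodes:
                continue                          # link is intact
            elif link["moved_to"] in known_nodes:
                link["to"] = link["moved_to"]     # repair the link from its meta-information
            else:
                print(f"notify {link['owner']}: link {link['id']} is dangling")

    sweep()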

However, once this index space has been standardized and has been used to characterize most documents and links in the Web, it can be used as a navigable space in its own right. Instead of navigating through countless documents and links in the hope of finding the relevant piece of information, the index space itself can be navigated first, and once the correct combination of index terms has been found, the reader can "beam down" to the corresponding document(s) [Purgathofer93]. Using the index space in this way has several benefits: the information density of the index space is bound to be lower than that of the whole Web itself, resulting in less cognitive overhead when searching for information. It should also be possible to develop novel ways of visualizing this index space (using tools like the Information Visualizer from Rank Xerox PARC or other virtual reality approaches), without having to go through the effort of trying to visualize the Web itself. Navigating this index space will also cause less network load than actually navigating the Web itself, since the amount of information transmitted between an "index space server" and the WWW client will be substantially less than when the documents themselves have to be transmitted in their entirety.
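As an illustration of this "index space first" style of navigation (the terms and URLs are invented), the sketch below lets a reader move between related index terms and only then fetch the documents that carry the chosen combination of terms; only the small index structure has to travel over the network while the reader is still searching.

    # Hypothetical data held by an "index space server": index term -> set of documents.
    index_space = {
        "hypertext":  {"http://a.example/nav.html", "http://b.example/links.html"},
        "navigation": {"http://a.example/nav.html"},
        "agents":     {"http://c.example/agents.html"},
    }

    def related_terms(term):
        """Terms sharing at least one document with `term`: the next hops in the index space."""
        docs = index_space.get(term, set())
        return {t for t, d in index_space.items() if t != term and d & docs}

    def beam_down(*terms):
        """Once the right combination of terms is found, return the matching documents."""
        sets = [index_space.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()

    print(related_terms("hypertext"))            # {'navigation'}
    print(beam_down("hypertext", "navigation"))  # the document(s) to fetch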

Discussion

I believe that this brief introduction to some important principles learnt from hypertext and hypermedia research may represent an interesting contribution to this Workshop. A number of issues related to link maintenance and navigation support in the WWW are of course still open for discussion, and I hope they will spark off a lively exchange of ideas.

References

[Bruza90]
Bruza, P.D. and van der Weide, Th.P. (1990). Two level hypermedia - an improved architecture for hypertext. In Proceedings of the Data Base and Expert System Applications Conference DEXA '90 (Eds Tjoa, A.M. and Wagner, R.). Springer-Verlag, Berlin, Heidelberg, New York, pp. 76-83.
[DeYoung90]
De Young, L. (1990). Linking considered harmful. In Hypertext: Concepts, systems and applications (Eds Rizk, A., Streitz, N. and André, J.), The Cambridge Series on Electronic Publishing. Cambridge University Press, Cambridge, New York, Port Chester, Melbourne, Sydney, pp. 238-249.
[Dillon90]
Dillon, A., McKnight, C. and Richardson, J. (1990). Navigation in hypertext: A critical review of the concept. In Proceedings of the IFIP TC 13 Third International Conference on Human-Computer Interaction INTERACT '90 (Cambridge, August 27-31) (Eds Diaper, D., Gilmore, D., Cockton, G. and Shackel, B.). Elsevier Science Publishers B.V. (North-Holland), Amsterdam, New York, Oxford, Tokyo, pp. 587-592.
[Furuta90]
Furuta, R. and Stotts, P.D. (1990). Separating hypertext content from structure in Trellis. In Hypertext: State of the art (Eds McAleese, R. and Green, C.). Blackwell Scientific Publications Ltd., Oxford, pp. 205-238.
[Gygi90]
Gygi, K. (1990). Recognizing the symptoms of hypertext ... and what to do about it. In The art of human-computer interface design (Ed. Laurel, B.). Addison-Wesley Publishing Company Inc., Reading (Massachusetts), Menlo Park (California), New York, pp. 279-287.
[Marmann92]
Marmann, M. and Schlageter, G. (1992). Towards a better support for hypermedia structuring: The HYDESIGN model. In ECHT '92 Proceedings (Milano, Italy, November 30 - December 4) (Eds Lucarella, D., Nanard, J., Nanard, M. and Paolini, P.). ACM Press, New York, pp. 232-241.
[Meyrowitz89]
Meyrowitz, N. (1989). The missing link: Why we're all doing hypertext wrong. In The society of text: Hypertext, hypermedia and the social construction of information (Ed. Barrett, E.). The MIT Press, Cambridge, Massachusetts, London, England, pp. 107-114.
[Nanard91]
Nanard, J. and Nanard, M. (1991). Using structured types to incorporate knowledge in hypertext. In Hypertext '91 Proceedings (San Antonio, Texas, December 15-18) (Eds Stotts, P.D. and Furuta, R.K.). ACM Press, New York, pp. 329-343.
[Nanard93]
Nanard, J. and Nanard, M. (1993). Should anchors be typed too? An experiment with MacWeb. In Hypertext '93 Proceedings (Seattle, Washington, November 14-18) (Eds Kacmar, C.J. and Schnase, J.L.). ACM Press, New York, pp. 51-62.
[Palaniappan90]
Palaniappan, M., Yankelovich, N. and Sawtelle, M. (1990). Linking active anchors: A stage in the evolution of hypermedia. Hypermedia, 2 (1), 47-66.
[Purgathofer93]
Purgathofer, P. and Grechenig, T. (1993). Navigation in hypertext by browsing in survey objects. In Hypermedia - Proceedings der Internationalen Hypermedia '93 Konferenz (Zurich, Switzerland, March 2-3) (Eds Frei, H.P. and Schauble, P.), Informatik aktuell. Springer-Verlag, Berlin, Heidelberg, New York, pp. 116-129.
[Shibata93]
Shibata, Y. and Katsumoto, M. (1993). Dynamic hypertext and knowledge agent systems for multimedia information networks. In Hypertext '93 Proceedings (Seattle, Washington, November 14-18) (Eds Kacmar, C.J. and Schnase, J.L.). ACM Press, New York, pp. 82-93.
Written by Hans C. Arents
Published on-line May 23, 1994