DTUI Booksite

Chapter 15 Introduction



Information Exploration and Visualization

Introduction

Information exploration should be a joyous experience, but many commentators talk of information overload and anxiety (Wurman, 1989). However, there is promising evidence that the next generation of digital libraries for structured databases, textual documents, and multimedia will enable convenient exploration of growing information spaces by a wider range of users. User-interface designers are inventing more powerful search and visualization methods, while offering smoother integration of technology with task.

The terminology swirl in this domain is especially colorful. The older terms of information retrieval (often applied to bibliographic and textual document systems) and database management (often applied to more structured relational database systems with orderly attributes and sort keys) are being pushed aside by newer notions of information gathering, seeking, or visualization and data mining, warehousing, or filtering. While distinctions are subtle, the common goals range from finding a narrow set of items in a large collection that satisfy a well-understood information need (known-item search) to developing an understanding of unexpected patterns within the collection (browse) (Marchionini, 1995).

Exploring information collections becomes increasingly difficult as the volume grows. A page of information is easy to explore, but when the information becomes the size of a book, or library, or even larger, it may be difficult to locate known items or to browse to gain an overview. The strategies to focus and narrow are well understood by librarians and information-search specialists, and now these strategies are beginning to be implemented for widespread use. The computer is a powerful tool for searching, but traditional user interfaces have been a hurdle for novice users (complex commands, Boolean operators, unwieldy concepts) and an inadequate tool for experts (difficulty in repeating searches across multiple databases, weak methods for discovering where to narrow broad searches, poor integration with other tools) (Borgman, 1986). This chapter suggests some novel possibilities for first-time or intermittent versus frequent computer users, and also for task-domain novices versus experts. Improvements on traditional text and multimedia searching seem possible, and a new generation of visualization strategies for query formulation and information presentation is emerging.

Designers are just discovering how to use rapid, high-resolution color displays to present large amounts of information in orderly and user-controlled ways. Perceptual psychologists, statisticians, and graphic designers (Bertin, 1983; Cleveland, 1993; Tufte, 1983, 1990) offer valuable guidance about presenting static information, but the opportunity for dynamic displays takes user-interface designers well beyond current wisdom.

The Objects/Actions Interface model helps by separating task domain concepts (do you think of your organization as a hierarchy or a matrix?) from interface concepts (your hierarchy can be represented as an outline, node-link diagram, or treemap). The OAI model also separates high-level interface issues (are overview diagrams necessary for navigation?) and low-level interface issues (will color or size coding be used to represent salary?).

First-time users of an information-exploration system (whether they have little or much task-domain knowledge) are struggling to understand what they see on the display while keeping in mind their information needs. They would be distracted if they had to learn complex query languages or elaborate shape-coding rules. They need the low cognitive burdens of menu and direct-manipulation designs and simple visual coding rules. As users gain experience with the interface, they can request additional features by adjusting control panels. Knowledgeable and frequent users want a wide range of search tools with many options that allow them to compose, save, replay, and revise increasingly elaborate query plans.

To facilitate discussion, some terms need definition. Task domain objects, such as Leonardo's notebooks or sports-video segments from the Olympics, are represented by interface domain objects in structured relational databases, textual document libraries, or multimedia document libraries.

A structured relational database consists of relations and a schema to describe the relations. Relations have items (usually called tuples or records) and each item has multiple attributes (often called fields), each of which has an attribute value. In the relational model, items are an unordered set (although one attribute can contain sequencing information or be a unique key to identify or sort the items) and attributes are atomic.

A textual document library consists of a set of collections (typically 1 to 200 collections per library) plus some descriptive attributes about the library (for example, name, location, owner). Each collection has a name plus some descriptive attributes about the collection (for example, location, media type, curator, donor, dates, geographic coverage), and a set of items (typically 10 to 100,000 items per collection). While items in a collection may vary greatly, we will assume that a superset of attributes exists for all the items. Attributes may be blank, have single values, have multiple values, or be lengthy text. Typically a collection is owned by a single library, and an item belongs to a single collection, although exceptions are possible.

A multimedia document library consists of collections of documents in which the documents can contain images, sound, video, animations, and so on.
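The textual document library defined above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names (Library, Collection, Item), the attribute_superset helper, and the sample titles are hypothetical and do not come from the chapter. The sketch shows a library holding collections, each collection holding items, and item attributes that may be blank, single-valued, multi-valued, or lengthy text.

```python
from dataclasses import dataclass, field
from typing import Union

# An attribute value may be blank (None), a single value or lengthy
# text (str), or multiple values (list of str).
AttributeValue = Union[None, str, list[str]]


@dataclass
class Item:
    # Attribute name -> value; items in one collection may differ widely.
    attributes: dict[str, AttributeValue] = field(default_factory=dict)


@dataclass
class Collection:
    name: str
    # Descriptive attributes such as location, media type, or curator.
    attributes: dict[str, str] = field(default_factory=dict)
    items: list[Item] = field(default_factory=list)

    def attribute_superset(self) -> set[str]:
        """Union of the attribute names used by any item in the collection."""
        names: set[str] = set()
        for item in self.items:
            names.update(item.attributes)
        return names


@dataclass
class Library:
    # Descriptive attributes such as name, location, or owner.
    attributes: dict[str, str] = field(default_factory=dict)
    collections: list[Collection] = field(default_factory=list)


# Hypothetical sample data, invented purely for illustration.
notebooks = Collection(
    name="Leonardo's notebooks",
    attributes={"media type": "manuscript"},
    items=[
        Item({"title": "Sample notebook 1", "subjects": ["anatomy", "optics"]}),
        Item({"title": "Sample notebook 2", "date": None}),  # blank attribute
    ],
)
library = Library(attributes={"name": "Example Library"}, collections=[notebooks])

print(sorted(notebooks.attribute_superset()))
```

Computing the superset from the items, rather than fixing a schema up front, reflects the assumption in the text that items may vary greatly while still sharing a common pool of attributes.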

Task actions are represented by interface actions such as browsing, searching, joining, or linking. Users begin by formulating their information needs in the task domain. Tasks can range from specific fact-finding where there is a single readily identifiable outcome to more extended fact-finding with uncertain but replicable outcomes. More unstructured tasks include open-ended browsing of known collections and exploration of the availability of information on a topic:

Specific fact-finding (known-item search)
Find the Library of Congress call number of Future Shock
Find the phone number of Bill Clinton
Find the highest-resolution LANDSAT image of College Park at noon on Dec. 13, 1997

Extended fact-finding
What other books are by the author of Jurassic Park?
What kinds of music is Sony publishing?
Which satellites took images of the Persian Gulf War?

Open-ended browsing
Does the Mathew Brady Civil War photo collection show the role of women?
Is there new work on voice recognition in Japan?
Is there a relationship between carbon monoxide levels and desertification?

Exploration of availability
What genealogy information is at the Library of Congress?
What is there on the Grateful Dead band members?
Can NASA datasets show acid rain damage to soy crops?


Once users' information needs have been clarified, the first step in satisfying them is to decide where to search (Marchionini, 1995). The conversion of information needs, stated in task domain terminology, to interface actions is a large cognitive step, but it must be accomplished before expression of these actions in a query language or series of mouse selections can begin.

Supplemental finding aids can help users clarify and pursue their information needs. Examples include tables of contents or indexes in books, descriptive introductions, concordances, key-word-in-context (KWIC) lists, and subject classifications. Careful understanding of previous and potential search requests (the task analysis) can improve search results by offering hot-topic lists and useful classification schemes. For example, the Congressional Research Service has a list of approximately 80 hot topics covering current bills before Congress and 5000 terms in its Legislative Indexing Vocabulary. The National Library of Medicine maintains the Medical Subject Headings (MeSH) with 14,000 items in a seven-level hierarchy.
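As one concrete finding aid, a key-word-in-context (KWIC) list shows every occurrence of a term together with a little text on either side. The following is a minimal sketch, not a description of any particular system: the function name kwic, its width parameter, and the two sample documents are assumptions made for illustration.

```python
import re


def kwic(documents, keyword, width=30):
    """Return (document id, context line) pairs for each occurrence of keyword.

    Each context line shows up to `width` characters on either side of the
    match, with the matched word in square brackets and the left context
    right-aligned so that the keywords line up in a column.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for doc_id, text in documents.items():
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            lines.append((doc_id, f"{left:>{width}}[{match.group()}]{right}"))
    return lines


# Hypothetical sample documents, invented for illustration.
documents = {
    "d1": "Information exploration should be a joyous experience.",
    "d2": "Users seek information in large collections.",
}

result = kwic(documents, "information", width=10)
for doc_id, line in result:
    print(doc_id, line)
```

Aligning the keyword in a fixed column lets a reader scan the surrounding words quickly, which is the main point of a KWIC list as a browsing aid.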


Last Updated: 11 December 2002