Information Exploration and Visualization
Introduction
Information exploration should be a joyous experience, but many commentators
talk of information overload and anxiety (Wurman, 1989). However, there
is promising evidence that the next generation of digital libraries for
structured databases, textual documents, and multimedia will enable convenient
exploration of growing information spaces by a wider range of users. User-interface
designers are inventing more powerful search and visualization methods,
while offering smoother integration of technology with task.
The terminology swirl in this domain is especially colorful. The older terms
of information retrieval (often applied to bibliographic and textual document
systems) and database management (often applied to more structured relational
database systems with orderly attributes and sort keys) are being pushed
aside by newer notions of information gathering, seeking, or visualization
and data mining, warehousing, or filtering. While distinctions are subtle,
the common goals reach from finding a narrow set of items in a large collection
that satisfy a well-understood information need (known-item search) to developing
an understanding of unexpected patterns within the collection (browse) (Marchionini,
1995).
Exploring information collections becomes increasingly difficult as the
volume grows. A page of information is easy to explore, but when the information
becomes the size of a book, or library, or even larger, it may be difficult
to locate known items or to browse to gain an overview. The strategies to
focus and narrow are well understood by librarians and information-search
specialists, and now these strategies are beginning to be implemented for
widespread use. The computer is a powerful tool for searching, but traditional
user interfaces have been a hurdle for novice users (complex commands, Boolean
operators, unwieldy concepts) and an inadequate tool for experts (difficulty
in repeating searches across multiple databases, weak methods for discovering
where to narrow broad searches, poor integration with other tools) (Borgman,
1986). This chapter suggests some novel possibilities for first-time or
intermittent versus frequent computer users, and also for task-domain novices
versus experts. Improvements on traditional text and multimedia searching
seem possible, and a new generation of visualization strategies for query
formulation and information presentation is emerging.
Designers are just discovering how to use rapid, high-resolution
color displays to present large amounts of information in orderly and user-controlled
ways. Perceptual psychologists, statisticians, and graphic designers (Bertin,
1983; Cleveland, 1993; Tufte, 1983, 1990) offer valuable guidance about
presenting static information, but the opportunity for dynamic displays
takes user interface designers well beyond current wisdom.
The Objects/Actions Interface (OAI) model helps by separating task-domain concepts
(do you think of your organization as a hierarchy or a matrix?) from interface
concepts (your hierarchy can be represented as an outline, node-link diagram,
or treemap). The OAI model also separates high-level interface issues (are
overview diagrams necessary for navigation?) from low-level interface issues
(will color or size coding be used to represent salary?).
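The separation between a task-domain object and its interface representations can be sketched in a few lines. The example below is a minimal illustration in Python, not part of the OAI model itself: the `Node` class, the function names, and the organization shown are all hypothetical. One hierarchy is rendered two ways, as an indented outline and as the edge list that a node-link diagram would draw.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Task-domain object: one unit in a (hypothetical) organization."""
    name: str
    children: list["Node"] = field(default_factory=list)


def as_outline(node: Node, depth: int = 0) -> Iterator[str]:
    """Interface representation 1: an indented outline, one line per unit."""
    yield "  " * depth + node.name
    for child in node.children:
        yield from as_outline(child, depth + 1)


def as_edges(node: Node) -> Iterator[tuple[str, str]]:
    """Interface representation 2: parent-child pairs for a node-link diagram."""
    for child in node.children:
        yield (node.name, child.name)
        yield from as_edges(child)


# Illustrative data only; the unit names are made up.
org = Node("Director", [Node("Research", [Node("Lab A")]), Node("Operations")])

if __name__ == "__main__":
    print("\n".join(as_outline(org)))
    print(list(as_edges(org)))
```

The same `Node` data feeds both functions, so a designer could change the representation without touching the task-domain model.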
First-time users of an information-exploration system (whether they have
little or much task-domain knowledge) are struggling to understand what
they see on the display while keeping in mind their information needs. They
would be distracted if they had to learn complex query languages or elaborate
shape-coding rules. They need the low cognitive burdens of menu and direct-manipulation
designs and simple visual coding rules. As users gain experience with the
interface, they can request additional features by adjusting control panels.
Knowledgeable and frequent users want a wide range of search tools with
many options that allow them to compose, save, replay, and revise increasingly
elaborate query plans.
To facilitate discussion, some terms need definition. Task domain objects,
such as Leonardo's notebooks or sports-video segments from the Olympics,
are represented by interface domain objects in structured relational databases,
textual document libraries, or multimedia document libraries. A structured
relational database consists of relations and a schema to describe the relations.
Relations have items (usually called tuples or records), and each item has
multiple attributes (often called fields), each of which has attribute values.
In the relational model, items are an unordered set (although one attribute
can contain sequencing information or be a unique key to identify or sort
the items) and attributes are atomic. A textual document library consists
of a set of collections (typically 1 to 200 collections per library) plus some
descriptive attributes about the library (for example, name, location, owner).
Each collection has a name plus some descriptive attributes about the collection
(for example, location, media type, curator, donor, dates, geographic coverage),
and a set of items (typically 10 to 100,000 items per collection). While
items in a collection may vary greatly, we will assume that a superset of
attributes exists for all the items. Attributes may be blank, have single
values, have multiple values, or be a lengthy text. Typically a collection
is owned by a single library, and an item belongs to a single collection,
although exceptions are possible. A multimedia document library consists
of collections of documents in which the documents can contain images, sound,
video, animations, etc.
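The document-library definitions above can be made concrete with a small data-structure sketch. This is one possible reading in Python under stated assumptions, not a prescribed schema: the class names, the `add_item` and `find_items` helpers, and the sample values are all hypothetical. It captures the three levels (library, collection, item), the assumed superset of attributes per collection, and attribute values that may be blank, single, multiple, or lengthy text.

```python
from dataclasses import dataclass, field
from typing import Union

# An attribute value may be blank (None), a single value or lengthy text (str),
# or multiple values (list of str).
AttributeValue = Union[None, str, list[str]]


@dataclass
class Item:
    """One item; its attributes come from the collection's superset."""
    attributes: dict[str, AttributeValue] = field(default_factory=dict)


@dataclass
class Collection:
    name: str
    description: dict[str, str]      # e.g. curator, media type, dates
    attribute_names: list[str]       # assumed superset of item attributes
    items: list[Item] = field(default_factory=list)

    def add_item(self, **values: AttributeValue) -> Item:
        unknown = set(values) - set(self.attribute_names)
        if unknown:
            raise ValueError(f"attributes not in superset: {sorted(unknown)}")
        # Attributes that are not supplied are stored as blank (None).
        item = Item({name: values.get(name) for name in self.attribute_names})
        self.items.append(item)
        return item


@dataclass
class Library:
    name: str
    description: dict[str, str]      # e.g. location, owner
    collections: list[Collection] = field(default_factory=list)


def find_items(library: Library, attribute: str, value: str) -> list[Item]:
    """Exact-match lookup on one attribute across all collections."""
    matches = []
    for collection in library.collections:
        for item in collection.items:
            stored = item.attributes.get(attribute)
            candidates = stored if isinstance(stored, list) else [stored]
            if value in candidates:
                matches.append(item)
    return matches


# Illustrative data only; names and values are made up.
library = Library("Example Library", {"owner": "hypothetical owner"})
photos = Collection("Photographs", {"media type": "photograph"},
                    ["title", "subjects", "notes"])
library.collections.append(photos)
photos.add_item(title="Camp scene", subjects=["camp", "soldiers"])
photos.add_item(title="Portrait")    # subjects and notes stay blank
```

Here each item belongs to exactly one collection and each collection to one library, which matches the typical case described above; the exceptions the text allows would need shared references rather than nested lists.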
Task actions are represented by interface actions such as browsing, searching,
joining, or linking. Users begin by formulating their information needs
in the task domain. Tasks can range from specific fact-finding where there
is a single readily identifiable outcome to more extended fact-finding with
uncertain but replicable outcomes. More unstructured tasks include open-ended
browsing of known collections and exploration of the availability of information
on a topic:
Specific fact-finding (known-item search)
    Find the Library of Congress call number of Future Shock
    Find the phone number of Bill Clinton
    Find the highest resolution LANDSAT image of College Park at noon on Dec. 13, 1997
Extended fact-finding
    What other books are by the author of Jurassic Park?
    What kinds of music is Sony publishing?
    Which satellites took images of the Persian Gulf War?
Open-ended browsing
    Does the Mathew Brady Civil War photo collection show the role of women?
    Is there new work on voice recognition in Japan?
    Is there a relationship between carbon monoxide levels and desertification?
Exploration of availability
    What genealogy information is at the Library of Congress?
    What is there on the Grateful Dead band members?
    Can NASA datasets show acid rain damage to soy crops?
Once users' information needs have been clarified, the first step in satisfying
them is to decide where to search (Marchionini, 1995). The conversion of
information needs, stated in task domain terminology, to interface actions
is a large cognitive step, but it must be accomplished before expression
of these actions in a query language or series of mouse selections can begin.
Supplemental finding aids can help users clarify and pursue their information
needs. Examples include tables of contents or indexes in books, descriptive
introductions, concordances, key-word-in-context (KWIC) lists, and subject
classifications. Careful understanding of previous and potential search
requests (that is, a task analysis) can improve search results by suggesting
hot-topic lists and useful classification schemes. For example, the Congressional
Research Service has a list of approximately 80 hot topics covering current
bills before Congress and 5000 terms in its Legislative Indexing Vocabulary.
The National Library of Medicine maintains the Medical Subject Headings
(MeSH) with 14,000 items in a seven-level hierarchy.
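Of the finding aids listed above, the key-word-in-context (KWIC) list is simple enough to sketch directly. The following Python function is a minimal illustration, assuming plain-text titles and whole-word, case-insensitive matching; the function name, the `width` parameter, and the sample titles are hypothetical choices, not a description of any particular system. Each occurrence of the keyword is aligned in a fixed column with a window of context on either side.

```python
import re


def kwic(lines: list[str], keyword: str, width: int = 30) -> list[str]:
    """Return one aligned context line per occurrence of keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    rows = []
    for text in lines:
        for match in pattern.finditer(text):
            # Up to `width` characters of context on each side of the keyword.
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so keywords line up in one column.
            rows.append(f"{left:>{width}} {match.group(0)} {right}")
    return rows


if __name__ == "__main__":
    # Illustrative titles only.
    titles = ["Information retrieval systems", "Visual information seeking"]
    for row in kwic(titles, "information", width=10):
        print(row)
```

Because every keyword starts in the same column, users can scan down the list and compare the surrounding words quickly, which is the point of the KWIC format.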
Last Updated:
11 December 2002