SUPPLEMENTS
Instructor Supplements
* PowerPoints: Every figure and table in the text is included in a PowerPoint presentation.
* Test Bank: Several test questions are provided for each chapter.
* Solutions: Answers are provided for most of the end-of-chapter exercises.
* Lesson Planner: The lesson planner contains ideas for lecture format and points for discussion. It also suggests ways to use selected end-of-chapter exercises in a laboratory setting.
Student Supplements
* CD-ROM: Includes the iDA software and dataset packages.
iDA Software Package
Experiential learning is essential for developing the skills of a data mining expert. The iDA software is designed to give students hands-on experience with the data mining process, and it is used in several chapters to illustrate important data mining concepts. Chapters 4, 5, 7, 9, 10, 11, and 13 include end-of-chapter exercises designed for the iDA software.
iDA consists of a preprocessor, a report generator, and three data mining tools: ESX for supervised learning and unsupervised clustering, a neural network tool for creating supervised backpropagation models and unsupervised self-organizing maps, and a production rule generator. Because iDA is an Excel add-in, its user interface is Microsoft Excel. We chose iDA for its flexibility and ease of use.
iDA Dataset Package
Several datasets are included with the iDA software. Most come from three general application areas: business, medicine and health, and science. All datasets are in Microsoft Excel format and ready to use.
Datasets can be described along several dimensions, including:
~ Number of data instances
~ Number of attributes
~ Amount of missing or noisy data
~ Whether data attributes are clearly defined
~ Whether the data is categorical, numeric, or a combination of both types
~ Whether well-defined classes exist in the data
~ Whether a time element is implicit in the data
~ Whether the input attributes can differentiate between known classes in the data
~ Whether input attributes are correlated
Because these factors affect the way data mining is performed, the iDA datasets were chosen to provide variety along these dimensions. The datasets also serve several general purposes.
Specifically, they:
~ Provide the beginning student with experimental data for experiencing the data mining process without having to deal with data preprocessing issues.
~ Show the wide range of problem areas and problem types suited to data mining solutions.
~ Help explain data mining outcomes.
~ Illustrate the knowledge discovery process.
~ Demonstrate that experimenting with several data mining techniques may be necessary to create the best model for a specific dataset.
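Several of the dataset dimensions described above can be measured directly from the data. The following is a minimal Python sketch, not part of iDA (which runs inside Excel). It assumes a dataset held as a list of dictionaries, one per instance, with None marking a missing value. The `profile` function and the sample rows are hypothetical, invented only for illustration. The sketch counts instances, attributes, and missing values, and labels each attribute as numeric or categorical.

```python
from typing import Any


def profile(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarize a dataset along a few of the dimensions named in the text.

    Assumption: each row is a dict mapping attribute name to value,
    and None (or an absent key) marks a missing value.
    """
    # Collect every attribute name that appears in any instance.
    attributes = sorted({name for row in rows for name in row})

    # Count missing cells: absent keys and explicit None values both count.
    missing = sum(1 for row in rows for name in attributes if row.get(name) is None)

    # Label an attribute numeric only if every non-missing value is a number.
    types: dict[str, str] = {}
    for name in attributes:
        present = [row[name] for row in rows if row.get(name) is not None]
        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        types[name] = "numeric" if is_numeric else "categorical"

    # Overall data type: categorical, numeric, or a mix of both.
    kinds = set(types.values())
    data_type = "mixed" if len(kinds) > 1 else next(iter(kinds), "empty")

    return {
        "instances": len(rows),
        "attributes": len(attributes),
        "missing": missing,
        "types": types,
        "data_type": data_type,
    }


# Hypothetical sample rows, invented for illustration (not taken from an iDA dataset).
sample_rows = [
    {"age": 45, "sex": "Male", "income_range": "40-50K", "promo": "Yes"},
    {"age": 40, "sex": "Female", "income_range": None, "promo": "No"},
    {"age": None, "sex": "Male", "income_range": "30-40K", "promo": "Yes"},
]

summary = profile(sample_rows)
print(summary)
```

Dimensions such as attribute correlation or how well input attributes separate known classes require more than counting, so this sketch leaves them out.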
Here is a short description of each dataset in the iDA software package, including a brief statement about one or more of its characteristics.
Business Applications
The Credit Card Promotion Dataset.
This is a hypothetical dataset containing information about credit
card holders who have accepted or rejected various promotional offerings.
The dataset is used to illustrate many of the data mining techniques
discussed in the text.
The Credit Card Screening Dataset.
The file contains data about individuals who have applied for a
credit card. The output attribute indicates whether each individual's
credit card application was accepted or rejected. The input attributes
have been changed to meaningless symbols to protect confidentiality
of the data.
The Deer Hunters Dataset.
The dataset holds information about deer hunters who are either
willing or unwilling to spend more for their next hunting trip.
Several irrelevant input attributes are present in the data.
The Stock Index Dataset.
The data is a time series representation of average weekly closing
prices for the Nasdaq and the Dow Jones Industrial Average.
Medicine and Health
The Cardiology Patient Dataset.
This dataset holds medical information about two groups of individuals. One group has suffered one or more heart attacks; the other has not experienced a heart attack. The dataset contains a good mix of categorical and numeric attributes.
The Spine Clinic Dataset.
This dataset contains medical information about individuals who have had lower back surgery. Some of these folks have returned to work while others have not. A clear definition of the meaning of each attribute is not given. The dataset contains both numeric and categorical data.
Science
The Gamma Ray Burst Dataset.
The dataset contains recorded information about individual gamma-ray
bursts. Gamma-ray bursts are brief gamma-ray flashes whose origins
are outside our solar system. The bursts were observed by the Burst
And Transient Source Experiment (BATSE) aboard NASA's Compton Gamma-Ray
Observatory between April 1991 and March 1993. Although astronomers
agree that classes of gamma-ray bursts exist, they do not agree
on a specific class structure.
The Landsat Image Dataset.
The dataset contains pixels representing a digitized satellite image
of a portion of the earth's surface. Each instance has been classified
into one of fifteen categories. Because of the large number of classes, classification accuracy is sensitive to model-specific parameter settings.
The Temperature Dataset.
The dataset offers the normal average January minimum temperature
in degrees Fahrenheit for 56 U.S. cities. City latitude and longitude
values are also provided. All attributes are numeric.
Miscellaneous
The Titanic Dataset.
The dataset contains 2201 instances where each instance describes
attributes of an individual passenger or crew member aboard the
Titanic. The output attribute indicates whether the passenger or
crew member survived.