Notes for Teachers

The Internet Projects can be used in several ways. Most of these data-driven projects can be used in more than one situation: a single project can teach several types of statistical concepts. These projects work well for individual assignments, especially when learning the interpretation of statistical results. Notes on how you can use the different projects:
  1. The Titanic Disaster provides an excellent introduction to understanding the real-life situation behind a table of numbers. For more information on using the data set in class, you may want to review the paper at the Journal of Statistics Education. The paper is called The 'Unusual Episode' Data Revisited. The data lend themselves to class discussion and can be used in chi-square procedures as well.

    In the Analysis section, the Critique of the Rimm study link provides a good framework to explain the nature of data and to question data sources and measurement techniques.
    The animation showing sex ratios of the US over time provides a good example of the power of graphical techniques in understanding data.

  2. Infant mortality can be explored in any of several ways; see the References section for data to use in other lessons. You can use these links for data and examples using contingency tables, or to show the difference between two populations (using race as the grouping variable).
    In the Analysis section, you can use the 'Drinking habits of students' link to capture students' interest and then explore the data using the techniques learned in this chapter.

  3. Old Faithful and a Survey of Wages can be used to demonstrate concepts using univariate statistics, including distribution shapes and bimodal distributions. The 'Interactive Histogram' demonstrates the influence of changing bin size.
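
    The effect of bin size can also be shown without the interactive tool. The sketch below is only an illustration: it uses a synthetic bimodal sample (not the Old Faithful or wage data), and the helper name histogram is hypothetical. Narrow bins show both modes, while very wide bins merge them.

```python
import random

def histogram(values, bin_width):
    """Count values in consecutive equal-width bins starting at min(values)."""
    start = min(values)
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        counts[index] = counts.get(index, 0) + 1
    return [counts.get(i, 0) for i in range(max(counts) + 1)]

# Synthetic bimodal sample (an assumption for illustration, not project data):
# two normal clusters centred at 2.0 and 4.5.
rng = random.Random(0)
sample = ([rng.gauss(2.0, 0.3) for _ in range(100)]
          + [rng.gauss(4.5, 0.4) for _ in range(150)])

# Print the bin counts for three bin widths so students can compare shapes.
for width in (0.25, 1.0, 4.0):
    print(width, histogram(sample, width))
```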

  4. The Space Shuttle Challenger makes a very good lesson for correlation and logistic regression. The data and story are used here to emphasize that probability has important meaning in the real world apart from dice rolling and coin tossing.
    It also helps to make the important bridge between the concepts underlying statistical parameters (and our probabilistic confidence in the estimates) and probability theory.

    Note also that this lesson contains (in the Analysis section) a good selection of probability demonstrations. You can bolster your classroom work on this traditionally difficult topic with these interactive demonstrations. Students master the concepts much more easily when they can see and interact with what you teach them in class.

  5. Racial Bias in Jury Selection provides the probability distribution for a hypothetical situation of drawing a jury of twelve people from a population of whites and blacks. A true-life situation provides context for the problem. Students can use an online calculator to compute and plot other probability distributions.
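
    If the online calculator is unavailable, a distribution of this kind can be computed directly. The sketch below assumes the population is large enough that the number of black jurors among twelve is approximately binomial; the proportion p_black is a hypothetical placeholder, not a figure from the project.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent draws."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_jurors = 12
p_black = 0.5  # hypothetical population proportion (assumption)

pmf = [binomial_pmf(k, n_jurors, p_black) for k in range(n_jurors + 1)]

# Text plot: one '#' per percentage point of probability.
for k, prob in enumerate(pmf):
    print(f"{k:2d}  {prob:.4f}  {'#' * round(prob * 100)}")
```

    Changing p_black lets students see how the shape of the distribution shifts with the makeup of the population.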

  6. IQ of Girls and Boys provides an example to show the difference (or lack thereof) between two populations. You can use the data in this lesson to show the gender-based difference (which exists) for brain weights, and the difference (which does not exist) for IQ performance.

  7. Simulations provides links to a set of interactive demonstrations and simulations to help you teach the Central Limit Theorem. You can assign these demos as homework and let your students use links in this lesson to help solve their homework problems.
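
    For a classroom without web access, the idea behind these demonstrations can be sketched in a few lines. This is an illustrative simulation, not one of the linked demos: it draws repeated samples from a skewed exponential parent (mean 1, standard deviation 1) and shows that the spread of the sample means shrinks roughly like 1/sqrt(n). The function names are hypothetical.

```python
import random
import statistics

def draw_exponential(rng):
    """One draw from a skewed parent distribution with mean 1 and sd 1."""
    return rng.expovariate(1.0)

def sample_means(draw, sample_size, n_samples, rng):
    """Means of n_samples independent samples, each of size sample_size."""
    return [statistics.fmean(draw(rng) for _ in range(sample_size))
            for _ in range(n_samples)]

rng = random.Random(1)
for n in (1, 5, 30):
    means = sample_means(draw_exponential, n, 2000, rng)
    # The average of the means stays near 1; their sd falls as n grows.
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```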

  8. Global Warming contains average global temperatures by year. In this project the data are used to calculate a confidence interval for the average temperature. These data show the difference in temperature relative to the mid-1800s. You can also test whether the mean is significantly different from zero; a significant positive mean would indicate a real temperature increase over the past 100 years. The data can also be used for a regression lesson.
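
    The interval and test described here can be sketched as follows. This is a minimal example under stated assumptions: it uses the large-sample normal approximation (for a short series, a t critical value from a table should replace z), the function names are hypothetical, and the listed values are made-up placeholders rather than the project's temperature data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / sqrt(len(values))
    centre = mean(values)
    return centre - half_width, centre + half_width

def z_statistic(values, mu0=0.0):
    """Standardized distance of the sample mean from the hypothesized mu0."""
    return (mean(values) - mu0) / (stdev(values) / sqrt(len(values)))

# Made-up yearly temperature differences in degrees C (placeholders only).
differences = [-0.12, 0.05, 0.10, -0.03, 0.21, 0.18, 0.30, 0.25, 0.41, 0.38]

low, high = mean_confidence_interval(differences)
print(f"95% interval: ({low:.3f}, {high:.3f})")
print(f"z for H0: mean = 0 is {z_statistic(differences):.2f}")
```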

  9. The Ozone Hole provides the incidence of skin cancer by latitude. The ozone levels at or near the equator are thought to be near normal levels, so you can partition the data to compare skin cancer incidence far from the equator (latitudes greater than 25 degrees) with incidence near the equator (latitudes less than 25 degrees). The null hypothesis to test is that the skin cancer rate for more extreme latitudes is no different from the rate found in the tropics. The data contain an outlier (which can be seen in the boxplot) and this point is removed in the analysis.
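
    The partition-and-compare step might look like the sketch below. It splits records at 25 degrees of latitude, as described above, and computes a Welch two-sample t statistic with its approximate degrees of freedom. Welch's form is a choice made here for illustration; the project's own analysis may use a different test. The records are made-up placeholders, and welch_t is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Made-up (latitude in degrees, incidence rate) pairs: placeholders only.
records = [(5, 2.1), (12, 2.4), (20, 2.0), (30, 3.1), (41, 3.6), (52, 3.3)]

far = [rate for lat, rate in records if abs(lat) > 25]
near = [rate for lat, rate in records if abs(lat) < 25]

t, df = welch_t(far, near)
print(f"t = {t:.2f} on about {df:.1f} degrees of freedom")
```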

  10. Women Workers and Equality in the U.S. provides the figures for pay differences for 65 different occupations. Summary statistics are provided in order to test whether the average pay for men is equal to the average pay for women.
    Two assignments are included in which students link to data on the Internet to practice inference for two population means.

  11. Incidence of Firearm-Related Deaths examines the difference in the number of firearm-related deaths for males and females. Each observation corresponds to a particular age group. The magnitudes differ dramatically in these data, as do the standard deviations. The larger question of this project is the reason for the differing variances.
    An assignment is provided to allow the student to use the Internet in testing their knowledge of inferences for standard deviations.

  12. AIDS and Condom Use includes an animated time-lapse map showing the spread of HIV throughout the United States. The data come from a study comparing student condom use in two school systems. One school system provided a condom-availability program. The other school system had no such program. The question in this project is whether sexually active students in New York City (the system with the condom program) are more likely to use a condom than are sexually active students in Chicago (the system without such a program).
    The assignment includes exploration of case studies on the Internet of similar types of data (not the same subject matter) and readings concerning types of error.
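
    The question of whether one group is "more likely" than another is a comparison of two proportions. A minimal sketch of the pooled two-proportion z test follows, with a one-sided alternative that the first proportion is larger. The counts are hypothetical placeholders, not the study's results, and two_proportion_z is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic and one-sided p-value for H1: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical counts: users / sexually active students in each system.
z, p_value = two_proportion_z(60, 100, 40, 100)
print(f"z = {z:.2f}, one-sided p = {p_value:.4f}")
```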

  13. Sex and the Death Penalty provides data for men and women on death row. A 2x2 table provides the data for convicted murderers in the U.S. from 1973 to 1995. The rows are for males and females; the columns are for those who received the death sentence (but were not executed) and those who were executed.
    The assignment asks students to examine survey data on college freshmen who report that capital punishment should be abolished, by year, for males and females. Two questions are posed: (1) Have attitudes changed over time? (2) Do males and females have different opinions on the death penalty?
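
    A 2x2 table like the one in this project is typically analyzed with a chi-square test of independence. The sketch below computes the Pearson statistic without a continuity correction; for one degree of freedom the p-value equals erfc(sqrt(statistic / 2)). The counts are hypothetical placeholders, not the death-row figures, and chi_square_2x2 is an illustrative name.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table."""
    (a, b), (c, d) = table
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # Survival function of chi-square with one degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: rows = males, females; columns = two outcomes.
statistic, p_value = chi_square_2x2([[10, 20], [20, 10]])
print(f"chi-square = {statistic:.3f}, p = {p_value:.4f}")
```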

  14. Assisted Reproductive Technology can be used to study univariate concepts, including histograms and distribution shapes. It can also be used for regression analysis. The selected data show two curves, and an analysis is included for the first curve. This curve depicts a striking downward trend of success versus the age of the prospective mother (when the implanted eggs are her own).

  15. Assisted Reproductive Technology Revisited further analyzes the results from the previous lesson (Chapter 14). Another curve is also provided, which is interesting but more complicated. It shows a significant upward trend of success versus the age of the mother (when the mother uses donor eggs). Whether this counter-intuitive trend is 'real' (and, if so, what the underlying explanation might be) is a good discussion topic, and can get students talking about the problems all statisticians face.

  16. Brain Damage and the Courts provides data collected within the Australian court system. The subjects in the study are the plaintiffs: They are people who suffered brain damage in automobile accidents and sued for monetary damages based on their injury. The question underlying the data in this case is whether there is a difference in requested compensation based on age. An analysis of variance, which includes the Tukey test, provides the results.

  17. Homicides in Detroit provides an unusual data set for lessons on multiple regression. A subset of three predictors can be found (unemployment, gun licenses, and weekly wages) that gives a much better fit to the data than the subsets found via stepwise selection, either forward or backward. For more information, see the text Subset Selection in Regression, A. J. Miller, 1990, Chapman and Hall, NY.

  18. The Growth of the Internet examines data and predicts trends on the growth of the Internet. The data provide the number of computer 'hosts' on the Internet from 1981 to 1997. The trend is exponential. First a straight-line regression is fit, then a quadratic, each with obvious problems. The response variable is transformed with the natural logarithm and the data are fit again, using least-squares regression. The fit seems to be good, but students are cautioned against extrapolation.
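
    The transformation step can be sketched as below: take the natural logarithm of the response, fit a straight line by least squares, and read the yearly growth factor from the exponentiated slope. The host counts are made up for illustration (a series that simply doubles each year), not the project's data, and fit_line is a hypothetical helper.

```python
from math import exp, log

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up host counts that double every year (placeholders only).
years_since_1981 = list(range(17))          # 1981 through 1997
hosts = [200 * 2**t for t in years_since_1981]

# Fit log(hosts) against time; exp(slope) is the yearly growth factor.
slope, intercept = fit_line(years_since_1981, [log(h) for h in hosts])
print(f"estimated yearly growth factor: {exp(slope):.3f}")
```

    Because the fitted model is exponential on the original scale, small errors in the slope compound quickly, which is one way to motivate the caution against extrapolation.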

  19. Sex and the Modern Fruitfly explores data gathered in a study performed on fruitflies. The study was conducted to determine whether the longevity of males is related to their reproductive cycle, and it provides the lifetimes for five groups of males that were 'exposed' to varying numbers of fertile and non-fertile (already impregnated) females. The analysis of variance and Tukey test results are provided.
    For more information on using the data set in class, you may want to review the paper at the Journal of Statistics Education. The paper is called Sexual Activity and the Lifespan of Male Fruitflies: A Dataset That Gets Attention.
