mlbench (version 2.1-3.1)

PimaIndiansDiabetes: Pima Indians Diabetes Database

Description

A data frame with 768 observations on 9 variables.

Usage

data(PimaIndiansDiabetes)
data(PimaIndiansDiabetes2)

Format

pregnant   Number of times pregnant
glucose    Plasma glucose concentration (glucose tolerance test)
pressure   Diastolic blood pressure (mm Hg)
triceps    Triceps skin fold thickness (mm)
insulin    2-Hour serum insulin (mu U/ml)
mass       Body mass index (weight in kg/(height in m)^2)
pedigree   Diabetes pedigree function
age        Age (years)
diabetes   Class variable (test for diabetes)

Details

The data set PimaIndiansDiabetes2 contains a corrected version of the original data set. Although the UCI repository index claims that there are no missing values, closer inspection of the data shows several physical impossibilities, e.g., a blood pressure or body mass index of 0. In PimaIndiansDiabetes2, all zero values of glucose, pressure, triceps, insulin and mass have been set to NA; see also Wahba et al. (1995) and Ripley (1996).
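
The correction described above can be sketched in a few lines of R. This is only an illustration of the zero-to-NA rule, not the code used to build PimaIndiansDiabetes2: the helper name zeros_to_na and the toy data frame are hypothetical, and the toy values are made up so that the snippet runs without mlbench installed.

```r
# Columns in which, per the Details text, a zero is treated as missing.
zero_is_missing <- c("glucose", "pressure", "triceps", "insulin", "mass")

# Hypothetical helper (not part of mlbench): replace zeros with NA
# in the named columns of a data frame, leaving other columns untouched.
zeros_to_na <- function(df, cols) {
  for (col in intersect(cols, names(df))) {
    df[[col]][which(df[[col]] == 0)] <- NA
  }
  df
}

# Toy rows with made-up values, only to show the effect.
toy <- data.frame(
  pregnant = c(0, 2),      # a zero here is a valid count and is kept
  glucose  = c(120, 0),
  pressure = c(0, 70),
  mass     = c(31.5, 0)
)

cleaned <- zeros_to_na(toy, zero_is_missing)
print(cleaned)
```

With mlbench loaded, applying zeros_to_na(PimaIndiansDiabetes, zero_is_missing) and comparing the result to PimaIndiansDiabetes2 with all.equal() would be one way to check whether the description above fully accounts for the differences between the two data sets.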

References

Newman, D.J. & Hettich, S. & Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

Brian D. Ripley (1996), Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge.

Grace Wahba, Chong Gu, Yuedong Wang, and Richard Chappell (1995), Soft Classification a.k.a. Risk Estimation via Penalized Log Likelihood and Smoothing Spline Analysis of Variance, in D. H. Wolpert (1995), The Mathematics of Generalization, 331-359, Addison-Wesley, Reading, MA.

Examples

  data(PimaIndiansDiabetes)
  summary(PimaIndiansDiabetes)

  data(PimaIndiansDiabetes2)
  summary(PimaIndiansDiabetes2)
