subgroup.discovery (version 0.3.1)

pima: Pima Indians Diabetes Database.

Description

Sources:

(a) Original owners: National Institute of Diabetes and Digestive and Kidney Diseases

(b) Donor of database: Vincent Sigillito (vgs@aplcen.apl.jhu.edu), Research Center, RMI Group Leader, Applied Physics Laboratory, The Johns Hopkins University, Johns Hopkins Road, Laurel, MD 20707, (301) 953-6231

(c) Date received: 9 May 1990

Usage

pima

Format

A data frame with 768 rows and 9 variables:

pregnant

Number of times pregnant

glucose

Plasma glucose concentration at 2 hours in an oral glucose tolerance test

bp

Diastolic blood pressure (mm Hg)

skin_thickness

Triceps skin fold thickness (mm)

insulin

2-Hour serum insulin (mu U/ml)

bmi

Body mass index (weight in kg/(height in m)^2)

diabetes

Diabetes pedigree function

age

Age (years)

class

Class variable (0 or 1)
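As a quick sanity check, the documented format can be compared against the loaded object. The sketch below is illustrative only: `has_pima_format` is a hypothetical helper, not part of subgroup.discovery, and it assumes the data set can be loaded with `data("pima", package = "subgroup.discovery")` when the package is installed.

```r
# Column names as listed in the Format section.
expected_cols <- c("pregnant", "glucose", "bp", "skin_thickness", "insulin",
                   "bmi", "diabetes", "age", "class")

# Hypothetical helper (not part of subgroup.discovery): TRUE when `df`
# is a data frame with 768 rows and exactly the documented columns
# (column order is not checked).
has_pima_format <- function(df) {
  is.data.frame(df) &&
    nrow(df) == 768 &&
    setequal(names(df), expected_cols)
}

# Only touch the packaged data when the package is actually installed.
# Assumption: the data set is exposed through data().
if (requireNamespace("subgroup.discovery", quietly = TRUE)) {
  data("pima", package = "subgroup.discovery")
  str(pima)                      # column types and first values
  print(has_pima_format(pima))   # does it match the documented shape?
  print(table(pima$class))       # counts of class 0 and class 1
}
```

The check deliberately ignores column order and column types, since neither is stated in the format description.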

Details

Past Usage: Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261-265). IEEE Computer Society Press.

The diagnostic, binary-valued variable investigated is whether the patient shows signs of diabetes according to World Health Organization criteria (i.e., if the 2-hour post-load plasma glucose was at least 200 mg/dl at any survey examination or if found during routine medical care). The population lives near Phoenix, Arizona, USA.

Results: Their ADAP algorithm makes a real-valued prediction between 0 and 1. This was transformed into a binary decision using a cutoff of 0.448. Using 576 training instances, the sensitivity and specificity of their algorithm were 76% on the remaining 192 instances.
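The thresholding step and the two reported metrics can be illustrated with a minimal sketch. The scores and labels below are made up for illustration and are not taken from the study; treating scores strictly above the cutoff as positive is also an assumption, since the text does not say how ties are handled.

```r
# Hypothetical real-valued predictions in [0, 1] and true classes
# (illustrative values only, not data from the study).
scores <- c(0.10, 0.40, 0.50, 0.90, 0.30, 0.60)
truth  <- c(0L,   1L,   1L,   1L,   0L,   0L)

cutoff <- 0.448  # cutoff quoted in the Results paragraph

# Assumption: a score strictly above the cutoff is classified as 1.
predicted <- as.integer(scores > cutoff)

# Sensitivity: share of true positives among all actual positives.
sensitivity <- sum(predicted == 1L & truth == 1L) / sum(truth == 1L)

# Specificity: share of true negatives among all actual negatives.
specificity <- sum(predicted == 0L & truth == 0L) / sum(truth == 0L)

# Instances left for evaluation after taking 576 of the 768 rows for training.
n_test <- 768 - 576

print(c(sensitivity = sensitivity, specificity = specificity))
print(n_test)
```

Moving the cutoff trades one metric against the other: a lower cutoff flags more cases as positive, which tends to raise sensitivity and lower specificity.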

Relevant Information: Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.