Titanic data with passenger names and other details removed.

A data frame with 1309 observations on 6 variables.

pclass    passenger class, unordered factor: 1st 2nd 3rd
survived  factor: died or survived
sex       unordered factor: male female
age       age in years, min 0.167 max 80.0
sibsp     number of siblings or spouses aboard, integer: 0...8
parch     number of parents or children aboard, integer: 0...6


The dataset was compiled by Frank Harrell and Robert Dawson:

See also:

For this version of the Titanic data, passenger details were deleted, survived was cast as a factor, and the name changed to ptitanic to minimize confusion with other versions.

In this data the crew are conspicuous by their absence.

Contents of ptitanic:

     pclass survived    sex    age sibsp parch
1       1st survived female 29.000     0     0
2       1st survived   male  0.917     1     2
3       1st     died female  2.000     1     2
4       1st     died   male 30.000     1     2
5       1st     died female 25.000     1     2
...
1309    3rd     died   male 29.000     0     0

How ptitanic was built:

load("titanic3.sav") # from Dr. Harrell's web site
# discard name, ticket, fare, cabin, embarked, body, home.dest
ptitanic <- titanic3[,c(1,2,4,5,6,7)]
# change survived from integer to factor
ptitanic$survived <- factor(ptitanic$survived, labels=c("died", "survived"))
save(ptitanic, file="ptitanic.rda")

This version of the data differs from etitanic in the earth package in that here survived is a factor (not an integer) and age has some NAs.
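
The difference from etitanic noted above can be illustrated with a minimal sketch. It assumes that an etitanic-like form codes survived as 0/1 and contains no rows with missing values. The data frame toy is a hypothetical stand-in for ptitanic: it reuses the first five rows listed above and adds one made-up row with a missing age. The names toy, na.counts and toy.complete are not part of any package.

```r
# Toy stand-in for ptitanic: the first five rows listed above, plus one
# hypothetical row (row 6) with a missing age, added for illustration only.
toy <- data.frame(
  pclass   = factor(c("1st", "1st", "1st", "1st", "1st", "3rd"),
                    levels = c("1st", "2nd", "3rd")),
  survived = factor(c("survived", "survived", "died", "died", "died", "died"),
                    levels = c("died", "survived")),
  sex      = factor(c("female", "male", "female", "male", "female", "male"),
                    levels = c("male", "female")),
  age      = c(29, 0.917, 2, 30, 25, NA),
  sibsp    = c(0L, 1L, 1L, 1L, 1L, 0L),
  parch    = c(0L, 2L, 2L, 2L, 2L, 0L)
)

# Number of missing values in each column
na.counts <- colSums(is.na(toy))

# Keep only complete rows, then recode survived from factor to 0/1 integer
toy.complete <- toy[complete.cases(toy), ]
toy.complete$survived <- as.integer(toy.complete$survived == "survived")
```

The same two steps should apply to the real data by substituting ptitanic for toy, although the result is not guaranteed to match etitanic exactly.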

library(rpart.plot) # also attaches rpart
data(ptitanic)
# main indicator of missing data is 3rd class esp. with many children
obs.with.nas <- rowSums(is.na(ptitanic)) > 0
prp(rpart(obs.with.nas ~ ., data=ptitanic, method="class"),
    main="observations with missing data", extra=7)
