The dataset was compiled by Frank Harrell and Robert Dawson:
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.html. See also:
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/DataSets/titanic3info.txt.

For this version of the Titanic data, passenger details such as name, ticket, and fare were deleted, survived was cast as a factor, and the name was changed to ptitanic to minimize confusion with other versions. The data cover passengers only; the crew are conspicuous by their absence.

Contents of ptitanic:
        pclass  survived  sex        age  sibsp  parch
1       1st     survived  female  29.000  0      0
2       1st     survived  male     0.917  1      2
3       1st     died      female   2.000  1      2
4       1st     died      male    30.000  1      2
5       1st     died      female  25.000  1      2
...
1309    3rd     died      male    29.000  0      0
How ptitanic was built:
load("titanic3.sav") # from Dr. Harrell's web site; restores the data frame titanic3
# keep pclass, survived, sex, age, sibsp, parch (columns 1, 2, 4, 5, 6, 7)
# discard name, ticket, fare, cabin, embarked, boat, body, home.dest
ptitanic <- titanic3[,c(1,2,4,5,6,7)]
# change survived from integer to factor
ptitanic$survived <- factor(ptitanic$survived, labels=c("died", "survived"))
save(ptitanic, file="ptitanic.rda")
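The factor() call above relies on factor() sorting the distinct codes (0, then 1) and attaching the labels in that order, so 0 becomes "died" and 1 becomes "survived". The sketch below illustrates this on a made-up integer vector (the five values are illustrative, not rows of titanic3). It also passes levels = c(0L, 1L) explicitly, an assumption added here for safety: without it, a subset containing only one code would make factor() reject the two labels.

```r
# Illustrative 0/1 survival codes, in the same form as titanic3$survived
# (made-up values, not taken from the real data).
survived_int <- c(1L, 1L, 0L, 0L, 0L)

# Explicit levels fix the code-to-label mapping: 0 -> "died", 1 -> "survived",
# even if one of the two codes happens to be absent from the input.
survived_fac <- factor(survived_int, levels = c(0L, 1L),
                       labels = c("died", "survived"))

print(levels(survived_fac))
print(as.character(survived_fac))
print(table(survived_fac))
```

In the full data both codes are present, so the original call without explicit levels produces the same mapping.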
This version of the data differs from etitanic in the earth package in that here survived is a factor (not an integer) and age has some missing values (NAs).
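To make the two differences concrete, here is a minimal sketch of a hypothetical helper, to_integer_complete() (not part of rpart.plot or earth), that recodes survived back to 0/1 and drops rows with missing values. It runs on a small made-up data frame in the ptitanic layout. It is only an illustration of the differences listed above; it is not claimed to reproduce etitanic exactly, which may differ in other respects.

```r
# Hypothetical helper (illustration only): recode a ptitanic-style data frame
# so that survived is an integer and no rows have missing values.
to_integer_complete <- function(df) {
  # Map the factor back to integers: "died" -> 0, "survived" -> 1
  df$survived <- as.integer(df$survived == "survived")
  # Keep only rows with no NA in any column (in ptitanic, only age has NAs)
  df[complete.cases(df), , drop = FALSE]
}

# Small made-up data frame with the same columns as ptitanic
toy <- data.frame(
  pclass   = factor(c("1st", "1st", "3rd")),
  survived = factor(c("survived", "died", "died"),
                    levels = c("died", "survived")),
  sex      = factor(c("female", "male", "male")),
  age      = c(29, NA, 29),
  sibsp    = c(0L, 1L, 0L),
  parch    = c(0L, 2L, 0L)
)

out <- to_integer_complete(toy)
str(out)
```

With the real data, the same call would be to_integer_complete(ptitanic) after loading rpart.plot. Dropping incomplete rows is the bluntest way to remove the NAs; imputing age instead would keep more passengers.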