
genridge (version 0.6-2)

Detroit: Detroit Homicide Data for 1961-1973

Description

The Detroit data set was used extensively in Miller's (2002) book on subset regression. The data are unusual in that a subset of three predictors can be found that fits the data very much better than the subsets found by the Efroymson stepwise algorithm, forward selection, or backward elimination. They are also unusual in that, as time series data, they patently violate the assumption of independence, and they suffer from high collinearity. In addition, ridge regression reveals somewhat paradoxical paths of shrinkage in univariate ridge trace plots; these are more comprehensible in multivariate views.
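The collinearity and serial dependence described here can be checked directly. The following is a minimal sketch in base R, assuming the genridge package is installed and that the columns carry the names listed in the Examples section (Police, Unemp, GunLic, HClear, WhMale, WkEarn). The choice of diagnostics is illustrative and is not taken from the package.

```r
# Assumes genridge is installed; it supplies the Detroit data frame.
data(Detroit, package = "genridge")

# Six predictors from Miller's "best" 6-variable model (see Examples)
vars <- c("Police", "Unemp", "GunLic", "HClear", "WhMale", "WkEarn")
X <- scale(Detroit[, vars])

# Pairwise correlations among the standardized predictors
round(cor(X), 2)

# Condition number of the standardized predictor matrix;
# large values indicate near-linear dependence among the columns
cond_num <- kappa(X, exact = TRUE)
cond_num

# Lag-1 autocorrelation of the response, a rough check on independence
lag1 <- acf(Detroit$Homicide, lag.max = 1, plot = FALSE)$acf[2]
lag1
```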

Usage

data(Detroit)

Source

http://lib.stat.cmu.edu/datasets/detroit

Details

The data were originally collected and discussed by Fisher (1976), but the complete dataset first appeared in Gunst and Mason (1980, Appendix A). Miller (2002) discusses this dataset throughout his book, but doesn't state clearly which variables he used as predictors and which is the dependent variable. (Homicide was the dependent variable, and the predictors were Police ...WkEarn.) The data were obtained from StatLib. A similar version of this data set, with different variable names, appears in the bestglm package.
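With Homicide as the response and the predictors running from Police to WkEarn, Miller's observation about three-variable subsets can be explored by exhaustive search. This is a minimal base R sketch, not Miller's own procedure. It assumes genridge is installed and that Police through WkEarn occupy contiguous columns of Detroit.

```r
# Assumes genridge is installed; it supplies the Detroit data frame.
data(Detroit, package = "genridge")

# Predictors: every column from Police through WkEarn (assumed contiguous)
first <- match("Police", names(Detroit))
last  <- match("WkEarn", names(Detroit))
predictors <- names(Detroit)[first:last]

# All possible three-predictor subsets
subsets <- combn(predictors, 3, simplify = FALSE)

# Residual sum of squares of the least-squares fit for each subset
rss <- vapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "Homicide"), data = Detroit)
  sum(residuals(fit)^2)
}, numeric(1))

# Subset with the smallest residual sum of squares
best <- subsets[[which.min(rss)]]
best
```

Comparing `min(rss)` with the fit chosen by `step()` on the same predictors gives a rough sense of how far the stepwise procedures fall short on these data.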

References

Fisher, J.C. (1976). Homicide in Detroit: The Role of Firearms. Criminology, 14, 387-400.

Gunst, R.F. and Mason, R.L. (1980). Regression Analysis and Its Application: A Data-Oriented Approach. Marcel Dekker.

Miller, A.J. (2002). Subset Selection in Regression. 2nd Ed. Chapman & Hall/CRC, Boca Raton.

Examples

# load the package that provides ridge(), traceplot() and the Detroit data
library(genridge)
data(Detroit)

# Work with a subset of predictors, from Miller (2002, Table 3.14),
# the "best" 6 variable model
#    Variables: Police, Unemp, GunLic, HClear, WhMale, WkEarn
# Scale these for comparison with other methods

Det <- as.data.frame(scale(Detroit[,c(1,2,4,6,7,11)]))
Det <- cbind(Det, Homicide=Detroit[,"Homicide"])

# use the formula interface; specify ridge constants in terms
# of equivalent degrees of freedom
dridge <- ridge(Homicide~., data=Det, df=seq(6,4,-.5))

# univariate trace plots are seemingly paradoxical in that
# some coefficients "shrink" *away* from 0
traceplot(dridge, X="df")
vif(dridge)
pairs(dridge, radius=0.5)

plot3d(dridge, radius=0.5, labels=dridge$df)

# transform to PCA/SVD space
dpridge <- pca.ridge(dridge)
# not so paradoxical in PCA space
traceplot(dpridge, X="df")
biplot(dpridge, radius=0.5)
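The df argument in the example expresses each ridge constant as an equivalent number of degrees of freedom. A common definition, which this sketch assumes rather than takes from the genridge source, is df(lambda) = sum(d_j^2 / (d_j^2 + lambda)), where the d_j are the singular values of the centered and scaled predictor matrix. The helper name edf is hypothetical, and the lambda values it yields depend on the scaling used here, so they need not match those computed inside ridge().

```r
# Assumes genridge is installed; it supplies the Detroit data frame.
data(Detroit, package = "genridge")

# Same six predictors as in the example, centered and scaled
X <- scale(Detroit[, c(1, 2, 4, 6, 7, 11)])
d <- svd(X)$d

# Hypothetical helper: effective degrees of freedom for ridge constant lambda
edf <- function(lambda) sum(d^2 / (d^2 + lambda))

# With no shrinkage, the effective df equals the number of predictors
edf(0)

# Invert numerically: find the lambda that corresponds to df = 5
lambda_5 <- uniroot(function(l) edf(l) - 5,
                    interval = c(0, 1e6), tol = 1e-10)$root
lambda_5
```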
