step.plr: Forward stepwise selection procedure for penalized logistic regression

Description

This function fits a series of L2-penalized logistic regression models, selecting variables through a forward stepwise selection procedure.

Usage

step.plr(x, y, weights = rep(1,length(y)), fix.subset = NULL,
           level = NULL, lambda = 1e-4, cp = "bic",
           max.terms = 5, type = c("both", "forward"), trace = FALSE)

Arguments

Value

A stepplr object is returned. anova, predict, print, and summary functions can be applied.

fit: a plr object for the optimal model selected
action: a list that stores the selection order of the terms in the optimal model
action.name: a list of the names of the sequentially added terms, in the same order as in action
deviance: deviance of the fitted model
df: residual degrees of freedom of the fitted model
score: deviance + cp*df, where df is the model degrees of freedom
group: a vector of the counts for the dummy variables, to be used in predict.stepplr
y: response variable used
weight: weights used
fix.subset: fix.subset used
level: level used
lambda: lambda used
cp: complexity parameter used when computing the score
type: type used
xnames: column names of x
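The score formula above (deviance + cp*df) can be illustrated with base R alone. This is a minimal sketch, not the package's own computation: an ordinary unpenalized glm stands in for a plr fit, so df is simply the number of coefficients, whereas a penalized fit would typically use an effective degrees of freedom. The values cp = 2 for "aic" and cp = log(n) for "bic" are the conventional ones and are assumed here, since this page does not state them.

```r
# Illustration only: an unpenalized glm stands in for a plr fit.
set.seed(1)
n <- 100
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, 1, 0.5)
fit <- glm(y ~ x1 + x2, data = x, family = binomial)

# Model degrees of freedom: intercept plus two slopes.
model.df <- length(coef(fit))

# score = deviance + cp * df, with the assumed conventional cp values.
score.aic <- deviance(fit) + 2 * model.df       # cp = 2
score.bic <- deviance(fit) + log(n) * model.df  # cp = log(n)
```

For a 0/1 response the deviance equals minus twice the log-likelihood, so in this unpenalized sketch the two scores coincide with AIC(fit) and BIC(fit).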

Details

This function implements L2 penalized logistic regression together with a stepwise variable selection procedure, as described in "Penalized Logistic Regression for Detecting Gene Interactions" (Park and Hastie, 2006).

If type="forward", max.terms terms are sequentially added to the model, and the model that minimizes score is selected as the optimal fit. If type="both", a backward deletion is done in addition, which provides a series of models with a different combination of the selected terms. The optimal model minimizing score is chosen from the second list.

We thank Michael Saunders of SOL, Stanford University for providing the solver used for the convex optimization in this function.
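The type="forward" behaviour described in Details can be sketched in base R as follows. This is a rough illustration under stated assumptions, not the package's algorithm: it uses unpenalized glm fits, considers main effects only (no interaction terms), picks the term to add at each step by the lowest score, and assumes cp = log(n). The names score_of, candidates, and path are hypothetical helpers introduced for this sketch.

```r
# Illustration only: greedy forward selection scored by deviance + cp * df.
set.seed(1)
n <- 100
dat <- data.frame(matrix(rnorm(n * 4), nrow = n))  # columns X1..X4
dat$y <- rbinom(n, 1, plogis(dat$X1))

cp <- log(n)        # assumed BIC-style complexity parameter
max.terms <- 3
candidates <- paste0("X", 1:4)

# Hypothetical helper: score of a logistic model with the given terms.
score_of <- function(vars) {
  rhs <- if (length(vars)) paste(vars, collapse = " + ") else "1"
  fit <- glm(as.formula(paste("y ~", rhs)), data = dat, family = binomial)
  deviance(fit) + cp * length(coef(fit))
}

selected <- character(0)
path <- list()
for (k in seq_len(max.terms)) {
  remaining <- setdiff(candidates, selected)
  # Score every model that adds one more term to the current selection.
  scores <- sapply(remaining, function(v) score_of(c(selected, v)))
  selected <- c(selected, names(which.min(scores)))
  path[[k]] <- list(terms = selected, score = min(scores))
}

# The optimal fit is the model along the path with the smallest score.
path.scores <- sapply(path, function(s) s$score)
optimal <- path[[which.min(path.scores)]]
```

Here optimal$terms plays a role loosely analogous to action.name in the returned object, and the backward deletion done under type="both" would add a second pass that drops terms from the largest model in path.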

References

Mee Young Park and Trevor Hastie (2006) Penalized Logistic Regression for Detecting Gene Interactions - available at the authors' websites, http://stat.stanford.edu/~mypark or http://stat.stanford.edu/~hastie/pub.htm.

Examples


n <- 100

p <- 3
z <- matrix(sample(seq(3),n*p,replace=TRUE),nrow=n)
x <- data.frame(x1=factor(z[ ,1]),x2=factor(z[ ,2]),x3=factor(z[ ,3]))
y <- sample(c(0,1),n,replace=TRUE)
fit <- step.plr(x,y)
# 'level' is automatically generated. Check 'fit$level'.

p <- 5
x <- matrix(sample(seq(3),n*p,replace=TRUE),nrow=n)
x <- cbind(rnorm(n),x)
y <- sample(c(0,1),n,replace=TRUE)
level <- vector("list",length=6)
for (i in 2:6) level[[i]] <- seq(3)
fit1 <- step.plr(x,y,level=level,cp="aic")
fit2 <- step.plr(x,y,level=level,cp=4)
fit3 <- step.plr(x,y,level=level,type="forward")
fit4 <- step.plr(x,y,level=level,max.terms=10)
# This is an example in which 'level' was input manually.
# level[[1]] should be either 'NULL' or 'NA' since the first factor is continuous.
