This function fits a series of L2-penalized logistic regression models, selecting variables through a forward stepwise selection procedure.

```
step.plr(x, y, weights = rep(1,length(y)), fix.subset = NULL,
level = NULL, lambda = 1e-4, cp = "bic", max.terms = 5,
type = c("both", "forward", "forward.stagewise"),
trace = FALSE)
```

x

matrix of features

y

binary response

weights

optional vector of weights for observations

fix.subset

vector of indices for the variables that are forced to be in the model

level

list of length `ncol(x)`. The j-th element corresponds to the j-th column of `x`. If the j-th column of `x` is discrete, `level[[j]]` is the set of levels for the categorical factor. If the j-th column of `x` is continuous, `level[[j]] = NULL`. `level` is automatically generated in the function; however, if any levels of the categorical factors are not observed but still need to be included in the model, the user must provide the complete sets of levels through `level`. A numeric column can also be treated as discrete by providing `level` manually.
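As a minimal sketch of the manual case, the base R snippet below builds a `level` list for a small matrix whose second column is coded 1 to 3 but only shows codes 1 and 2 in the sample. The data and the name `x_demo` are hypothetical; the construction follows the same pattern as the manual `level` example in the Examples code of this page.

```r
# Hypothetical data: column 1 is continuous, column 2 is a categorical code
# that can take values 1, 2, 3, although 3 is not observed in this sample.
x_demo <- cbind(c(0.3, 1.2, -0.5, 0.8), c(1, 2, 1, 2))

# One list element per column of x_demo; elements start out as NULL.
level <- vector("list", length = ncol(x_demo))

# Declare the complete set of levels for the categorical column,
# including the unobserved level 3. level[[1]] stays NULL (continuous).
level[[2]] <- seq(3)

str(level)
```

A list built this way can be supplied through the `level` argument so that the unobserved level is still represented in the model.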

lambda

regularization parameter for the L2 norm of the coefficients. The criterion minimized in `plr` is \(-\text{log-likelihood} + \lambda \|\beta\|^2\). Default is `lambda=1e-4`.
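To make the criterion concrete, here is a small base R sketch that evaluates the negative log-likelihood plus the L2 penalty for a given coefficient vector. The function name `penalized_nll` and the demo data are hypothetical, and the sketch assumes every entry of `beta` is penalized; how `plr` itself treats the intercept is not stated in this page.

```r
# Sketch of the criterion: negative log-likelihood + lambda * ||beta||^2.
# Assumption: every entry of beta is penalized; plr's treatment of the
# intercept is not specified in this page.
penalized_nll <- function(beta, x, y, lambda = 1e-4) {
  eta <- drop(x %*% beta)                    # linear predictor
  loglik <- sum(y * eta - log1p(exp(eta)))   # Bernoulli log-likelihood, logit link
  -loglik + lambda * sum(beta^2)
}

# Hypothetical design matrix with an intercept column, and a binary response.
x_demo <- cbind(1, c(-1, 0, 1, 2))
y_demo <- c(0, 0, 1, 1)

penalized_nll(c(0, 0), x_demo, y_demo)              # equals 4 * log(2): no penalty at beta = 0
penalized_nll(c(0, 1), x_demo, y_demo, lambda = 1)  # unpenalized value plus 1
```

With the default `lambda=1e-4` the penalty term is small relative to the likelihood term, so it mainly acts to stabilize the fit rather than to shrink coefficients heavily.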

cp

complexity parameter to be used when computing the score: `score=deviance+cp*df`. If `cp="aic"` or `cp="bic"`, these are converted to `cp=2` or `cp=log(sample size)`, respectively. Default is `cp="bic"`.
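The conversion of `cp` and the resulting score can be sketched in a few lines of base R. The helper names `resolve_cp` and `score_sketch` are hypothetical and are not part of the package; they only mirror the rule stated above.

```r
# Convert the documented cp options to a numeric multiplier.
# "aic" -> 2, "bic" -> log(sample size), a number is used as given.
resolve_cp <- function(cp, n) {
  if (identical(cp, "aic")) {
    2
  } else if (identical(cp, "bic")) {
    log(n)
  } else {
    as.numeric(cp)
  }
}

# score = deviance + cp * df, as stated in the argument description.
score_sketch <- function(deviance, df, cp, n) {
  deviance + resolve_cp(cp, n) * df
}

score_sketch(deviance = 100, df = 3, cp = "aic", n = 50)  # 100 + 2 * 3
score_sketch(deviance = 100, df = 3, cp = "bic", n = 50)  # 100 + log(50) * 3
score_sketch(deviance = 100, df = 3, cp = 4, n = 50)      # 100 + 4 * 3
```

Because `log(n)` exceeds 2 once the sample size is above about 7, the default `"bic"` setting penalizes additional terms more heavily than `"aic"` in most practical cases.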

max.terms

maximum number of terms to be added in the forward selection procedure. Default is `max.terms=5`.

type

If `type="both"`, forward selection is followed by a backward deletion. If `type="forward"`, only a forward selection is done. If `type="forward.stagewise"`, variables are added in the forward-stagewise method. Default is `"both"`.

trace

If `TRUE`, the variable selection procedure prints out its progress.

A `stepplr` object is returned. The `anova`, `predict`, `print`, and `summary` functions can be applied to it.

fit

`plr` object for the optimal model selected

action

list that stores the selection order of the terms in the optimal model

action.name

list of the names of the sequentially added terms, in the same order as in `action`

deviance

deviance of the fitted model

df

residual degrees of freedom of the fitted model

score

deviance + cp*df, where df is the model degrees of freedom

group

vector of the counts for the dummy variables, to be used in `predict.stepplr`

y

response variable used

weights

weights used

fix.subset

fix.subset used

level

level used

lambda

lambda used

cp

complexity parameter used when computing the score

type

type used

xnames

column names of `x`

This function implements L2-penalized logistic regression together with a stepwise variable selection procedure, as described in "Penalized Logistic Regression for Detecting Gene Interactions" by Park and Hastie (2008).

If `type="forward"`, `max.terms` terms are sequentially added to the model, and the model that minimizes `score` is selected as the optimal fit. If `type="both"`, a backward deletion is done in addition, which provides a second series of models with different combinations of the selected terms. The optimal model minimizing `score` is then chosen from this second list.
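The forward part of this procedure can be illustrated with a simplified base R sketch. It is not the package's implementation: it swaps in plain `glm()` fits without the L2 penalty, considers main effects only, omits the backward deletion step, and takes df as the number of estimated coefficients. The function name `forward_select_sketch` and the simulated data are hypothetical.

```r
# Sketch of forward selection scored by deviance + cp * df.
# Assumptions: plain glm() fits (no L2 penalty), main effects only,
# and df taken as the number of estimated coefficients.
forward_select_sketch <- function(x, y, max.terms = 5, cp = log(length(y))) {
  fit_subset <- function(cols) {
    d <- data.frame(y = y, x[, cols, drop = FALSE])
    glm(y ~ ., data = d, family = binomial)
  }
  selected <- integer(0)
  path <- list()
  for (k in seq_len(min(max.terms, ncol(x)))) {
    candidates <- setdiff(seq_len(ncol(x)), selected)
    # Add the candidate that gives the smallest deviance.
    devs <- sapply(candidates, function(j) deviance(fit_subset(c(selected, j))))
    selected <- c(selected, candidates[which.min(devs)])
    fit <- fit_subset(selected)
    path[[k]] <- list(terms = selected,
                      score = deviance(fit) + cp * length(coef(fit)))
  }
  # Return the step of the path with the smallest score.
  scores <- sapply(path, function(s) s$score)
  path[[which.min(scores)]]
}

# Simulated data in which only column a carries signal.
set.seed(1)
n <- 200
x_demo <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
y_demo <- rbinom(n, 1, plogis(2 * x_demo$a))

best <- forward_select_sketch(x_demo, y_demo, max.terms = 3)
best$terms   # column indices of the selected terms, in order of entry
best$score
```

The sketch keeps every step of the forward path and only afterwards picks the step with the lowest score, which matches the description above: all `max.terms` terms are added first, and the optimal model is chosen from the resulting sequence.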

Mee Young Park and Trevor Hastie (2008) Penalized Logistic Regression for Detecting Gene Interactions

cv.step.plr, plr, predict.stepplr

```
library(stepPlr)

# Example 1: all features are factors.
n <- 100
p <- 3
z <- matrix(sample(seq(3), n * p, replace=TRUE), nrow=n)
x <- data.frame(x1=factor(z[, 1]), x2=factor(z[, 2]), x3=factor(z[, 3]))
y <- sample(c(0, 1), n, replace=TRUE)
fit <- step.plr(x, y)
# 'level' is automatically generated. Check 'fit$level'.

# Example 2: one continuous column followed by five discrete columns.
p <- 5
x <- matrix(sample(seq(3), n * p, replace=TRUE), nrow=n)
x <- cbind(rnorm(n), x)
y <- sample(c(0, 1), n, replace=TRUE)
level <- vector("list", length=6)
for (i in 2:6) level[[i]] <- seq(3)
fit1 <- step.plr(x, y, level=level, cp="aic")
fit2 <- step.plr(x, y, level=level, cp=4)
fit3 <- step.plr(x, y, level=level, type="forward")
fit4 <- step.plr(x, y, level=level, max.terms=10)
# This is an example in which 'level' was input manually.
# level[[1]] should be either 'NULL' or 'NA' since the first factor is continuous.
```