
sodavis (version 0.1)

soda:

Description

SODA is a forward-backward variable and interaction selection algorithm for logistic regression models with second-order terms. In the forward stage, a stepwise procedure screens for important predictors with main and interaction effects; in the backward stage, SODA removes insignificant terms so as to optimize the extended BIC (EBIC) criterion. SODA is applicable to variable selection for logistic regression, linear/quadratic discriminant analysis, and other discriminant analyses whose generative model belongs to the exponential family.
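To make "logistic regression with second-order terms" concrete, the sketch below fits two ordinary glm models in base R on simulated data: one with main effects only, and one that includes a squared term and an interaction. It is an illustration only and does not reproduce SODA's forward or backward search; the variable names (x1, x2, x3), the simulated coefficients, and the use of BIC() as a stand-in for EBIC with gamma = 0 (up to how the intercept is counted) are all assumptions made for this example.

```r
# Illustrative only: base R, not the sodavis implementation.
set.seed(1)
n <- 500
x <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))

# Assumed true model: a main effect (x1), a squared term (x2^2),
# and an interaction (x2 * x3)
eta <- 1 + x[, "x1"] - x[, "x2"]^2 + x[, "x2"] * x[, "x3"]
y <- rbinom(n, 1, plogis(eta))
dat <- data.frame(y = y, x)

# Main-effects-only fit versus a fit that includes second-order terms
fit_main   <- glm(y ~ x1 + x2 + x3, family = binomial, data = dat)
fit_second <- glm(y ~ x1 + I(x2^2) + x2:x3, family = binomial, data = dat)

# Lower is better; the second-order fit should score lower here
bic_main   <- BIC(fit_main)
bic_second <- BIC(fit_second)
```

Comparing bic_main and bic_second shows why a criterion of this kind can prefer a model with interaction terms when the data were generated with them.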

Usage

soda(xx, yy, norm = FALSE, debug = FALSE, gam = 0, minF = 3)

Arguments

xx
The design matrix, of dimensions n * p, without an intercept. Each row is an observation vector.
yy
The response vector of dimension n * 1.
norm
Logical flag for quantile-normalizing the variables in xx to the standard normal distribution before running the SODA algorithm. Default is norm=FALSE. Quantile normalization is suggested if the data contain obvious outliers.
debug
Logical flag for printing debug information.
gam
Tuning parameter gamma in the extended BIC criterion.

EBIC for selected set S:

EBIC = -2 * log-likelihood + |S| * log(n) + 2 * |S| * gamma * log(p)

minF
Minimum number of steps in forward interaction screening. Default is minF=3.
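Two of the arguments above, norm and gam, can be illustrated with a short base R sketch. The helper names quantile_normalize and ebic are hypothetical and are not part of sodavis. The rank-based mapping rank / (n + 1) is one common way to quantile-normalize to a standard normal and may differ from what soda does internally. The ebic helper simply evaluates the EBIC formula given under gam, taking n_terms as |S| and assuming p is the number of columns of xx.

```r
# Hypothetical helper: map each column of xx to standard normal scores
# using ranks. An assumption about the method, not sodavis internals.
quantile_normalize <- function(xx) {
  xx <- as.matrix(xx)
  n <- nrow(xx)
  apply(xx, 2, function(col) qnorm(rank(col, ties.method = "average") / (n + 1)))
}

# Hypothetical helper: EBIC as written in the gam argument description.
# loglik  : log-likelihood of the fitted model
# n_terms : |S|, the size of the selected set
# n, p    : number of observations and of candidate predictors (assumed)
# gam     : tuning parameter gamma; gam = 0 leaves only the log(n) penalty
ebic <- function(loglik, n_terms, n, p, gam = 0) {
  -2 * loglik + n_terms * log(n) + 2 * n_terms * gam * log(p)
}

# A column with an obvious outlier (1000) is pulled back to normal scores
x <- cbind(a = c(1, 2, 3, 1000), b = c(4, 3, 2, 1))
z <- quantile_normalize(x)

# Larger gam penalizes each selected term more heavily when p is large
score_gam0  <- ebic(loglik = -100, n_terms = 3, n = 250, p = 1000, gam = 0)
score_gam05 <- ebic(loglik = -100, n_terms = 3, n = 250, p = 1000, gam = 0.5)
```

With gam = 0.5 each selected term costs an extra log(p) relative to gam = 0, which is why a positive gam tends to favor sparser term sets in high-dimensional settings.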

Value

EBIC
Trace of extended Bayesian information criterion (EBIC) score.
Type
Trace of step type ("Forward (Main)", "Forward (Int)", "Backward").
Var
Trace of selected variables.
Term
Trace of selected main and interaction terms.
final_EBIC
Final selected term set EBIC score.
final_Var
Final selected variables.
final_Term
Final selected main and interaction terms.

References

Li Y, Liu JS. (2015). Robust variable and interaction selection for high-dimensional classification via logistic regression. Technical Report.

Examples

# simulation study with 1 main effect and 2 interactions (uncomment the code to run)
#library(MASS)     # provides mvrnorm
#library(sodavis)  # provides soda and soda_trace_CV
#N = 250;
#p = 1000;
#r = 0.5;
#s = 1;
#H = abs(outer(1:p, 1:p, "-"))
#S = s * r^H;
#S[cbind(1:p, 1:p)] = S[cbind(1:p, 1:p)] * s

#xx = as.matrix(data.frame(mvrnorm(N, rep(0,p), S)));
#zz = 1 + xx[,1] - xx[,10]^2 + xx[,10]*xx[,20];
#yy = as.numeric(runif(N) < exp(zz) / (1+exp(zz)))

#res_SODA = soda(xx, yy, gam=0.5);
#cv_SODA  = soda_trace_CV(xx, yy, res_SODA)
#cv_SODA

# Michigan lung cancer dataset (uncomment the code to run)
#data(mich_lung);
#res_SODA = soda(mich_lung_xx, mich_lung_yy, gam=0.5);
#cv_SODA  = soda_trace_CV(mich_lung_xx, mich_lung_yy, res_SODA)
#cv_SODA
