nvsd: Nonlinear Variable Selection based on DCOL

Description

This is a nonlinear variable selection procedure for generalized additive models. It is based on DCOL and uses forward stagewise selection. Cross-validation is then used to tune the stopping alpha level and finalize the variable selection.
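
To give some intuition for why a DCOL-type statistic can pick up nonlinear dependence, here is a minimal sketch in base R. It assumes DCOL is computed as the mean absolute difference between consecutive values of y after sorting the samples by x; that definition, the helper name `dcol`, and the simulated data are assumptions made for illustration and are not taken from this package's code.

```r
# Assumed definition (not the package's implementation): sort y by x,
# then average the absolute differences between neighbouring y values.
dcol <- function(x, y) {
  y.ordered <- y[order(x)]
  mean(abs(diff(y.ordered)))
}

set.seed(1)
n <- 200
x <- runif(n, -3, 3)            # predictor that drives y nonlinearly
noise.var <- rnorm(n)           # predictor unrelated to y
y <- x^2 + rnorm(n, sd = 0.1)   # nonlinear, non-monotone relationship

d.related <- dcol(x, y)            # neighbours in x have similar y
d.unrelated <- dcol(noise.var, y)  # ordering by noise scrambles y
d.related < d.unrelated
```

Under this assumed definition, a predictor that carries information about y yields a small value even when the relationship is not monotone, which a linear correlation would miss.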

Usage

nvsd(X, y, fold = 10, step.size = 0.01, stop.alpha = 0.05, stop.var.count = 20, 
max.model.var.count = 10, roughening.method = "DCOL", do.plot = F, pred.method = "MARS")

Arguments

X

The predictor matrix. Each row is a gene (predictor) and each column is a sample. Note that this orientation differs from most other packages, where each column is a predictor. It is chosen to conform to the other functions in this package that handle gene expression data.

y

The numerical outcome vector.
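
Because the orientation is easy to get wrong, the following base R sketch shows how a conventional samples-by-predictors matrix could be transposed into the layout described above. The object names `X.conventional` and `X.genes` and the simulated values are hypothetical.

```r
set.seed(1)

# Conventional layout: 100 samples (rows) by 20 predictors (columns)
X.conventional <- matrix(rnorm(2000), ncol = 20)

# Layout described above: 20 predictors (rows) by 100 samples (columns)
X.genes <- t(X.conventional)
dim(X.genes)

# The outcome needs one value per sample, i.e. one per column of X.genes
y <- rnorm(ncol(X.genes))
stopifnot(length(y) == ncol(X.genes))
```

A quick check that `length(y)` equals the number of columns of the predictor matrix catches most orientation mistakes before any selection is run.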

fold

The number of folds used in cross-validation.
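
As a reminder of what a fold count means in practice, here is a generic base R sketch of splitting samples into folds. It is not this function's internal code; the sample size and the names `fold.id`, `test.idx`, and `train.idx` are hypothetical.

```r
set.seed(42)
n <- 100     # number of samples (hypothetical)
fold <- 10   # number of cross-validation folds

# Randomly assign each sample to one of `fold` roughly equal groups
fold.id <- sample(rep(seq_len(fold), length.out = n))
table(fold.id)

# For the first fold: held-out samples and the remaining training samples
test.idx <- which(fold.id == 1)
train.idx <- setdiff(seq_len(n), test.idx)
```

Each sample is held out exactly once across the folds, so a larger fold count leaves more data for training in each round at the cost of more model fits.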

step.size

The step size of the roughening process.

stop.alpha

The alpha level (significance of the currently selected predictor) at which the iterations stop.

stop.var.count

The maximum number of predictors to select in the forward stagewise selection. Once this number is reached, the iteration stops.

max.model.var.count

The maximum number of predictors in the final model. Note that this can be smaller than stop.var.count: stop.var.count can be set more leniently, while this parameter controls the maximum size of the final model.

roughening.method

The method for roughening. The choices are "DCOL" or "spline".

do.plot

Whether to plot how the points change at each step.

pred.method

The prediction method used in the cross-validation stage of variable selection. Because the forward stagewise procedure does not itself make predictions, a method has to be borrowed from existing packages. The choices are "MARS", "RF", and "SVM".

Value

A list object is returned. The components include the following.

selected.pred

The selected predictors (row number).

all.pred

The predictors selected by the forward stagewise selection (row numbers). selected.pred is a subset of these.
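
Since both components hold row numbers, they can be mapped back to predictor names with ordinary indexing. The sketch below uses a hand-made list with the two documented component names; its values and the `gene.names` vector are invented for illustration and are not real output.

```r
# Hypothetical result with the two documented components (values made up)
res <- list(selected.pred = c(1, 2), all.pred = c(1, 2, 3, 7))

# Hypothetical row names of the predictor matrix
gene.names <- paste0("gene", 1:20)

# Names of the predictors kept in the final model
selected.names <- gene.names[res$selected.pred]

# Predictors picked by forward stagewise selection but dropped afterwards
dropped <- setdiff(res$all.pred, res$selected.pred)

# The final selection should be contained in the stagewise selection
all(res$selected.pred %in% res$all.pred)
```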

Details

Please refer to the reference for details.

References

https://arxiv.org/abs/1601.05285

Examples

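# NOT RUN {
library(nlnet)  # nvsd is provided by the nlnet package
# Simulate 100 samples (rows) by 20 predictors (columns)
X <- matrix(rnorm(2000), ncol = 20)
# The outcome depends nonlinearly on the first three predictors only
y <- sin(X[, 1]) + X[, 2]^2 + X[, 3]
# nvsd expects predictors in rows, so transpose X before the call
nvsd(t(X), y, stop.alpha = 0.001, step.size = 0.05)
# }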