nvsd is a nonlinear variable selection procedure for generalized additive models. It is based on DCOL and uses forward stagewise selection. In addition, cross-validation is used to tune the stopping alpha level and finalize the variable selection.
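To make the DCOL idea more concrete, here is a minimal base-R sketch of a DCOL-style statistic. It assumes this definition: sort the observations by the predictor, then sum the absolute differences between neighboring outcome values. If y is a smooth function of x, even a nonlinear one, observations that are neighbors in x have similar y values, so the sum is small. The name dcol_sketch is hypothetical and is not part of the package. The package's own DCOL computation, including how it assesses significance against stop.alpha, may normalize or test differently.

```r
# Hypothetical sketch of a DCOL-style dependence measure (assumed definition):
# order the observations by x, then sum the absolute differences between
# consecutive y values. A smaller sum means y varies smoothly along x.
dcol_sketch <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  y_ordered <- y[order(x)]    # y values arranged by increasing x
  sum(abs(diff(y_ordered)))   # total distance between neighboring y values
}

x <- c(3, 1, 5, 2, 4)
dcol_sketch(x, x^2)                  # 24: y = x^2 changes smoothly along x
dcol_sketch(x, c(16, 9, 1, 25, 4))   # 40: same y values, scrambled against x
```

Both calls use the same set of y values. Only their pairing with x changes, so the larger second value reflects weaker dependence, not a different spread of y.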
nvsd(X, y, fold = 10, step.size = 0.01, stop.alpha = 0.05, stop.var.count = 20,
max.model.var.count = 10, roughening.method = "DCOL", do.plot = F, pred.method = "MARS")
X: The predictor matrix. Each row is a gene (predictor) and each column is a sample. Note that this orientation differs from most other packages, where each column is a predictor. It keeps this function consistent with the other functions in this package, which handle gene-expression-type data.
y: The numerical outcome vector, with one value per sample (column of X).
fold: The number of cross-validation folds.
step.size: The step size of the roughening process.
stop.alpha: The alpha level used to stop the iterations, based on the significance of the currently selected predictor.
stop.var.count: The maximum number of predictors to select in the forward stagewise selection. Once this number is reached, the iterations stop.
max.model.var.count: The maximum number of predictors in the final model. This can be smaller than stop.var.count, which can be set more leniently. This parameter controls the final maximum model size.
roughening.method: The roughening method, either "DCOL" or "spline".
do.plot: Whether to plot how the points change at each step.
pred.method: The prediction method used for the cross-validated variable selection. Because the forward stagewise procedure does not itself make predictions, a prediction method is borrowed from existing packages. The choices include "MARS", "RF", and "SVM".
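The interaction of stop.alpha, stop.var.count, and max.model.var.count can be shown with a hypothetical sketch. It assumes a vector of per-step p-values for the predictor chosen at each forward stagewise step. This is a made-up input, not an object nvsd exposes. The sketch stops at the first p-value above stop.alpha, or once stop.var.count predictors are on the path, and then caps the final model at max.model.var.count. It does not claim to match how nvsd treats the step that crosses the threshold, or how cross-validation picks the final size within the cap. It only shows why the cap can be tighter than the path limit.

```r
# Hypothetical sketch: 'step_pvalues' is an assumed input holding the
# significance of the predictor chosen at each forward stagewise step.
# It is not an nlnet object, and nvsd's exact stopping logic may differ.
stopping_sketch <- function(step_pvalues, stop.alpha = 0.05,
                            stop.var.count = 20, max.model.var.count = 10) {
  path_length <- 0
  for (p in step_pvalues) {
    if (p > stop.alpha) break                  # chosen predictor not significant: stop
    path_length <- path_length + 1
    if (path_length >= stop.var.count) break   # lenient cap on the stagewise path
  }
  # The final model can be no larger than max.model.var.count,
  # even when the stagewise path is longer.
  list(path_length = path_length,
       max_final_size = min(path_length, max.model.var.count))
}

res <- stopping_sketch(c(0.001, 0.01, 0.03, 0.2, 0.001),
                       stop.var.count = 20, max.model.var.count = 2)
res$path_length     # 3: the fourth step (p = 0.2) exceeds stop.alpha
res$max_final_size  # 2: capped by max.model.var.count
```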
nvsd returns a list with the following components.
selected.pred: The selected predictors (row numbers of X).
The predictors selected by the forward stagewise selection; $selected.pred is a subset of these.
Please refer to the reference for details.
https://arxiv.org/abs/1601.05285
See also: stage.forward
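library(nlnet)  # nvsd is provided by the nlnet package
set.seed(1)     # make the simulated data reproducible
# Simulate 100 samples (rows) of 20 predictors; only the first three affect y
X<-matrix(rnorm(2000),ncol=20)
y<-sin(X[,1])+X[,2]^2+X[,3]
# nvsd expects predictors in rows and samples in columns, so pass t(X)
nvsd(t(X),y,stop.alpha=0.001,step.size=0.05)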