Given a response and a set of K features, this function first runs SMLE (with fast = TRUE) to generate a series of sub-models with sparsity k varying from k_min to k_max.
It then selects the best model from the series based on a selection criterion.
When criterion EBIC is used, users can choose to repeat the selection with
different values of the tuning parameter, \(\gamma\), and
conduct importance voting for each feature.
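For orientation, the extended BIC of Chen and Chen (2012) for a candidate sub-model \(s\) containing \(k\) of the \(K\) features can be written as below. The symbols \(n\) (sample size), \(L_n\) (likelihood) and \(\hat{\beta}(s)\) (maximum likelihood estimate under \(s\)) are notation assumed here for illustration; the exact form computed internally by the package may differ in constants.

\[
\mathrm{EBIC}_{\gamma}(s) = -2 \log L_n\{\hat{\beta}(s)\} + k \log n + 2 \gamma \log \binom{K}{k}
\]

Setting \(\gamma = 0\) recovers the ordinary BIC, while larger values of \(\gamma\) penalize larger models more heavily.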
smle_select(x, ...)
# S3 method for smle
smle_select(x, ...)
# S3 method for sdata
smle_select(
x,
k_min = 1,
k_max = 10,
sub_model = NULL,
gamma_ebic = 0.5,
vote = FALSE,
tune = c("ebic", "aic", "bic"),
codingtype = NULL,
gamma_seq = c(seq(0, 1, 0.2)),
vote_threshold = NULL,
para = FALSE,
num_cores = NULL,
...
)
# S3 method for default
smle_select(x, X = NULL, family = "gaussian", ...)
x: Object of class 'smle' or 'sdata'. Users can also input a response vector and a feature matrix. See Examples.
...: Other parameters.
k_min: The lower bound of candidate model sparsity. Default is 1.
k_max: The upper bound of candidate model sparsity. Default is the number of columns in the input.
sub_model: An index vector indicating which features (columns of the feature matrix) are to be selected. Not applicable if a 'smle' object is the input.
gamma_ebic: The EBIC parameter in \([0, 1]\). Default is 0.5.
vote: Logical flag for whether to perform the voting procedure. Only available when tune = 'ebic'.
tune: Selection criterion. Default is 'ebic'.
codingtype: Coding types for categorical features; for details see SMLE.
gamma_seq: The sequence of values for gamma_ebic when vote = TRUE.
vote_threshold: A relative voting threshold in percentage. A feature is considered important when the votes it receives pass the threshold.
para: Logical flag for whether to use parallel computing in the voting selection. Default is FALSE. See Details.
num_cores: The number of cores to use. The default is all cores detected.
X: Input feature matrix, used when the response and features are supplied directly by the user.
family: Model assumption; see SMLE. Default is Gaussian linear. When the input is a 'smle' or 'sdata' object, the same model will be used in the selection.
Returns a 'selection' object with
A list of selected features.
Fitted model coefficients based on the selected features.
Values of the selection criterion for the candidate models with various sparsity.
A list of voting selection results; returned only when vote = TRUE.
This function accepts three types of input for GLM data:
1. A 'smle' object, as the output from SMLE;
2. A 'sdata' object, as the output from Gen_Data;
3. A response vector and a feature matrix supplied directly by the user.
Note that this function is mainly designed to conduct a more elaborate selection after feature screening. We do not recommend using it directly on ultra-high-dimensional data without screening.
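The voting procedure governed by vote, gamma_seq and vote_threshold can be pictured with a small base-R sketch. This is an illustration only, not the package's internal code: the selected feature sets are made up, and both the rule "keep a feature if its share of votes reaches the threshold" and the expression of the threshold as a fraction (0.6) are assumptions.

```r
# Hypothetical illustration of importance voting; not the SMLE implementation.
gamma_seq <- seq(0, 1, 0.2)

# Made-up feature indices selected by EBIC under each value of gamma.
selected_by_gamma <- list(
  c(1, 3, 5, 7, 9),  # gamma = 0.0
  c(1, 3, 5, 7),     # gamma = 0.2
  c(1, 3, 5, 7),     # gamma = 0.4
  c(1, 3, 5),        # gamma = 0.6
  c(1, 3, 5),        # gamma = 0.8
  c(1, 3)            # gamma = 1.0
)

# Count how many gamma values voted for each feature.
votes <- table(unlist(selected_by_gamma))

# Share of runs in which each feature was selected.
vote_share <- as.vector(votes) / length(gamma_seq)

# Assumed rule: keep features selected in at least 60% of the runs.
vote_threshold <- 0.6
voted_features <- as.integer(names(votes)[vote_share >= vote_threshold])
print(voted_features)
```

With these made-up inputs, features 1, 3 and 5 pass the threshold, while features 7 and 9 are dropped because they are selected only under the smaller values of gamma.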
Chen, J. and Chen, Z. (2012). "Extended BIC for small-n-large-P sparse GLM." Statistica Sinica: 555-574.
# NOT RUN {
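# This is a simple example under the Gaussian assumption.
library(SMLE)
# Simulate data with moving-average correlation among features.
Data <- Gen_Data(correlation = "MA", family = "gaussian")
# Screen down to 20 features, then select among the retained ones.
fit <- SMLE(Data$Y, Data$X, k = 20, family = "gaussian")
E <- smle_select(fit)
plot(E)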
# }