Given a response and a set of K features, this function first runs SMLE (fast=TRUE) to generate a series of sub-models with sparsity k varying from k_min to k_max.
It then selects the best model from the series based on a selection criterion.
When the EBIC criterion is used, users can repeat the selection over several values of the tuning parameter gamma and conduct importance voting for each feature.
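The selection step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the helper `ebic` is a hypothetical name, the criterion follows the EBIC definition in Chen and Chen (2008), and a marginal-correlation ranking stands in for the SMLE screening step.

```r
# Hypothetical helper: EBIC for a model with k features chosen from p
# candidates and fitted on n observations. gamma = 0 gives the ordinary BIC.
ebic <- function(loglik, k, n, p, gamma = 0.5) {
  -2 * loglik + k * log(n) + 2 * gamma * lchoose(p, k)
}

set.seed(1)
n <- 100
p <- 30
X <- matrix(rnorm(n * p), n, p)
Y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)

# Stand-in for the screening step (an assumption, not SMLE itself):
# rank features by absolute marginal correlation with the response.
ranking <- order(abs(cor(X, Y)), decreasing = TRUE)

# Score one nested sub-model per sparsity level k_min, ..., k_max.
k_min <- 1
k_max <- 10
k_seq <- k_min:k_max
scores <- sapply(k_seq, function(k) {
  fit <- lm(Y ~ X[, ranking[1:k], drop = FALSE])
  ebic(as.numeric(logLik(fit)), k, n, p, gamma = 0.5)
})

# The selected model is the one with the smallest criterion value.
best_k <- k_seq[which.min(scores)]
selected <- sort(ranking[1:best_k])
```

Larger values of gamma add a heavier penalty for the size of the model space, so they tend to favor sparser models.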
smle_select(x, ...)
# S3 method for smle
smle_select(x, ...)
# S3 method for sdata
smle_select(x, k_min = 1, k_max = 10,
sub_model = NULL, gamma_ebic = 0.5, vote = FALSE,
tune = c("ebic", "aic", "bic"), gamma_seq = c(seq(0, 1, 0.2)),
vote_threshold = NULL, para = FALSE, num_cores = NULL, ...)
# S3 method for default
smle_select(x, X = NULL, family = "gaussian", ...)
x: Object of class "smle" or "sdata", or the response Y when the data pair (Y, X) is supplied directly.
...: Other parameters.
k_min: The lower bound of the target model sparsity. Default is 1.
k_max: The upper bound of the target model sparsity. Default is the same as the number of columns in the input.
sub_model: A subset of columns indicating which columns may be selected. (Only for "sdata" objects and (Y, X) pairs.)
gamma_ebic: Parameter for the Extended Bayesian Information Criterion. Must be a value between 0 and 1. Default is 0.5.
vote: Logical flag for whether to perform the voting procedure. Only available when tune = 'ebic'.
tune: Selection criterion; must be one of 'aic', 'bic', or 'ebic'. Default is 'ebic'.
gamma_seq: The sequence of values for gamma_ebic when vote = TRUE.
vote_threshold: A relative voting threshold in percentage. A feature is considered important when the votes it receives pass the threshold.
para: Logical flag for whether to use parallel computing in the voting selection. Default is FALSE. See Details.
num_cores: The number of cores to use. The default is all cores detected.
X: Input feature matrix.
family: Response type (see SMLE); default is "gaussian". When the input object is of class "smle" or "sdata", the same model is used in the selection step.
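The voting procedure controlled by vote, gamma_seq and vote_threshold can be sketched as follows. This is a minimal base-R illustration, not the package's implementation: the marginal-correlation ranking is a stand-in for SMLE screening, and the threshold is expressed here as a proportion of runs, which is an assumption about its scale.

```r
set.seed(2)
n <- 100
p <- 30
X <- matrix(rnorm(n * p), n, p)
Y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)

# Stand-in screening step (an assumption, not SMLE itself).
ranking <- order(abs(cor(X, Y)), decreasing = TRUE)

gamma_seq <- seq(0, 1, 0.2)
k_seq <- 1:10

# Log-likelihood of each nested candidate sub-model.
loglik <- sapply(k_seq, function(k) {
  as.numeric(logLik(lm(Y ~ X[, ranking[1:k], drop = FALSE])))
})

# One EBIC selection per gamma; every selected feature receives one vote.
votes <- integer(p)
for (g in gamma_seq) {
  crit <- -2 * loglik + k_seq * log(n) + 2 * g * lchoose(p, k_seq)
  chosen <- ranking[seq_len(k_seq[which.min(crit)])]
  votes[chosen] <- votes[chosen] + 1L
}

# Illustrative threshold: keep features selected in at least 60% of the runs.
vote_threshold <- 0.6
voted <- which(votes / length(gamma_seq) >= vote_threshold)
```

Features that survive across many values of gamma are less sensitive to the choice of penalty, which is the motivation for voting rather than fixing a single gamma_ebic.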
Returns a "selection" object with:
A list of variable IDs selected.
A list of coefficients for the selected features, fit by glmnet.
A list of criterion values, according to the selected criterion and model sparsity.
A list of voting selection results; returned only when vote = TRUE.
Three types of input are allowed: an object of class "smle", the output of the main function SMLE; an object of class "sdata", the output of Gen_Data; or a data pair (Y, X) supplied directly. Using an "sdata" object or the data matrices X, Y is not recommended for high-dimensional data.
Chen, J. and Chen, Z. (2012). "Extended BIC for small-n-large-P sparse GLM." Statistica Sinica: 555-574.
Chen, J. and Chen, Z. (2008). "Extended Bayesian information criteria for model selection with large model spaces." Biometrika 95.3: 759-771.
Chen, Z. and Chen, J. (2009). "Tournament screening cum EBIC for feature selection with high-dimensional feature spaces." Science in China Series A: Mathematics 52.6: 1327-1341.
# NOT RUN {
# This is a simple example under the Gaussian assumption.
# Simulate data, screen with SMLE, then select the final model.
Data <- Gen_Data(correlation = "MA", family = "gaussian")
fit <- SMLE(Data$Y, Data$X, k = 20, family = "gaussian")
E <- smle_select(fit)
plot(E)
# }