SMLE (version 0.3.1)

smle-package: Joint SMLE-screening for generalized linear models

Description

Feature screening is a powerful tool for processing ultrahigh-dimensional data. It aims to screen out most irrelevant features before a more elaborate analysis. This package provides an efficient implementation of SMLE-screening for linear, logistic, and Poisson models, in which joint effects among features are naturally incorporated into the screening process. The package also provides a function for feature selection after screening, based on a user-specified selection criterion.

Details

Package: smle
Type: Package
Version: 0.2
Date: 2020-01-29
License: GPL-2

The package takes an \(n \times 1\) response vector Y and an \(n \times p\) predictor (feature) matrix X as input and returns a set of \(k < n\) predictors that appear most relevant for joint regression. It also provides a data simulator that generates synthetic data sets from high-dimensional GLMs and accommodates commonly used correlation structures among numerical and categorical features.
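
To make the input and output shapes concrete, the following base R sketch screens a toy Gaussian data set down to k features by iterative hard thresholding, the kind of algorithm described in Xu and Chen (2014). It is only an illustration under simplifying assumptions (linear model, fixed step size, independent features) and is not the package's implementation; the names `iht_screen` and `truth` are hypothetical and do not belong to the smle API.

```r
# Toy illustration of top-k joint screening by iterative hard thresholding
# for a Gaussian linear model. Base R only; not the package's code.

set.seed(1)
n <- 100; p <- 500; k <- 10
X <- matrix(rnorm(n * p), n, p)          # n x p feature matrix
truth <- c(1, 2, 3)                      # hypothetical causal features
Y <- drop(X[, truth] %*% c(3, -3, 3)) + rnorm(n)   # n x 1 response

iht_screen <- function(Y, X, k, max_iter = 500, tol = 1e-6) {
  # Step size 1/u, with u the largest eigenvalue of X'X, keeps each update
  # from increasing the residual sum of squares.
  u <- svd(X, nu = 0, nv = 0)$d[1]^2
  beta <- numeric(ncol(X))
  for (iter in seq_len(max_iter)) {
    # Gradient step on the least-squares objective
    z <- beta + drop(crossprod(X, Y - X %*% beta)) / u
    # Hard threshold: keep the k largest coefficients in absolute value
    keep <- order(abs(z), decreasing = TRUE)[seq_len(k)]
    beta_new <- numeric(ncol(X))
    beta_new[keep] <- z[keep]
    converged <- sqrt(sum((beta_new - beta)^2)) < tol
    beta <- beta_new
    if (converged) break
  }
  sort(which(beta != 0))                 # indices of retained features
}

retained <- iht_screen(Y, X, k)
length(retained)          # at most k features survive
setdiff(truth, retained)  # causal features missed by the screen, if any
```

Because every update uses all coefficients at once, a feature is retained according to its contribution alongside the others rather than its marginal correlation with Y, which is what "joint" screening refers to here.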

Important functions: Gen_Data, SMLE, smle_select, smle_predict

References

Xu, C. and Chen, J. (2014) The Sparse MLE for Ultrahigh-Dimensional Feature Screening. Journal of the American Statistical Association, 109(507), 1257-1269.

Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22.

Examples

# NOT RUN {
set.seed(123)
# Generate correlated data
Data <- Gen_Data(correlation = "MA", family = "gaussian")
print(Data)

# Joint feature screening via SMLE
fit <- SMLE(Data$Y, Data$X, k = 10, family = "gaussian")
print(fit)
plot(fit)

# Are there any features missed after screening?
setdiff(Data$index, fit$Retained_Feature_IDs)

# Elaborate selection after screening
E <- smle_select(fit, gamma_ebic = 0.5, vote = FALSE)

# Are there any features missed after selection?
setdiff(Data$index, E$Retained_Feature_IDs)
print(E)
plot(E)
# }
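
The `gamma_ebic` argument in the example suggests that the selection step can rank candidate models by an extended BIC (EBIC). As a rough guide to what such a criterion measures, the base R sketch below computes the standard EBIC, \(-2 \log L + \nu \log n + 2 \gamma \log \binom{p}{\nu}\), for a Gaussian linear model on a given feature subset. How `smle_select` computes its criterion internally is not stated in this page, so treat this as an assumption; `ebic_lm` and `p_total` are hypothetical names.

```r
# EBIC for a Gaussian linear model fitted on a candidate feature subset.
# p_total is the number of candidate features the subset was drawn from.
# Only the selected features are counted in the penalty; the intercept and
# error variance are common to all subsets and do not affect the ranking.
ebic_lm <- function(Y, X, subset, gamma = 0.5, p_total = ncol(X)) {
  n <- length(Y)
  fit <- lm(Y ~ X[, subset, drop = FALSE])
  size <- length(subset)
  -2 * as.numeric(logLik(fit)) + size * log(n) +
    2 * gamma * lchoose(p_total, size)
}

set.seed(2)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
Y <- 2 * X[, 1] - 2 * X[, 2] + rnorm(n)

ebic_lm(Y, X, subset = c(1, 2))        # model with the two signal features
ebic_lm(Y, X, subset = c(1, 2, 3, 4))  # same model plus two noise features
```

Smaller values indicate a better trade-off between fit and model size. Setting `gamma = 0` reduces the criterion to the ordinary BIC up to a constant, while larger values of `gamma` penalize big models more heavily when `p_total` is large.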