SIS.selection: Sure Independence Screening

Description

Performs Sure Independence Screening (SIS) to select relevant gene expression variables. SIS ranks features by the magnitude of their marginal regression coefficients.
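In the notation of Fan and Lv (2008), and assuming each column of X has been standardized, the componentwise regression coefficients and the selected set of pred genes can be written as below. This is the general SIS criterion from the reference; this page does not state whether SIS.selection uses exactly this normalization.

```latex
\omega = X^\top Y \in \mathbb{R}^{p}, \qquad
\widehat{\mathcal{M}} = \bigl\{\, 1 \le j \le p \;:\; |\omega_j| \text{ is among the } \mathrm{pred} \text{ largest of } |\omega_1|, \dots, |\omega_p| \,\bigr\}
```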

Usage

SIS.selection(X, Y, pred, scale = F)

Arguments

X

a data matrix (n x p) of genes. NAs and Inf are not allowed. Each row corresponds to an observation and each column to a gene.

Y

a vector of length n giving the classes of the n observations. The classes must be coded as 1 or 0.

pred

the number of relevant variables to select; pred must be lower than p.

scale

If scale=TRUE, X will be scaled.

Value

A matrix (n x pred) containing all the observations for only the pred most relevant genes.

Details

Sure Independence Screening (SIS) is performed to select pred relevant gene expression variables, with pred < p. SIS ranks features according to their marginal utility: each feature is used independently as a predictor to assess its usefulness for predicting the response. Precisely, SIS ranks features by the magnitude of their marginal regression coefficients.
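The marginal ranking described above can be sketched in R. The helper sis_sketch() below is hypothetical and is not the package's implementation of SIS.selection; it assumes the marginal coefficient of each gene is obtained by regressing the 0/1 response on that gene alone after standardizing it (as in Fan and Lv, 2008), and it assumes the selected genes are returned unscaled.

```r
# Hypothetical helper: a minimal sketch of Sure Independence Screening,
# NOT the package's SIS.selection implementation.
sis_sketch <- function(X, Y, pred) {
  stopifnot(is.matrix(X), length(Y) == nrow(X), pred < ncol(X),
            all(Y %in% c(0, 1)))
  # Standardize each gene (column) so marginal coefficients are comparable
  Xs <- scale(X)
  # Center the 0/1 response
  yc <- Y - mean(Y)
  # Marginal least-squares slope of Y on each standardized gene:
  # with unit-variance columns this equals cov(x_j, Y)
  omega <- drop(crossprod(Xs, yc)) / (nrow(X) - 1)
  # Rank genes by |omega_j| and keep the pred largest
  keep <- order(abs(omega), decreasing = TRUE)[seq_len(pred)]
  # Return all observations for the selected (unscaled) genes
  X[, keep, drop = FALSE]
}

# Usage on simulated data (not the BreastCancer data set)
set.seed(1)
n <- 40
p <- 200
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("gene", seq_len(p))
Y <- as.numeric(X[, 5] + rnorm(n, sd = 0.5) > 0)  # class depends on gene5
Xsis <- sis_sketch(X, Y, pred = 10)
dim(Xsis)  # 40 10
```

Ranking by the absolute value keeps genes that are strongly negatively associated with the class as well as positively associated ones. Constant columns are not handled specially in this sketch: scale() turns them into NaN, and order() places them last.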

References

Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5), 849-911.

Examples


data("BreastCancer")
X <- scale(BreastCancer$X)  # standardize the gene expression matrix
Y <- BreastCancer$Y         # classes coded as 1 or 0
Xsis <- SIS.selection(X, Y, pred = 50)  # keep the 50 top-ranked genes
