
VariableScreening (version 0.2.0)

screenVCM: Perform screening for ultrahigh-dimensional varying coefficient model

Description

Implements the screening procedure proposed by Liu, Li and Wu (2014) for varying coefficient models with ultrahigh-dimensional predictors.

The function code is adapted from the relevant authors' code. Special thanks are due to Jingyuan Liu for providing some of the code upon which this function is based.
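The idea behind this kind of screening can be illustrated with a rough base R sketch: estimate the squared correlation between each predictor and the response conditional on U by kernel smoothing, then average it over the observed values of U. This is not the package's code. The Gaussian kernel, the fixed bandwidth `h`, and the helper names `nw_mean`, `cond_corr_sq` and `screen_score` are assumptions made only for illustration.

```r
# Nadaraya-Watson estimate of E[z | U = u0] with a Gaussian kernel.
# The bandwidth h is chosen by hand here (an assumption of this sketch).
nw_mean <- function(z, U, u0, h) {
  w <- dnorm((U - u0) / h)
  sum(w * z) / sum(w)
}

# Squared correlation of x and y conditional on U = u0,
# built from kernel estimates of the conditional moments.
cond_corr_sq <- function(x, y, U, u0, h) {
  ex  <- nw_mean(x, U, u0, h)
  ey  <- nw_mean(y, U, u0, h)
  exy <- nw_mean(x * y, U, u0, h)
  vx  <- nw_mean(x^2, U, u0, h) - ex^2
  vy  <- nw_mean(y^2, U, u0, h) - ey^2
  (exy - ex * ey)^2 / (vx * vy)
}

# Screening score for one predictor: the squared conditional
# correlation averaged over the observed values of U.
screen_score <- function(x, y, U, h = 0.1) {
  mean(vapply(U, function(u0) cond_corr_sq(x, y, U, u0, h), numeric(1)))
}

set.seed(1)
n <- 200
U <- runif(n)
X <- matrix(rnorm(n * 20), nrow = n)
Y <- 2 * sin(2 * pi * U) * X[, 1] + rnorm(n)   # only column 1 is active

scores <- apply(X, 2, screen_score, y = Y, U = U)
print(which.max(scores))   # index of the predictor with the largest score
```

In this toy setup the coefficient of column 1 changes sign with U, so a score that conditions on U can pick up signal that an ordinary marginal correlation may average away.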

Usage

screenVCM(X, Y, U)

Arguments

X

Matrix of predictors to be screened. There should be one row for each observation.

Y

Vector of responses. It should have the same length as the number of rows of X.

U

Covariate with which the coefficient functions vary.
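As a minimal sketch of inputs with the shapes described above, the following base R code builds a predictor matrix, an index covariate and a response from a small varying coefficient model. It assumes that U holds one value per observation; the coefficient functions `beta1` and `beta2` are hypothetical and chosen only for illustration.

```r
set.seed(1)
n <- 200   # number of observations
p <- 50    # number of candidate predictors

U <- runif(n)                         # covariate along which coefficients vary
X <- matrix(rnorm(n * p), nrow = n)   # one row per observation

# Hypothetical coefficient functions for the two active predictors
beta1 <- 2 * sin(2 * pi * U)
beta2 <- 3 * U
Y <- beta1 * X[, 1] + beta2 * X[, 2] + rnorm(n)

# Shape checks matching the argument descriptions
stopifnot(is.matrix(X), nrow(X) == length(Y), length(U) == length(Y))
```

Objects built this way could then be passed as the X, Y and U arguments.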

Value

A list with the following components:

CORR_sq:

A vector of unconditioned squared correlations, with length equal to the number of columns of the input matrix X. The higher the unconditioned squared correlation, the more desirable it is to retain the corresponding column of X in a later predictive model.

rank:

Vector of the ranks of the predictors in terms of the conditional correlation (\(\hat{\rho}^*_j\) in the paper). It has length equal to the number of columns of the input matrix X and is a permutation of the integers 1 through that length. A rank of 1 indicates the feature that appears to have the best marginal predictive performance, with the largest \(\hat{\rho}^*_j\); a rank of 2 indicates the second best, and so forth.
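The relationship between a vector of screening scores and a rank vector of this kind can be sketched in base R. The `score` values below are made up for illustration, and the tie-breaking rule is an assumption rather than a description of what the function does internally.

```r
# Hypothetical screening scores for five predictors (larger = stronger signal)
score <- c(0.02, 0.41, 0.10, 0.35, 0.01)

# Rank 1 goes to the largest score, rank 2 to the second largest, and so on
rk <- rank(-score, ties.method = "first")
print(rk)     # 4 1 3 2 5

# Keep the d best-ranked predictors for a later model
d <- 2
keep <- which(rk <= d)
print(keep)   # 2 4
```

The same `which(rank <= d)` pattern applies to the rank component of the returned list when choosing how many predictors to carry forward.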

References

Liu, J., Li, R., & Wu, R. (2014). Feature selection for varying coefficient models with ultrahigh-dimensional covariates. Journal of the American Statistical Association, 109: 266-274. <DOI:10.1080/01621459.2013.850086>

Examples

# NOT RUN {
library(VariableScreening)
set.seed(12345678)
data1 <- simulateVCM(p = 250, trueIdx = c(2, 20, 80, 120, 200))
screenResults <- screenVCM(X = data1$X, Y = data1$Y, U = data1$U)
rank <- screenResults$rank
print(which(rank <= 10))    # Prints the column numbers of the ten best-ranked predictors.
trueIdx <- c(2, 20, 80, 120, 200)   # Column numbers (indices) of the nonnull
                                     # predictors in the simulated data.
print(rank[trueIdx])    # These predictors should all receive ranks near 1 (the best),
                        # indicating that the screening worked well.
# }
