
varEst (version 0.1.0)

krcv: Variance Estimation with k-fold RCV

Description

Estimation of the error variance using k-fold refitted cross-validation (RCV) in ultrahigh-dimensional datasets.

Usage

krcv(x,y,a,k,d,method=c("spam","lasso","lsr"))

Arguments

x

a matrix of markers or explanatory variables; each column contains one marker and each row represents one individual.

y

a column vector of response variable.

a

value of alpha, in the range 0 <= a <= 1, where a = 1 gives the LASSO penalty and a = 0 the ridge penalty. A value of a is required when method = "lasso"; for the other methods, a should be NULL.

k

number of sub-datasets (groups) into which the dataset is divided.

d

number of variables to be selected from x.

method

variable selection method; one of "spam", "lasso" or "lsr".
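This page does not say which implementation performs the LASSO step. The block below assumes that `a` is the elastic-net mixing parameter α as parameterised in the glmnet package. Under that assumption, the penalised fit used for selection would minimise:

```latex
\hat\beta = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta_0 - x_i^\top\beta\right)^2
  + \lambda\left[\frac{1-a}{2}\,\lVert\beta\rVert_2^2 + a\,\lVert\beta\rVert_1\right]
```

With a = 1 only the L1 term remains (LASSO), and with a = 0 only the squared L2 term remains (ridge). The ridge penalty does not set coefficients exactly to zero, so the penalty itself can only drop variables when a > 0.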

Value

The estimated error variance.
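Following the procedure described under Details, the returned value can be written as the average of k per-group estimates. The notation is introduced here for illustration: Ŝ_j is the set of d variables selected on group j, (−j) denotes the other k−1 groups, n_(−j) = n(k−1)/k is their size, and β̂_(−j) is the OLS fit on those groups. The denominator assumes that the refit uses only the d selected variables. If an intercept is also fitted, the denominator becomes n_(−j) − d − 1.

```latex
\hat\sigma^2 = \frac{1}{k}\sum_{j=1}^{k}\hat\sigma_j^2,
\qquad
\hat\sigma_j^2 = \frac{\bigl\lVert y_{(-j)} - X_{(-j),\hat{S}_j}\,\hat\beta_{(-j)}\bigr\rVert_2^2}{n_{(-j)} - d}
```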

Details

The error variance is estimated from a high-dimensional dataset in which the number of parameters exceeds the number of individuals, i.e. p > n. k-fold RCV extends the original RCV method (Fan et al., 2012): instead of splitting the data into 2 groups, it divides them into k groups of equal size. Variables are selected from one group using Sparse Additive Models (SpAM; Ravikumar et al., 2009), the LASSO (Tibshirani, 1996) or least squares regression (lsr). The error variance is then estimated by ordinary least squares (OLS) on the remaining k-1 groups, using only the selected variables. This is repeated until every group has served once as the selection group. The final error variance is the average of the k variance estimates.
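A minimal base-R sketch of the k-fold RCV loop described above is given below. It is not the varEst implementation, and `krcv_sketch` and `select_top_d` are hypothetical names. To keep the sketch free of dependencies, SpAM, LASSO and lsr are swapped for a simple screen that keeps the d variables with the largest absolute marginal correlation with y. The OLS refit also includes an intercept.

```r
# Hypothetical selection step (stand-in for SpAM / LASSO / lsr):
# keep the d columns of x with the largest absolute correlation with y.
select_top_d <- function(x, y, d) {
  score <- abs(cor(x, y))[, 1]
  order(score, decreasing = TRUE)[seq_len(d)]
}

# Hypothetical k-fold RCV sketch, not the varEst API.
krcv_sketch <- function(x, y, k, d) {
  y <- as.numeric(y)
  n <- nrow(x)
  # Randomly assign individuals to k groups of (near) equal size
  folds <- sample(rep(seq_len(k), length.out = n))
  sigma2 <- numeric(k)
  for (j in seq_len(k)) {
    sel_idx <- which(folds == j)   # group used for variable selection
    est_idx <- which(folds != j)   # remaining k-1 groups used for OLS
    vars <- select_top_d(x[sel_idx, , drop = FALSE], y[sel_idx], d)
    # OLS refit (with intercept) on the selected variables only
    fit <- lm.fit(cbind(1, x[est_idx, vars, drop = FALSE]), y[est_idx])
    # Residual sum of squares over residual degrees of freedom
    sigma2[j] <- sum(fit$residuals^2) / (length(est_idx) - fit$rank)
  }
  mean(sigma2)  # average of the k per-group variance estimates
}

# Usage on data simulated like the example below (true error variance 1)
set.seed(1)
n <- 200; p <- 500
x <- matrix(sample(1:3, n * p, replace = TRUE, prob = c(1, 2, 1)), nrow = n)
y <- x[, 1:5] %*% rep(1.41, 5) + rnorm(n)
res <- krcv_sketch(x, y, k = 4, d = 5)
```

Selection uses only one group of n/k individuals. The correlation screen can therefore miss true markers when p is large, and missed markers inflate the estimate. A proper sparse method such as those listed above is meant to reduce this risk.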

References

Fan, J., Guo, S. and Hao, N. (2012). Variance estimation using refitted cross-validation in ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(1), 37-65.

Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5), 1009-1030.

Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.

Examples

## data simulation
library(varEst)
set.seed(123)
n <- 200; p <- 500
marker <- matrix(NA, nrow = n, ncol = p)
for (i in 1:p) {
  # genotype codes 1/2/3 with 1:2:1 frequencies
  marker[, i] <- sample(1:3, n, replace = TRUE, prob = c(1, 2, 1))
}
## the first five markers have effect 1.41; rnorm() adds error with variance 1
pheno <- marker[, 1:5] %*% rep(1.41, 5) + rnorm(n)

## estimation of error variance (a = NULL because method is not "lasso")
err_var <- krcv(marker, pheno, a = NULL, k = 4, d = 5, method = "spam")
