hit (version 0.2-2)

hit: Hierarchical Inference Testing

Description

Hierarchical inference testing for linear models with high-dimensional and/or correlated covariates by repeated sample splitting.

Usage

hit(x, y, hierarchy, family = "gaussian", B = 50, p.samp1 = 0.35,
    nfolds = 10, lambda.opt = "lambda.1se", alpha = 1,
    gamma = seq(0.05, 0.99, length.out = 100), max.p.esti = 1,
    mc.cores = 1L, trace = FALSE, ...)

Arguments

x
Design matrix of dimension n x p, without intercept. Variables not part of the dendrogram are added to the H0-model; see Details below.
y
Quantitative response variable of dimension n.
hierarchy
Object of class as.hierarchy. Must include all variables of x which should be tested.
family
Family of the response variable distribution. Either "gaussian", in which case y must be a vector, or "binomial". For "binomial", y should be either a factor with two levels or a two-column matrix of counts or proportions (the second column is treated as the target class; for a factor, the last level in alphabetical order is the target class). If y is given as a vector for "binomial", it is coerced to a factor.
B
Number of sample-splits.
p.samp1
Fraction of data used for the LASSO. The hierarchical ANOVA testing uses the remaining fraction, 1 - p.samp1.
nfolds
Number of folds (default is 10). See cv.glmnet for more details.
lambda.opt
Criterion for optimum selection of cross-validated lasso. Either "lambda.1se" (default) or "lambda.min". See cv.glmnet for more details.
alpha
A single value or a vector of values in the range 0 to 1 for the elastic net mixing parameter. If more than one value is given, the best is selected during cross-validation.
gamma
Vector of gamma-values.
max.p.esti
Maximum alpha level. All p-values above this value are set to one. Small max.p.esti values reduce computing time.
mc.cores
Number of cores for parallelising. Theoretical maximum is 'B'. For details see mclapply.
trace
If TRUE, the current status of the program is printed.
...
Additional arguments for cv.glmnet.
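
To make the role of p.samp1 concrete, the following base-R sketch draws one random split of the observations into a LASSO part and an ANOVA testing part. It is an illustration only and not the package's internal code: the object names idx.lasso and idx.anova are hypothetical, and the rounding rule for the split size is an assumption. hit repeats a split of this kind B times.

```r
# Illustrative sketch only: one random sample split, not the package's own code.
set.seed(1)
n <- 80          # number of observations
p.samp1 <- 0.35  # fraction of the data used for the LASSO

n1 <- round(p.samp1 * n)                     # assumed rounding rule for the split size
idx.lasso <- sort(sample.int(n, n1))         # rows for the LASSO screening step
idx.anova <- setdiff(seq_len(n), idx.lasso)  # remaining rows for the ANOVA testing

length(idx.lasso)
length(idx.anova)
```

With the default p.samp1 = 0.35, roughly one third of the rows go to variable screening and the rest to testing.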

Details

The H0-model contains variables which are not tested, such as experimental-design variables. These variables are not penalised in the LASSO model selection and are always included in the reduced ANOVA model.
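
As a rough illustration of which columns would end up in the H0-model, the sketch below builds a small design matrix in which one column is left out of the clustering. The column name "block" and the object h0.vars are hypothetical, and the setdiff() call only mimics the rule stated under the x argument (columns of x that are not part of the dendrogram); it is not taken from the package source.

```r
# Illustrative sketch only: find the columns of x that are not in the dendrogram.
set.seed(1)
x <- matrix(rnorm(20 * 6), nrow = 20)
colnames(x) <- c(paste0("x", 1:5), "block")  # "block": hypothetical design variable

# Cluster only the columns that should be tested
hc <- hclust(dist(t(x[, 1:5])))

# Columns of x without a leaf in the dendrogram would form the H0-model
h0.vars <- setdiff(colnames(x), hc$labels)
h0.vars
```

Under that rule, passing x and hc to hit would test x1 to x5 and keep "block" unpenalised in every model.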

References

Mandozzi, J. and Buehlmann, P. (2013). Hierarchical testing in the high-dimensional setting with correlated variables. To appear in the Journal of the American Statistical Association. Preprint arXiv:1312.5556

Examples

# Simulation:
set.seed(123)
n <- 80
p <- 82
## x with correlated columns
corMat <- toeplitz((p:1/p)^5)
corMatQ <- chol(corMat)
x <- matrix(rnorm(n * p), nrow = n) %*% corMatQ
colnames(x) <- paste0("x", 1:p)
## y
mu <- x[, c(5, 24, 72)] %*% c(3, 1, 2)
y <- rnorm(n, mu)
## clustering of the columns of x
hc <- hclust(dist(t(x)))

# HIT with AF
out <- hit(x, y, hc)
summary(out)