Calculate p-values and confidence intervals based on the multi-splitting approach
multi.split(x, y, B = 100, fraction = 0.5, ci = TRUE, ci.level = 0.95,
model.selector = lasso.cv,
classical.fit = lm.pval, classical.ci = lm.ci,
parallel = FALSE, ncores = getOption("mc.cores", 2L),
gamma = seq(ceiling(0.05 * B) / B, 1 - 1 / B, by = 1 / B),
args.model.selector = NULL, args.classical.fit = NULL,
args.classical.ci = NULL,
return.nonaggr = FALSE, return.selmodels = FALSE,
repeat.max = 20,
verbose = FALSE)
numeric design matrix (without intercept).
numeric response vector.
the number of sample-splits, a positive integer.
a number in
logical indicating if a confidence interval should be calculated for each parameter.
(if ci
is true:) a number in
a function
to perform model
selection, with default lasso.cv
. The function must have at
least two arguments, x
(the design matrix) and y
(the
response vector). Return value is the index vector of selected columns. See
lasso.cv
and lasso.firstq
for an
example. Additional arguments can be passed via args.model.selector
.
a function
to calculate (classical)
confidence intervals. Default is lm.ci
. The function
must have at least 3 arguments, x
(the design matrix),
y
(the response vector) and level
(the coverage
level), and return the matrix of confidence intervals. See
lm.ci
for an example. Additional arguments can be
passed through args.classical.ci
.
logical indicating if parallelization via
mclapply
should be used.
number of cores used for parallelization as
mc.cores
in mclapply()
.
vector of gamma-values. In case gamma is a scalar, the
value
named list
of further arguments for
function model.selector
.
named list
of further
arguments for function classical.fit
.
named list
of further arguments
for function classical.ci
.
logical
indicating if the
unadjusted p-values be returned.
logical
indicating if the
selected models (at each split) should be returned. Necessary for
the clusterGroupTest()
part of the result.
positive integer indicating the maximal number of split trials. Should not matter in regular cases, but necessary to prevent infinite loops in borderline cases.
should information be printed out while computing? (logical).
Vector of multiple testing corrected p-values.
Value of gamma where minimal p-values was attained.
Function to perform groupwise tests based on
hierarchical clustering. You can either provide a distance matrix
and clustering method or the output of hierarchical clustering from
the function hclust
as for
clusterGroupBound
. P-values are adjusted for multiple
testing.
Meinshausen, N., Meier, L. and B<U+00FC>hlmann, P. (2009) P-values for high-dimensional regression. Journal of the American Statistical Association 104, 1671--1681.
Mandozzi, J. and B<U+00FC>hlmann, P. (2015) A sequential rejection testing method for high-dimensional regression with correlated variables. To appear in the International Journal of Biostatistics. Preprint arXiv:1502.03300
# NOT RUN {
n <- 40 # a bit small, to keep example "fast"
p <- 256
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- x[,1] * 2 + x[,2] * 2.5 + rnorm(n)
## Multi-splitting with lasso.firstq as model selector function
## 'q' must be specified
fit.multi <- multi.split(x, y, model.selector = lasso.firstq,
args.model.selector = list(q = 10))
fit.multi
head(fit.multi$pval.corr, 10) ## the first 10 p-values
ci. <- confint(fit.multi)
head(ci.) # the first 6
stopifnot(all.equal(ci.,
with(fit.multi, cbind(lci, uci)), check.attributes=FALSE))
# }
# NOT RUN {
<!-- %% "too" expensive; still run via --run-donttest -->
## Use default 'lasso.cv' (slower!!) -- incl cluster group testing:
system.time(fit.m2 <- multi.split(x, y, return.selmodels = TRUE))# 9 sec (on "i7")
head(fit.m2$pval.corr) ## the first 6 p-values
head(confint(fit.m2)) ## the first 6 95% conf.intervals
## Now do clustergroup testing
clGTst <- fit.m2$clusterGroupTest
names(envGT <- environment(clGTst))# about 14
if(!interactive()) # if you are curious (and advanced):
print(ls.str(envGT), max = 0)
stopifnot(identical(clGTst, envGT$clusterGroupTest))
ccc <- clGTst()
str(ccc)<!-- %% MM: hmm.. -->
ccc$hh # the clustering
has.1.or.2 <- sapply(ccc$clusters,
function(j.set) any(c(1,2) %in% j.set))
ccc$pval[ has.1.or.2] ## all very small
ccc$pval[!has.1.or.2] ## all 1
# }
# NOT RUN {
<!-- % dont... -->
# }
Run the code above in your browser using DataLab