
Perform prototype or F tests for significance of groups of predictors in the multivariate model. Choose between the exact or approximate likelihood ratio prototype tests (ELR or ALR), the F test and the marginal screening prototype test (MS). Tests can be selective or non-selective, and selective tests can use either a non-sampling or a hit-and-run reference distribution.
prototest.multivariate(x, y, groups, test.group, type = c("ELR", "ALR", "F", "MS"),
selected.col = NULL, lambda, mu = NULL, sigma = 1,
hr.iter = 50000, hr.burn.in = 5000, verbose = FALSE, tol = 10^-8)
input matrix of dimension n-by-p, where p is the number of predictors over all predictor groups of interest. Will be mean centered and standardised before tests are performed.
response variable. Vector of length n, assumed to be quantitative.
group membership of the columns of x. Vector of length p, with each element containing the group label of the corresponding column in x.
group label for which we test nullity. Should be one of the values seen in groups. See Details for further explanation.
type of test to be performed. Can select one at a time. Options include the exact and approximate likelihood ratio prototype tests of Reid et al (2015) (ELR, ALR), the F test and the marginal screening prototype test of Reid and Tibshirani (2015) (MS). Default is ELR.
preselected columns selected by the user. Vector of indices in the set {1, 2, ... p}. Used in conjunction with groups to ascertain for which groups the user has specified selected columns. Should it find any selected columns within a group, no further action is taken to select columns. Should no columns within a group be specified, columns are selected using either lasso or the marginal screening procedure, depending on the test. If all groups have prespecified columns, a non-selective test is performed, using the classical distributional assumptions (exact and/or asymptotic) for the test in question. If any selection is performed, selective tests are performed. Default is NULL, requiring the selection of columns in all the groups.
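The rule described above (a group with at least one prespecified column is left alone, and any group without one triggers automatic selection and hence a selective test) can be sketched in base R. This is only an illustration of the described logic: groups_with_selection is a hypothetical helper, not a function of the package, and the group layout is made up.

```r
# Hypothetical helper (not part of the package): for each group label,
# report whether any index in selected.col falls within that group.
groups_with_selection <- function(groups, selected.col = NULL) {
  labels <- unique(groups)
  has_sel <- vapply(labels,
                    function(g) any(groups[selected.col] == g),
                    logical(1))
  names(has_sel) <- labels
  has_sel
}

groups_demo <- rep(1:4, each = 20)          # 80 columns in 4 equal groups

# Columns prespecified in group 2 only: the other groups need selection,
# so, by the rule above, a selective test would be performed.
sel <- groups_with_selection(groups_demo, 21:25)
is_selective <- !all(sel)

# One prespecified column per group: nothing left to select.
all_spec <- groups_with_selection(groups_demo, c(1, 21, 41, 61))
```

With selected.col = NULL the helper returns FALSE for every group, which corresponds to the default of selecting columns in all groups.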
regularisation parameter for the lasso fit. Same for each group. Must be supplied when at least one group has unspecified columns in selected.col. Will be supplied to glmnet. This is the unstandardised version, equivalent to lambda/n supplied to glmnet.
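As a small worked example of the scaling stated above, the snippet below converts an unstandardised lambda into the value that would correspond to it on glmnet's scale. The numbers are made up, and the variable names are illustrative only.

```r
# Illustrative arithmetic only; values are made up.
n <- 100                        # number of observations
lambda <- 0.5                   # unstandardised value, as passed to this function
glmnet_lambda <- lambda / n     # equivalent value on glmnet's scale
```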
mean parameter for the response. See Details below. If supplied, it is first subtracted from the response to yield a zero-mean (at the population level) vector for which we proceed with testing. If NULL (the default), this parameter is treated as a nuisance parameter and accounted for as such in testing.
error standard deviation for the response. See Details below. Should be supplied; if it is not, it is assumed to be 1 (the default). Required for computation of some of the test statistics.
number of hit-and-run samples required in the reference distribution of a selective test. Applies only if selected.col is NULL. Default is 50000. Since dependent samples are generated, large values are required to generate good reference distributions. If set to 0, the function tries to apply a non-sampling selective test (provided selected.col is NULL), if possible. If a non-sampling test is not possible, the function exits with a message.
number of burn-in hit-and-run samples. These are generated first so as to make subsequent hit-and-run realisations less dependent on the observed response. Samples are then discarded and do not inform the null reference distribution.
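To make the roles of the iteration and burn-in counts concrete, here is a minimal hit-and-run sketch in base R. It samples uniformly from a bounded polytope {z : A z <= b}, which is a deliberately simplified stand-in and not the package's sampler; hit_and_run and its arguments are hypothetical names chosen for this illustration.

```r
# Generic hit-and-run sampler for the uniform distribution on a bounded
# polytope {z : A %*% z <= b}. Illustrative; not the prototest implementation.
hit_and_run <- function(A, b, start, n_iter, n_burn_in) {
  stopifnot(all(A %*% start <= b))            # start must be feasible
  z <- start
  out <- matrix(NA_real_, nrow = n_iter, ncol = length(start))
  for (i in seq_len(n_burn_in + n_iter)) {
    d <- rnorm(length(z))
    d <- d / sqrt(sum(d^2))                   # random unit direction
    Ad <- as.vector(A %*% d)
    slack <- as.vector(b - A %*% z)           # non-negative while z is feasible
    # Feasible step sizes t satisfy t * Ad <= slack for every constraint.
    upper <- min(slack[Ad > 0] / Ad[Ad > 0])
    lower <- max(slack[Ad < 0] / Ad[Ad < 0])
    z <- z + runif(1, lower, upper) * d       # uniform draw along the chord
    if (i > n_burn_in) out[i - n_burn_in, ] <- z   # burn-in draws are discarded
  }
  out
}

# The box [-1, 1]^2 written as A z <= b
A <- rbind(diag(2), -diag(2))
b <- rep(1, 4)
set.seed(1)
draws <- hit_and_run(A, b, start = c(0, 0), n_iter = 2000, n_burn_in = 200)
```

Each draw starts from the previous one, so successive samples are dependent. That is why a large number of iterations is recommended, and why the first draws, which still depend on the starting point, are thrown away.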
should progress be printed?
convergence threshold for iterative optimisation procedures.
A list with the following four components:
The value of the test statistic on the observed data.
Valid p-value of the test.
Vector with columns selected for prototype formation in the test. If initially NULL, this will now contain indices of columns selected by the automatic column selection procedures of the test.
Matrix with hit-and-run replications of the response. If sampled selective test was not performed, this will be NULL.
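The model underpinning each of the tests is
$y = \mu + \sum_{k=1}^{K} \theta_k \hat{y}_k + \epsilon$,
where $\epsilon \sim N(0, \sigma^2 I)$ and $K$ is the number of predictor groups. The prototype $\hat{y}_k$ depends on the particular test considered.
In particular, for the ELR, ALR and F tests, we have $\hat{y}_k = P_{M_k}(y - \mu)$, where $P_{M_k} = X_{M_k}(X_{M_k}^T X_{M_k})^{-1} X_{M_k}^T$, $X_M$ is the input matrix reduced to the columns with indices in the set $M$, and $M_k$ is the set of column indices selected from group $k$. This set is either provided by the user (via selected.col) or is selected automatically (if selected.col is NULL). If the former, a non-selective test is performed; if the latter, a selective test is performed, with the restrictions $Ay \le b$ imposed by the selection.
For the marginal screening prototype (MS) test, $\hat{y}_k = x_{j^*}$, where $x_j$ is the $j$th column of x and $j^* = \arg\max_{j \in C_k} |x_j^T y|$, with $C_k$ the set of column indices belonging to group $k$.
All tests test the null hypothesis $H_0: \theta_{k^*} = 0$, where $k^*$ is the group supplied by the user via test.group. Details of each are described in Reid et al (2015).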
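As a hedged illustration of what a group prototype can look like, the base R sketch below forms two kinds of prototype for one group of columns: the projection of the centred response onto a chosen set of columns, and the single column with the largest absolute inner product with the response (marginal screening). Both helper functions, the toy data and the column sets are made up for illustration; the definitions are assumptions based on the usual descriptions of these prototypes and are not taken from the package source.

```r
# Hypothetical helpers, not package functions.

# Projection prototype: fitted values from regressing (y - mu) on columns M.
projection_prototype <- function(X, y, M, mu = mean(y)) {
  XM <- X[, M, drop = FALSE]
  as.vector(XM %*% solve(crossprod(XM), crossprod(XM, y - mu)))
}

# Marginal screening prototype: index of the column in the group with the
# largest absolute inner product with the response.
ms_prototype_index <- function(X, y, group_cols) {
  group_cols[which.max(abs(crossprod(X[, group_cols, drop = FALSE], y)))]
}

set.seed(1)
X_demo <- scale(matrix(rnorm(50 * 6), ncol = 6))   # centred, scaled toy inputs
y_demo <- 2 * X_demo[, 2] + rnorm(50, sd = 0.1)    # signal in column 2 only

y_hat  <- projection_prototype(X_demo, y_demo, M = 1:3)
j_star <- ms_prototype_index(X_demo, y_demo, group_cols = 1:3)
```

In this toy setup the response is driven almost entirely by column 2, so the marginal screening step picks that column as the prototype for the group made up of columns 1 to 3.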
Reid, S. and Tibshirani, R. (2015) Sparse regression and marginal testing using cluster prototypes. Biostatistics, doi:10.1093/biostatistics/kxv049. http://arxiv.org/pdf/1503.00334v2.pdf.
Reid, S., Taylor, J. and Tibshirani, R. (2015) A general framework for estimation and inference from clusters of features. Available online: http://arxiv.org/abs/1511.07839.
# NOT RUN {
library(prototest)
### generate data
set.seed (12345)
n = 100
p = 80
X = matrix (rnorm(n*p, 0, 1), ncol=p)
beta = rep(0, p)
beta[1:3] = 0.1 # three signal variables: number 1, 2, 3
signal = as.vector(X %*% beta) # linear signal: each row of X times beta
intercept = 3
y = intercept + signal + rnorm (n, 0, 1)
### treat all columns as if in same group and test for signal
# non-selective ELR test with nuisance intercept
elr = prototest.univariate (X, y, "ELR", selected.col=1:5)
# selective F test with nuisance intercept; non-sampling
f.test = prototest.univariate (X, y, "F", lambda=0.01, hr.iter=0)
print (elr)
print (f.test)
### assume variables occur in 4 equally sized groups
num.groups = 4
groups = rep (1:num.groups, each=p/num.groups)
# selective ALR test -- select columns 21-25 in 2nd group; test for signal in 1st; hit-and-run
alr = prototest.multivariate(X, y, groups, 1, "ALR", 21:25, lambda=0.005, hr.iter=20000)
# non-selective MS test -- specify first column in each group; test for signal in 1st
ms = prototest.multivariate(X, y, groups, 1, "MS", c(1,21,41,61))
print (alr)
print (ms)
# }