Multiple testing in permutation inference for the general linear model (GLM)
frank.flm(
nsim,
formula.full,
formula.reduced,
curve_sets,
factors = NULL,
savefuns = TRUE,
...,
GET.args = NULL,
mc.cores = 1,
mc.args = NULL,
cl = NULL,
fast = TRUE
)
nsim
The number of random permutations.
formula.full
The formula specifying the general linear model, see formula in lm.
formula.reduced
The formula of the reduced model with nuisance factors only. This model should be nested within the full model.
curve_sets
A named list of sets of curves giving the dependent variable (Y), and possibly additionally all the factors. The dimensions of the elements should match with each other, i.e. the factor values should be given for each argument value and each function. If factors are given in the argument factors, then this can also be just the curve set representing Y. fdata objects are also allowed.
factors
A data frame of factors. An alternative way to specify factors when they are constant for all argument values. The number of rows of the data frame should be equal to the number of curves. Each column should specify the values of a factor.
savefuns
Logical. If TRUE, then the functions from permutations are saved to the attribute simfuns.
...
Additional arguments to be passed to lm. See details.
GET.args
A named list of additional arguments to be passed to global_envelope_test.
mc.cores
The number of cores to use, i.e. at most how many child processes will be run simultaneously. Must be at least one, and parallelization requires at least two cores. On a Windows computer mc.cores must be 1 (no parallelization). For details, see mclapply, to which the argument is passed. Parallelization can be used in generating simulations and in calculating the second stage tests.
mc.args
A named list of additional arguments to be passed to mclapply. Only relevant if mc.cores is more than 1.
cl
Allows parallelization through the use of parLapply (works also in Windows), see the argument cl there, and examples.
fast
Logical. See details.
A global_envelope object, which can be printed and plotted directly.
The function frank.flm performs
a nonparametric test of significance of a covariate in the functional GLM.
As in the graphical functional GLM (graph.flm),
the Freedman-Lane algorithm (Freedman and Lane, 1983) is applied to permute the functions
(to obtain the simulations under the null hypothesis of "no effects");
consequently, the test approximately achieves the desired significance level.
In contrast to the graphical functional GLM, the F rank functional GLM is based on the
F-statistics that are calculated at each argument value of the functions.
The global envelope test is applied to the observed and simulated F-statistics.
The test can determine whether the factor of interest is significant and also which
argument values of the functional domain are responsible for the potential rejection.
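The permutation scheme described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used inside frank.flm: the simulated data, the helper pointwise_F and all variable names are hypothetical, the design matrices are assumed to be constant across argument values, and the final global envelope test step is left out.

```r
set.seed(1)
n <- 30; d <- 20                        # 30 curves observed at 20 argument values
group <- factor(rep(c("a", "b", "c"), each = 10))  # factor of interest
x <- rnorm(n)                           # nuisance covariate
Y <- matrix(rnorm(n * d), n, d) + 0.5 * x  # rows are curves, columns are argument values

X_red  <- model.matrix(~ x)             # reduced model: nuisance only
X_full <- model.matrix(~ x + group)     # full model: nuisance + factor of interest

# F-statistic of full vs. reduced model at every argument value (hypothetical helper)
pointwise_F <- function(Y, X_full, X_red) {
  q_full <- qr(X_full)
  q_red  <- qr(X_red)
  rss_full <- colSums(qr.resid(q_full, Y)^2)
  rss_red  <- colSums(qr.resid(q_red,  Y)^2)
  df1 <- q_full$rank - q_red$rank
  df2 <- nrow(Y) - q_full$rank
  ((rss_red - rss_full) / df1) / (rss_full / df2)
}

F_obs <- pointwise_F(Y, X_full, X_red)

# Freedman-Lane: permute the residuals of the reduced model (whole curves at a time),
# add them back to the reduced-model fit, and recompute the F-statistics.
q_red   <- qr(X_red)
fit_red <- qr.fitted(q_red, Y)
res_red <- qr.resid(q_red, Y)

nsim <- 199
F_sim <- replicate(nsim, {
  perm   <- sample(n)
  Y_star <- fit_red + res_red[perm, , drop = FALSE]
  pointwise_F(Y_star, X_full, X_red)
})
# F_sim is a d x nsim matrix: one simulated F-curve per column
```

In frank.flm the observed and simulated F-curves are then passed to the global envelope test; here they are only computed.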
The specification of the full and reduced formulas is important. The reduced model should be nested within the full model. In addition to the terms of the reduced model, the full model should include the factors of interest whose effects are under investigation. Please avoid using '*' when specifying interactions, e.g. factor1*factor2; instead, explicitly specify all components of the model.
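As a small check before calling frank.flm, base R's terms can show what a '*' shorthand expands to and whether the reduced formula is nested within the full one. The object names below are illustrative only; the formulas are those of the GDPtax example.

```r
# What the '*' shorthand stands for, written out term by term
attr(terms(Y ~ Group * Tax), "term.labels")
# "Group" "Tax" "Group:Tax"

formula.full    <- Y ~ Group + Tax + Group:Tax
formula.reduced <- Y ~ Group + Tax

terms_full <- attr(terms(formula.full), "term.labels")
terms_red  <- attr(terms(formula.reduced), "term.labels")

# Nested: every term of the reduced model also appears in the full model
is_nested <- all(terms_red %in% terms_full)

# Terms under investigation: those only present in the full model
tested_terms <- setdiff(terms_full, terms_red)
```

Here tested_terms contains only "Group:Tax", i.e. the interaction is the effect being tested while Group and Tax are treated as nuisance factors.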
The implementation that is used depends on the application.
When the argument fast is TRUE, one of the following versions is used:
- If all the covariates are constant across the functions, i.e. they can be provided in the
argument factors, then a linear model is fitted separately by least-squares estimation to
the data at each argument value of the functions, by fitting a multiple linear model with lm.
Any extra arguments passed in ... to lm must be of the form that
lm accepts for fitting a multiple linear model. In the basic case, no extra arguments are
needed.
- If some of the covariates vary across the space, i.e. they are provided in the list of curve sets in
the argument curve_sets together with the dependent functions, and no extra arguments are given
by the user in ..., then a rather fast implementation of the F-value calculation is used (which does not
use lm).
- If some of the covariates vary across the space and user-specified extra arguments are given in
..., then the implementation fits a linear model at each argument value of the functions using
lm, which can be rather slow. The arguments ... are passed to lm
for fitting each linear model.
By setting fast = FALSE, the last version (lm at each argument value) is used even when a faster
implementation would be available. Usually this is not desired.
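The difference between the cases above can be illustrated with plain lm calls. This is only a sketch of the distinction on made-up data, not the internal code of frank.flm; Xcov, coef_by_arg and the other names are hypothetical.

```r
set.seed(2)
n <- 12; d <- 5                          # 12 curves, 5 argument values
factors <- data.frame(Group = factor(rep(c("a", "b"), each = 6)),
                      Tax = runif(n))
Y <- matrix(rnorm(n * d), n, d)          # rows are curves, columns are argument values

# Covariates constant across the functions: the design matrix is the same at
# every argument value, so a single lm call with a matrix response fits all
# argument values at once.
fit_all <- lm(Y ~ Group + Tax, data = factors)
dim(coef(fit_all))                       # 3 coefficients x d argument values

# The first column of coefficients agrees with a separate fit at the first argument value
fit_1 <- lm(Y[, 1] ~ Group + Tax, data = factors)
all.equal(unname(coef(fit_all)[, 1]), unname(coef(fit_1)))

# A covariate that varies across the space: its values differ between argument
# values, so the design matrix changes and a fit is needed per argument value.
Xcov <- matrix(rnorm(n * d), n, d)
coef_by_arg <- sapply(seq_len(d), function(j) {
  coef(lm(Y[, j] ~ factors$Group + Xcov[, j]))
})
```

The per-argument loop is the slow path; it is what makes a dedicated F-value calculation worthwhile when no extra lm arguments are needed.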
Mrkvička, T., Myllymäki, M. and Narisetty, N. N. (2019) New methods for multiple testing in permutation inference for the general linear model. arXiv:1906.09004 [stat.ME]
Freedman, D., & Lane, D. (1983) A nonstochastic interpretation of reported significance levels. Journal of Business & Economic Statistics, 1(4), 292-298. doi:10.2307/1391660
# NOT RUN {
library("GET")
data("GDPtax")
# Factors that are constant over the argument values: one row per curve
factors.df <- data.frame(Group = GDPtax$Group, Tax = GDPtax$Profittax)
# Test the Group:Tax interaction; Group and Tax are the nuisance factors
res.tax_within_group <- frank.flm(nsim = 999,
                                  formula.full = Y ~ Group + Tax + Group:Tax,
                                  formula.reduced = Y ~ Group + Tax,
                                  curve_sets = list(Y = GDPtax$GDP),
                                  factors = factors.df)
plot(res.tax_within_group)
# }