GET-package: GET: Global Envelopes in R

Description

The GET package provides global envelopes which can be used for central regions of functional or multivariate data (e.g. outlier detection, functional boxplot), for graphical Monte Carlo and permutation tests where the test statistic is a multivariate vector or function (e.g. goodness-of-fit testing for point patterns and random sets, functional ANOVA, functional GLM, n-sample test of correspondence of distribution functions), and for global confidence and prediction bands (e.g. confidence band in polynomial regression, Bayesian posterior prediction).

Key functions in GET

Central regions or global envelopes or confidence bands: central_region. E.g. a 50% central region of the growth curves of girls (the growth data of the fda package, which must be loaded first).
- First create a curve_set of the growth curves, e.g.
  
  cset <- create_curve_set(list(r = as.numeric(row.names(growth$hgtf)), obs = growth$hgtf))
- Then calculate the 50% central region (see central_region for further arguments)
  
  cr <- central_region(cset, coverage = 0.5)
- Plot the result (see plot.global_envelope for plotting options)
  
  plot(cr)
It is also possible to compute combined central regions for several sets of curves given to the function as a list; see the examples in central_region.
Global envelope tests: global_envelope_test is the main function. E.g. a test of complete spatial randomness (CSR) for a point pattern X:
X <- spruces # an example pattern from spatstat
- Use envelope to create nsim simulations under CSR and to calculate the functions you want (below K-functions by Kest). Important: use the option 'savefuns=TRUE' and specify the number of simulations nsim.
  
  env <- envelope(X, nsim=999, savefuns=TRUE, fun=Kest, simulate=expression(runifpoint(ex=X)))
- Perform the test (see global_envelope_test for further arguments)
  
  res <- global_envelope_test(env)
- Plot the result (see plot.global_envelope for plotting options)
  
  plot(res)
It is also possible to perform combined global envelope tests for several sets of curves given to the function as a list; see the examples in global_envelope_test.

Functional ordering: central_region and global_envelope_test are based on different measures for ordering the functions (or vectors) from the most extreme to the least extreme ones. The core functionality of calculating the measures is in the function forder, which can be used to obtain different measures for sets of curves. Usually there is no need to call forder directly.
Functional boxplots: fBoxplot
Adjusted global envelope tests for composite null hypotheses
- GET.composite, see a detailed example in saplings
The adjusted tests can also be based on several test functions.
One-way functional ANOVA:
- Graphical functional ANOVA tests: graph.fanova
- Global rank envelope based on F-values: frank.fanova
- Image (2d function) counterparts: graph.fanova2d, frank.fanova2d
Functional general linear model (GLM):
- Graphical functional GLM: graph.fglm
- Global rank envelope based on F-values: frank.fglm
- Image (2d function) counterparts: graph.fglm2d, frank.fglm2d
Wrapper functions to perform global envelopes for specific purposes:
- Central regions for images (2d functions): central_region2d
- Global envelope tests for images (2d functions): global_envelope_test2d
- Graphical n-sample test of correspondence of distribution functions: GET.necdf
- Variogram and residual variogram with global envelopes: GET.variogram
Deviation tests (for a simple hypothesis): deviation_test (no graphical interpretation)
Most functions accept the curves provided in a curve_set (or image_set) object. Use create_curve_set (or create_image_set) to create a curve_set (or image_set) object from the functions T_i(r), i=1,...,nsim+1. Other formats for providing the curves to the above functions are also accepted; see the help pages of the functions for details.
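
As a minimal sketch of the curve_set format, the following creates a curve_set from a matrix of hypothetical simulated curves (one curve per column, one row per argument value r), computes a central region, and calls forder directly. The data, the seed and the object names are assumptions made for illustration only, and create_curve_set is used as named on this help page.

```r
library(GET)

set.seed(1)
r <- seq(0, 1, length.out = 50)  # argument values of the curves

# Hypothetical data: 30 noisy sine curves with a random vertical shift.
# Rows correspond to the values of r, columns to the individual curves.
obs <- sapply(1:30, function(i) {
  sin(2 * pi * r) + rnorm(1, sd = 0.3) + rnorm(length(r), sd = 0.05)
})

# For central regions, all curves are given in the element 'obs'
cset <- create_curve_set(list(r = r, obs = obs))

# 50% central region of the curves
cr <- central_region(cset, coverage = 0.5)
plot(cr)

# Ordering measure (here extreme rank length) for each curve;
# see the help page of forder for how to interpret the values
m <- forder(cset, measure = "erl")
```

Here m should contain one value per curve, so it can be used to rank the curves without constructing an envelope.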

See the help files of the functions for examples.

Workflow for (single hypothesis) tests based on single functions

To perform a test, you first need to obtain the test function T(r) for your data (T_1(r)) and for each simulation (T_2(r), ..., T_{nsim+1}(r)) in one way or another. Given the set of functions T_i(r), i=1,...,nsim+1, you can perform the test with global_envelope_test.

1) The workflow when using your own programs for simulations:

(Fit the model and) Create nsim simulations from the (fitted) null model.
Calculate the functions T_1(r), T_2(r), ..., T_{nsim+1}(r).
Use create_curve_set to create a curve_set object from the functions T_i(r), i=1,...,nsim+1.
Perform the test and plot the result:

res <- global_envelope_test(curve_set) # curve_set is the 'curve_set'-object you created

plot(res)
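
The steps of this workflow can be sketched with a toy example that does not involve spatial data. The null model (a standard normal sample of size 100), the test function (the empirical distribution function evaluated on a grid r) and all object names are assumptions chosen for illustration; they are not part of the package.

```r
library(GET)

set.seed(1)
nsim <- 999
n <- 100
r <- seq(-2, 2, length.out = 50)  # argument values of the test function

# Hypothetical data; in practice this is your own sample
x <- rnorm(n)

# T_1(r): the test function calculated from the data
obs <- ecdf(x)(r)

# T_2(r), ..., T_{nsim+1}(r): the test function for nsim simulations
# from the null model; one simulation per column
sim_m <- replicate(nsim, ecdf(rnorm(n))(r))

# Collect the functions into a curve_set object
cset <- create_curve_set(list(r = r, obs = obs, sim_m = sim_m))

# Perform the test (extreme rank length envelope) and plot the result
res <- global_envelope_test(cset, type = "erl")
plot(res)
```

The observed function goes to the element obs as a vector and the simulated functions to sim_m as a matrix with as many rows as there are values in r.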

2) The workflow utilizing spatstat:

E.g. say we have a point pattern, given as a ppp object, for which we would like to test a hypothesis.

X <- spruces # an example pattern from spatstat

Test complete spatial randomness (CSR):
- Use envelope to create nsim simulations under CSR and to calculate the functions you want. Important: use the option 'savefuns=TRUE' and specify the number of simulations nsim. See the help documentation in spatstat for possible test functions (if fun is not given, Kest is used, i.e. an estimator of the K function).
  
  Make 999 simulations under CSR and estimate the K-function for each of them and for the data (the argument simulate tells envelope how to perform the simulations under CSR):
  
  env <- envelope(X, nsim=999, savefuns=TRUE, simulate=expression(runifpoint(ex=X)))
- Perform the test
  
  res <- global_envelope_test(env)
- Plot the result
  
  plot(res)
A goodness-of-fit test of a parametric model (composite hypothesis case):
- Fit the model to your data by means of the function ppm or kppm. See the help documentation for possible models.
- Use GET.composite to create nsim simulations from the fitted model, to calculate the functions you want, and to make an adjusted global envelope test. See also the example in saplings.
- Plot the result (res below denotes the object returned by GET.composite)
  
  plot(res)

Functions for modifying sets of functions

It is possible to modify the curve set T_1(r), T_2(r), ..., T_{nsim+1}(r) before performing the test.

You can choose the interval of distances [r_min, r_max] by crop_curves.
For better visualisation, you can take T(r)-T_0(r) by residual. Here T_0(r) is the expectation of T(r) under the null hypothesis.

The function envelope_to_curve_set can be used to create a curve_set object from the object returned by envelope. An envelope object can also directly be given to the functions mentioned above in this section.
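
A minimal sketch of these modifications on a hypothetical curve set follows. The null model (standard normal samples), the grid r, the cropping interval and the object names are illustrative assumptions. The element theo holds T_0(r), here the standard normal distribution function, which is the expectation of the empirical distribution function under this null model.

```r
library(GET)

set.seed(1)
nsim <- 199
r <- seq(-3, 3, length.out = 61)

# Hypothetical test functions: empirical distribution functions of
# the data and of nsim simulated samples from the null model
obs <- ecdf(rnorm(100))(r)
sim_m <- replicate(nsim, ecdf(rnorm(100))(r))

# 'theo' is the expectation T_0(r) of T(r) under the null hypothesis
cset <- create_curve_set(list(r = r, obs = obs, sim_m = sim_m,
                              theo = pnorm(r)))

# Restrict the functions to the interval [r_min, r_max]
cset_cropped <- crop_curves(cset, r_min = -2, r_max = 2)

# Take T(r) - T_0(r) for better visualisation
cset_res <- residual(cset_cropped)

res <- global_envelope_test(cset_res, type = "erl")
plot(res)
```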

Example data (see references on the help pages of each data set)

adult_trees: a point pattern of adult trees
cgec: centred government expenditure centralization (GEC) ratios (see graph.fanova)
fallen_trees: a point pattern of fallen trees
GDPtax: GDP per capita with country groups and other covariates
imageset1: a simulated set of images (see graph.fanova2d, frank.fanova2d)
imageset2: a simulated set of images (see graph.fglm2d, frank.fglm2d)
imageset3: a simulated set of images
rimov: water temperature curves over the 365 days of 36 years
saplings: a point pattern of saplings (see GET.composite)

The data sets are used to show examples of the functions of the library.

Number of simulations

Note that the recommended minimum number of simulations for the rank envelope test based on a single function is nsim=2499, while for the "erl", "cont", "area", "qdir" and "st" global envelope tests and deviation tests, a lower number of simulations can be used, although the Monte Carlo error is obviously larger with a small number of simulations. As the number of simulations increases, all the global rank envelopes approach the same curves.
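
One way to see why the rank envelope test needs more simulations is that the extreme rank takes only a few distinct values, so many curves are tied. The following base R sketch illustrates this with hypothetical white-noise curves; it is a simplified illustration of the two-sided extreme rank and not the implementation used in the package.

```r
set.seed(1)
nsim <- 99   # number of simulated curves; nsim + 1 curves in total
n_r <- 50    # number of argument values r

# Hypothetical curves: independent standard normal values,
# rows correspond to r values and columns to curves
curves <- matrix(rnorm((nsim + 1) * n_r), nrow = n_r)

# Pointwise ranks among the nsim + 1 curves at each r,
# counted from the smallest value (lo) and from the largest (hi)
lo <- t(apply(curves, 1, rank))
hi <- (nsim + 2) - lo

# Two-sided pointwise rank, and its minimum over r for each curve
pointwise <- pmin(lo, hi)
extreme_rank <- apply(pointwise, 2, min)

# Many curves share the same extreme rank
table(extreme_rank)
sum(extreme_rank == 1)
```

At every r, both the lowest and the highest curve get rank 1, so several curves are always tied at the most extreme rank. The measures such as "erl", "cont" and "area" refine the ordering to break such ties.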

Mrkvička et al. (2017) discussed the number of simulations for tests based on many functions.

Acknowledgements

Pavel Grabarnik, Ute Hahn, Mikko Kuronen, Michael Rost and Henri Seijo have made contributions and suggestions of code.

Details

The GET package provides central regions (i.e. global envelopes) and global envelope tests with intrinsic graphical interpretation. The central regions can be constructed from (functional) data. The tests are Monte Carlo or permutation tests, which demand simulations from the tested null model. The methods are applicable for any multivariate vector data and functional data (after discretization).

In the special case of spatial processes (spatial point processes, random sets), the functions are typically estimators of summary functions. The package supports the use of the R package spatstat for generating simulations and calculating estimators of the chosen summary function, but these steps can alternatively be done in any other way, thus allowing for any models and functions.

References

Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan, D. (2015) Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11: 19-34. doi: 10.1016/j.spasta.2014.11.004

Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017) Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79: 381–404. doi: 10.1111/rssb.12172

Mrkvička, T., Myllymäki, M. and Hahn, U. (2017) Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27 (5): 1239-1255. doi: 10.1007/s11222-016-9683-9

Mrkvička, T., Soubeyrand, S., Myllymäki, M., Grabarnik, P., and Hahn, U. (2016) Monte Carlo testing in spatial statistics, with applications to spatial residuals. Spatial Statistics 18, Part A: 40-53. doi: 10.1016/j.spasta.2016.04.005

Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2018) A one-way ANOVA test for functional data with graphical interpretation. arXiv:1612.03608 [stat.ME] (http://arxiv.org/abs/1612.03608)

Mrkvička, T., Myllymäki, M. and Narisetty, N. N. (2019) New methods for multiple testing in permutation inference for the general linear model. arXiv:1906.09004 [stat.ME]

Mrkvička, T., Roskovec, T. and Rost, M. (2019) A nonparametric graphical tests of significance in functional GLM. arXiv:1902.04926 [stat.ME]