global_envelope_test: Global envelope test

Description

Global envelope test, global envelopes and p-values

Usage

global_envelope_test(
  curve_sets,
  type = "erl",
  alpha = 0.05,
  alternative = c("two.sided", "less", "greater"),
  ties = "erl",
  probs = c(0.025, 0.975),
  quantile.type = 7,
  central = "mean",
  nstep = 2,
  ...
)

Arguments

curve_sets

A curve_set (see create_curve_set) or an envelope object containing a data function and simulated functions. If an envelope object is given, it must contain the summary functions from the simulated patterns which can be achieved by setting savefuns = TRUE when calling envelope. Alternatively, a list of curve_set or envelope objects can be given.

type

The type of the global envelope with current options for 'rank', 'erl', 'cont', 'area', 'qdir', 'st' and 'unscaled'. See details.

alpha

The significance level. The 100(1-alpha)% global envelope will be calculated.

alternative

A character string specifying the alternative hypothesis. Must be one of the following: "two.sided" (default), "less" or "greater". The last two options only available for types 'rank', 'erl', 'cont' and 'area'.

ties

The method to obtain a unique p-value when type = 'rank'. Possible values are 'midrank', 'random', 'conservative', 'liberal' and 'erl'. For 'conservative' the resulting p-value will be the highest possible. For 'liberal' the p-value will be the lowest possible. For 'random' the rank of the obs within the tied values is uniformly sampled so that the resulting p-value is at most the conservative option and at least the liberal option. For 'midrank' the mid-rank within the tied values is taken. For 'erl' the extreme rank length p-value is calculated. The default is 'erl'.

probs

A two-element vector containing the lower and upper quantiles for the measure 'q' or 'qdir', in that order and on the interval [0, 1]. The default values are 0.025 and 0.975, suggested by Myllym<U+00E4>ki et al. (2015, 2017).

quantile.type

As type argument of quantile, how to calculate quantiles for 'q' or 'qdir'.

central

Either "mean" or "median". If the curve sets do not contain the component theo for the theoretical central function, then the central function (used for plotting only) is calculated either as the mean or median of functions provided in the curve sets.

nstep

1 or 2 for how to contruct a combined global envelope if list of curve sets is provided. 2 (default) for a two-step combining procedure, 1 for one-step.

...

Additional parameters to be passed to central_region.

Value

Either an object of class "global_envelope" or "combined_global_envelope" (the latter for combined tests). The objects can be printed and plotted directly.

The "global_envelope" is essentially a data frame containing columns

the values of the argument r at which the test was made, copied from the argument curve_sets with the corresponding names
obs = values of the data function, copied from the argument curve_sets; missing if more than one observed function
lo = the lower envelope
hi = the upper envelope
central = a central curve as specified in the argument central.

Moreover, the return value has the same attributes as the object returned by central_region and in addition

p = A point estimate for the p-value (default is the mid-rank p-value).

and in the case that type = 'rank' also

p_interval = The p-value interval [p_liberal, p_conservative].
ties = As the argument ties.

The "combined_global_envelope" is a list of "global_envelope" objects corresponding to the components of curve_sets. The second level envelope on which the envelope construction is based on is saved in the attribute "level2_ge".

Ranking of the curves

The options for measures to order the functions from the most extreme one to the least extreme one are given by the argument type: 'rank', 'erl', 'cont', 'area', 'qdir', 'st', 'unscaled'. The options are

'rank': extreme ranks (Myllym<U+00E4>ki et al., 2017)
'erl': extreme rank lengths (Myllym<U+00E4>ki et al., 2017; Mrkvi<U+010D>ka et al., 2018)
'cont': continuous ranks (Hahn, 2015; Mrkvi<U+010D>ka et al., 2019)
'area': area ranks (Mrkvi<U+010D>ka et al., 2019)
'qdir': the directional quantile maximum absolute deviation (MAD) measure (Myllym<U+00E4>ki et al., 2015, 2017)
'st': the studentized MAD measure (Myllym<U+00E4>ki et al., 2015, 2017)
'unscaled': the unscaled MAD measure (Ripley, 1981)

See more detailed description of the envelopes and measures in Myllym<U+00E4>ki and Mrkvi<U+010D>ka (2020, Section 2.).

Global envelope

Based on the measures used to rank the functions, the 100(1-alpha)% global envelope is provided. It corresponds to the 100*coverage% central region.

P-values

In the case type="rank", based on the extreme ranks k_i, i=1, ..., s+1, the p-interval is calculated. Because the extreme ranks contain ties, there is not just one p-value. The p-interval is given by the most liberal and the most conservative p-value estimate. Also a single p-value is calculated. By default this single p-value is the extreme rank length p-value ("erl"), but another option can be used by specifying ties argument.

If the case type = "erl", the (single) p-value based on the extreme rank length ordering of the functions is calculated and returned in the attribute p. The same is done for other measures, the p-value always being correspondent to the chosen measure.

Number of simulations

For the global "rank" envelope test, Myllym<U+00E4>ki et al. (2017) recommended to use at least 2500 simulations for testing at the significance level alpha = 0.05 for single function tests, based on experiments with summary functions for point processes evaluated approximately at 500 argument values. In this case, the width of the p-interval associated with the extreme rank measure tended to be smaller than 0.01. The tests 'erl', 'cont' and 'area', similarly as the MAD deviation/envelope tests 'qdir', 'st' and 'unscaled', allow in principle a lower number of simulations to be used than the test based on extreme ranks ('rank'), because no ties occur for these measures. If affordable, we recommend in any case some thousands of simulations for all the measures to achieve a good power and repeatability of the test.

Tests based on several functions

If a list of (suitable) objects are provided in the argument curve_sets, then by default (nstep = 2) the two-step combining procedure is used to perform the combined global test as described in Myllym<U+00E4>ki and Mrkvi<U+010D>ka (2019). If nstep = 1 and the lengths of the multivariate vectors in each component of the list are equal, then the one-step combining procedure is used where the functions are concatenated together into a one long vector.

Details

Given a curve_set (see create_curve_set for how to create such an object) or an envelope object, which contains both the data curve (or function or vector) \(T_1(r)\) (in the component obs) and the simulated curves \(T_2(r),\dots,T_{s+1}(r)\) (in the component sim_m), the function global_envelope_test performs a global envelope test. The functionality of the function is rather similar to the function central_region, but in addition to ordering the functions from the most extreme one to the least extreme one using different measures and providing the global envelopes with intrinsic graphical interpretation, p-values are calculated for the test. Thus, while central_region can be used to construct global envelopes in a general setting, the function global_envelope_test is devoted to testing as its name suggests.

The function global_envelope_test is the main function for global envelope tests (for simple hypotheses). Different type of global envelope tests can be performed. We use such ordering of the functions for which we are able to construct global envelopes with intrinsic graphical interpretation.

'rank': the completely non-parametric rank envelope test (Myllym<U+00E4>ki et al., 2017) based on minimum of pointwise ranks
'erl': the completely non-parametric rank envelope test based on extreme rank lengths (Myllym<U+00E4>ki et al., 2017; Mrkvi<U+010D>ka et al., 2018) based on number of minimal pointwise ranks
'cont': the completely non-parametric rank envelope test based on continuous rank (Hahn, 2015; Mrkvi<U+010D>ka et al., 2019) based on minimum of continuous pointwise ranks
'area': the completely non-parametric rank envelope test based on area rank (Mrkvi<U+010D>ka et al., 2019) based on area between continuous pointwise ranks and minimum pointwise ranks for those argument (r) values for which pointwise ranks achieve the minimum (it is a combination of erl and cont)
"qdir", the directional quantile envelope test, protected against unequal variance and asymmetry of T(r) for different distances r (Myllym<U+00E4>ki et al., 2015, 2017)
"st", the studentised envelope test, protected against unequal variance of T(r) for different distances r (Myllym<U+00E4>ki et al., 2015, 2017)
"unscaled", the unscaled envelope (providing a baseline) that has a contant width and that corresponds to the classical maximum deviation test (Ripley, 1981).

See Myllym<U+00E4>ki and Mrkvi<U+010D>ka (2020, Section 2.) for more detailed description of the measures and the corresponding envelopes.

The first four types are global rank envelopes. The 'rank' envelope test is a completely non-parametric test, which provides the 100(1-alpha) T(r) on the chosen interval of distances and associated p-values. The other three are modifications of 'rank' to treat the ties in the extreme rank ordering on which the 'rank' test is based on.

The last three envelopes are global scaled maximum absolute difference (MAD) envelope tests. The unscaled envelope test leads to envelopes with constant width over the distances r. Thus, it suffers from unequal variance of T(r) over the distances r and from the asymmetry of distribution of T(r). We recommend to use the other global envelope tests available. The unscaled envelope is provided as a reference.

References

Mrkvi<U+010D>ka, T., Myllym<U+00E4>ki, M. and Hahn, U. (2017). Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27 (5): 1239-1255. doi: 10.1007/s11222-016-9683-9

Mrkvi<U+010D>ka, T., Myllym<U+00E4>ki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56 (3), 432-458. doi: 10.14736/kyb-2020-3-0432

Mrkvi<U+010D>ka, T., Myllym<U+00E4>ki, M. and Narisetty, N. N. (2019) New methods for multiple testing in permutation inference for the general linear model. arXiv:1906.09004 [stat.ME]

Myllym<U+00E4>ki, M., Grabarnik, P., Seijo, H. and Stoyan. D. (2015). Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11: 19-34. doi: 10.1016/j.spasta.2014.11.004

Myllym<U+00E4>ki, M., Mrkvi<U+010D>ka, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79: 381<U+2013>404. doi: 10.1111/rssb.12172

Myllym<U+00E4>ki, M. and Mrkvi<U+010D>ka, T. (2020). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]

Ripley, B.D. (1981). Spatial statistics. Wiley, New Jersey.

Examples

Run this code

# NOT RUN {
# Goodness-of-fit testing for simple hypothesis
if(require("spatstat", quietly=TRUE)) {
  # Testing complete spatial randomness (CSR)
  #==========================================
  X <- unmark(spruces)

  
# }
# NOT RUN {
nsim <- 1999 # Number of simulations
# }
# NOT RUN {
  
# }
# NOT RUN {
  # Illustration of general workflow for simple hypotheses
  #=======================================================
  # First illustrate the general workflow for the test by this example
  # of CSR test for a point pattern X using the empirical L-function.
  # Define the argument values at which the functions are evaluated
  obs.L <- Lest(X, correction = "translate")
  r <- obs.L[['r']]
  # The test function for the data
  obs <- obs.L[['trans']] - r
  # Prepare simulations and calculate test functions for them at same r as 'obs'
  sim <- matrix(nrow = length(r), ncol = nsim)
  for(i in 1:nsim) {
    sim.X <- runifpoint(ex = X) # simulation under CSR
    sim[, i] <- Lest(sim.X, correction = "translate", r = r)[['trans']] - r
  }
  # Create a curve_set containing argument values, observed and simulated functions
  cset <- create_curve_set(list(r = r, obs = obs, sim_m = sim))
  # Perform the test
  res <- global_envelope_test(cset, type="erl")
  plot(res, ylab = expression(italic(hat(L)(r)-r)))

  # Simple hypothesis for a point pattern utilizing the spatstat package
  #=====================================================================
  # Generate nsim simulations under CSR, calculate L-function for the data and simulations
  env <- envelope(X, fun="Lest", nsim=nsim,
                  savefuns=TRUE, # save the functions
                  correction="translate", # edge correction for L
                  transform = expression(.-r), # centering
                  simulate=expression(runifpoint(ex=X))) # Simulate CSR
  # The rank envelope test (ERL)
  res <- global_envelope_test(env, type="erl")
  # Plot the result
  plot(res)

  ## Advanced use:
  # Choose the interval of distances [r_min, r_max] (at the same time create a curve_set from 'env')
  cset <- crop_curves(env, r_min=1, r_max=7)
  # Do the rank envelope test (erl)
  res <- global_envelope_test(cset, type="erl")
  plot(res, ylab=expression(italic(L(r)-r)))

  
# }
# NOT RUN {
  # Random labeling test
  #=====================
  mpp <- spruces
  # 1) Perform simulations under the random labelling hypothesis and calculate
  # the test function T(r) for the data pattern (mpp) and each simulation.
  # The command below specifies that the test function is T(r) = \hat{L}_mm(r),
  # which is an estimator of the mark-weighted L function, L_mm(r),
  # with translational edge correction.
  nsim <- 1999 # Number of simulations
  env <- envelope(mpp, fun=Kmark, nsim = nsim, f=function(m1, m2) { m1*m2 },
                  correction="translate", returnL=TRUE,
                  simulate=expression(rlabel(mpp, permute=TRUE)), # Permute the marks
                  savefuns=TRUE) # Save the functions
  # 2)
  # Crop curves to desired r-interval
  curve_set <- crop_curves(env, r_min=1.5, r_max=9.5)
  # Center the functions, i.e. take \hat{L}_mm(r)-T_0(r).
  # Below T_0(r) = \hat{L}(r) is the mean of simulated functions.
  # (This is only for visualization, does not affect the test result.)
  curve_set <- residual(curve_set)
  # 3) Do the rank envelope test
  res <- global_envelope_test(curve_set)
  # 4) Plot the test result
  plot(res, ylab=expression(italic(L[mm](r)-L(r))))

  # A combined global envelope test
  #================================
  # As an example test CSR of the saplings point pattern by means of
  # L, F, G and J functions.
  data(saplings)
  X <- saplings

  nsim <- 499 # Number of simulations
  nsim <- 19 # Number of simulations
  # Specify distances for different test functions
  n <- 500 # the number of r-values
  rmin <- 0; rmax <- 20; rstep <- (rmax-rmin)/n
  rminJ <- 0; rmaxJ <- 8; rstepJ <- (rmaxJ-rminJ)/n
  r <- seq(0, rmax, by=rstep)    # r-distances for Lest
  rJ <- seq(0, rmaxJ, by=rstepJ) # r-distances for Fest, Gest, Jest

  # Perform simulations of CSR and calculate the L-functions
  env_L <- envelope(X, nsim=nsim,
   simulate=expression(runifpoint(ex=X)),
   fun="Lest", correction="translate",
   transform=expression(.-r), # Take the L(r)-r function instead of L(r)
   r=r,                       # Specify the distance vector
   savefuns=TRUE,             # Save the estimated functions
   savepatterns=TRUE)         # Save the simulated patterns
  # Take the simulations from the returned object
  simulations <- attr(env_L, "simpatterns")
  # Then calculate the other test functions F, G, J for each simulated pattern
  env_F <- envelope(X, nsim=nsim,
                    simulate=simulations,
                    fun="Fest", correction="Kaplan", r=rJ,
                    savefuns=TRUE)
  env_G <- envelope(X, nsim=nsim,
                    simulate=simulations,
                    fun="Gest", correction="km", r=rJ,
                    savefuns=TRUE)
  env_J <- envelope(X, nsim=nsim,
                    simulate=simulations,
                    fun="Jest", correction="none", r=rJ,
                    savefuns=TRUE)

  # Crop the curves to the desired r-interval I
  curve_set_L <- crop_curves(env_L, r_min=rmin, r_max=rmax)
  curve_set_F <- crop_curves(env_F, r_min=rminJ, r_max=rmaxJ)
  curve_set_G <- crop_curves(env_G, r_min=rminJ, r_max=rmaxJ)
  curve_set_J <- crop_curves(env_J, r_min=rminJ, r_max=rmaxJ)

  res <- global_envelope_test(curve_sets=list(curve_set_L, curve_set_F,
                                              curve_set_G, curve_set_J))
  plot(res, labels=c("L(r)-r", "F(r)", "G(r)", "J(r)"))
  
# }
# NOT RUN {
}

# A test based on a low dimensional random vector
#================================================
# Let us generate some example data.
X <- matrix(c(-1.6,1.6),1,2) # data pattern X=(X_1,X_2)
if(requireNamespace("mvtnorm", quietly=TRUE)) {
  Y <- mvtnorm::rmvnorm(200,c(0,0),matrix(c(1,0.5,0.5,1),2,2)) # simulations
  plot(Y, xlim=c(min(X[,1],Y[,1]), max(X[,1],Y[,1])), ylim=c(min(X[,2],Y[,2]), max(X[,2],Y[,2])))
  points(X, col=2)

  # Test the null hypothesis is that X is from the distribution of Y's (or if it is an outlier).

  # Case 1. The test vector is (X_1, X_2)
  cset1 <- create_curve_set(list(r=1:2, obs=as.vector(X), sim_m=t(Y)))
  res1 <- global_envelope_test(cset1)
  plot(res1)

  # Case 2. The test vector is (X_1, X_2, (X_1-mean(Y_1))*(X_2-mean(Y_2))).
  t3 <- function(x, y) { (x[,1]-mean(y[,1]))*(x[,2]-mean(y[,2])) }
  cset2 <- create_curve_set(list(r=1:3, obs=c(X[,1],X[,2],t3(X,Y)), sim_m=rbind(t(Y), t3(Y,Y))))
  res2 <- global_envelope_test(cset2)
  plot(res2)
}
# }

Run the code above in your browser using DataLab