This function performs a bootstrap goodness-of-fit hypothesis test for a
specific univariate parametric family. The null hypothesis corresponds to the
sample coming from the specified parametric family, while the alternative
hypothesis corresponds to the sample not coming from the specified
parametric family. This function implements a parametric bootstrap and
a non-parametric bootstrap. The test statistic is the Kolmogorov-Smirnov test
statistic. To estimate the parameters of the parametric family, either a minimum
distance (MD) estimator or the maximum likelihood estimator (MLE; the sample mean and
variance) is used. On the bootstrap sample, a centered MD estimator is also implemented,
as in Derumigny et al. (2025). For now, only a test of normality is implemented. This function
gives the corresponding p-values, the true test statistic and the
bootstrap-version test statistics. The default (and valid) method implemented
in this function is the parametric bootstrap, together with the equivalent test statistic
and the MLE parameter estimator. Via the bootstrapOptions
argument, the user can specify other bootstrap resampling schemes,
test statistics, and parameter estimators.
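To make the default procedure concrete, the following is a minimal base-R sketch of a parametric bootstrap Kolmogorov-Smirnov test of normality with the MLE. It is not the implementation of this function: the helper names `ks_stat_normal` and `param_boot_pvalue` are hypothetical, the supremum is evaluated at the sample points rather than on a grid, and the p-value is taken as the plain proportion of bootstrap statistics at least as large as the observed one.

```r
# Hypothetical helper: sqrt(n) * sup |F_hat - F_theta_hat| for the normal family,
# with theta_hat the MLE (sample mean, variance with denominator n).
ks_stat_normal <- function(x) {
  n <- length(x)
  mu <- mean(x)
  sigma <- sqrt(mean((x - mu)^2))
  Fx <- pnorm(sort(x), mean = mu, sd = sigma)
  # The empirical CDF jumps at each sorted point: check both sides of the jump.
  d <- max(pmax(abs((1:n) / n - Fx), abs((0:(n - 1)) / n - Fx)))
  sqrt(n) * d
}

# Hypothetical helper: parametric bootstrap p-value (resampling under the null).
param_boot_pvalue <- function(x, nBootstrap = 100) {
  n <- length(x)
  mu <- mean(x)
  sigma <- sqrt(mean((x - mu)^2))
  t_obs <- ks_stat_normal(x)
  # Draw each bootstrap sample from the fitted normal and re-estimate on it.
  t_boot <- replicate(nBootstrap, ks_stat_normal(rnorm(n, mu, sigma)))
  mean(t_boot >= t_obs)
}

set.seed(1)
p_h1 <- param_boot_pvalue(rgamma(100, shape = 2, rate = 3))
p_h0 <- param_boot_pvalue(rnorm(100))
```

Re-estimating the parameters on every bootstrap sample is what lets the bootstrap distribution account for the estimation step, which a plain Kolmogorov-Smirnov test with plugged-in parameters would ignore.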
perform_GoF_test(
  X_data,
  parametric_fam = "normal",
  nBootstrap = 100,
  mygrid = NULL,
  show_progress = TRUE,
  bootstrapOptions = NULL,
  verbose = 0
)
A class object with components
pvals_df a dataframe of p-values and bootstrapped test statistics:
These are the p-values for the combinations of bootstrap resampling schemes, test statistics (centered and equivalent), and different parameter estimators.
It also contains the vectors of bootstrap test statistics for each of these combinations.
true_stat a named vector of size 2 containing the true test
statistics. The first entry is the Kolmogorov-Smirnov test statistic for
the Minimum Distance estimator, and the second entry is the Kolmogorov-Smirnov
test statistic for the MLE parameter estimator.
nBootstrap number of bootstrap repetitions.
nameMethod string for the name of the method used.
X_data numerical input vector. The GoF test assesses whether or not this
sample comes from parametric_fam, a specified parametric distribution.
parametric_fam name of the parametric family. For the moment, only
"normal" is supported.
nBootstrap numeric value of the number of bootstrap resamples. Defaults to 100.
mygrid description of the grid on which the CDFs are computed. This must be one of
NULL: a regularly spaced grid from the minimum value to the
maximum value with 100 points is used. This is the default.
A numeric of size 1. This is used as the length of the grid, replacing
100 in the above explanation.
A numeric vector of size larger than 1. This is directly used as the grid.
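The three accepted forms of the grid can be sketched as follows. The helper `resolve_grid` is hypothetical and only mirrors the rules listed above; the function itself may handle edge cases differently.

```r
# Hypothetical helper mirroring the three documented forms of the grid argument.
resolve_grid <- function(X_data, mygrid = NULL) {
  if (is.null(mygrid)) {
    # Default: 100 regularly spaced points from min to max of the sample.
    seq(min(X_data), max(X_data), length.out = 100)
  } else if (length(mygrid) == 1) {
    # A single number is read as the number of grid points.
    seq(min(X_data), max(X_data), length.out = mygrid)
  } else {
    # A longer numeric vector is used as the grid itself.
    mygrid
  }
}

x <- c(-1.2, 0.3, 2.5)
g_default <- resolve_grid(x)
g_short <- resolve_grid(x, 5)
g_custom <- resolve_grid(x, c(-2, 0, 2))
```

A finer grid gives a closer approximation of the supremum in the test statistic, at the cost of more CDF evaluations per bootstrap replication.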
show_progress logical value indicating whether to show a progress bar.
bootstrapOptions This can be one of
NULL. This uses the default options type_boot = "param",
type_stat = "eq" and type_estimator_bootstrap = "MLE".
a list with at most 3 elements named:
type_boot type of bootstrap resampling scheme. It must be
one of
"param" for the parametric bootstrap (i.e. under the null).
This is the default.
"NP" for the non-parametric bootstrap
(i.e. n out of n bootstrap).
type_stat type of test statistic to be used. It must be
one of
"eq" for the equivalent test statistic
\(T_n^* = \sqrt{n} || \hat{F}^* - F_{\hat\theta^*} ||\)
"cent" for the centered test statistic
\(T_n^* = \sqrt{n} || \hat{F}^* - \hat{F} + F_{\hat\theta} - F_{\hat\theta^*} ||\)
For each type_boot there is only one valid choice of type_stat
to be made. If type_stat is not specified, the valid choice is
automatically used.
type_estimator_bootstrap the bootstrap parameter
estimator to be used. It must be one of
"MLE" for the maximum likelihood estimator
(for the normal distribution, this corresponds to the usual
empirical mean and variance).
This is always a valid choice in the case that the combination
(type_boot, type_stat) is valid (as defined above).
Therefore, this is the default option. It is also the fastest type
of estimator.
"MD-eq" for the Minimum Distance estimator.
This is a valid choice if and only if type_stat = "eq". It
is necessary in this case to use an equivalent bootstrap
estimator to match the equivalent bootstrap test statistic. This
bootstrap parameter estimator is given as:
\(\theta_n^{*,MD}=\arg\min_{\theta} || \hat{F}^* - F_{\theta} ||\)
"MD-cent" for the centered Minimum Distance estimator.
This is a valid choice if and only if type_stat = "cent". It
is necessary in this case to perform a centering on the bootstrap
estimator to match the centered bootstrap test statistic. This
bootstrap parameter estimator is given as:
\(\theta_n^{*,MD, cent}=\arg\min_{\theta}
|| \hat{F}^* - F_{\theta}- \hat{F} + F_{\hat\theta} ||\)
"all" this gives test results for all theoretically valid
combinations of bootstrap resampling schemes.
"all and also invalid" this gives test results for all possible
combinations of bootstrap resampling schemes and test statistics, including
invalid ones.
A warning is raised if the given combination of type_boot,
type_stat, and type_estimator_bootstrap is theoretically invalid.
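The formulas for the two test statistics and the two Minimum Distance estimators can be sketched in base R as below. All helper names are hypothetical and this is not the code used by the function: the norm is taken as the maximum over a grid, the normal family is parametrized by (mean, log standard deviation) so that `optim` can search without constraints, and the pairings shown (non-parametric with centered, parametric with equivalent) are an assumption based on the defaults described above.

```r
# Hypothetical helpers illustrating the formulas above on a fixed grid.
norm_sup <- function(v) max(abs(v))
F_hat <- function(x, grid) ecdf(x)(grid)  # empirical CDF on the grid
# theta = c(mean, log(sd)); the log keeps the standard deviation positive.
F_theta <- function(theta, grid) pnorm(grid, mean = theta[1], sd = exp(theta[2]))

mle_normal <- function(x) {
  mu <- mean(x)
  c(mu, log(sqrt(mean((x - mu)^2))))
}

# "MD-eq": argmin over theta of || F_hat* - F_theta ||
md_eq <- function(x_star, grid, start) {
  optim(start, function(th) norm_sup(F_hat(x_star, grid) - F_theta(th, grid)))$par
}

# "MD-cent": argmin over theta of || F_hat* - F_theta - F_hat + F_theta_hat ||
md_cent <- function(x_star, x, theta_hat, grid, start) {
  shift <- F_hat(x, grid) - F_theta(theta_hat, grid)
  optim(start, function(th) {
    norm_sup(F_hat(x_star, grid) - F_theta(th, grid) - shift)
  })$par
}

# "eq": sqrt(n) || F_hat* - F_theta* ||
stat_eq <- function(x_star, theta_star, grid) {
  sqrt(length(x_star)) * norm_sup(F_hat(x_star, grid) - F_theta(theta_star, grid))
}

# "cent": sqrt(n) || F_hat* - F_hat + F_theta_hat - F_theta* ||
stat_cent <- function(x_star, theta_star, x, theta_hat, grid) {
  sqrt(length(x_star)) * norm_sup(
    F_hat(x_star, grid) - F_hat(x, grid) +
      F_theta(theta_hat, grid) - F_theta(theta_star, grid)
  )
}

set.seed(1)
x <- rnorm(50)
grid <- seq(min(x), max(x), length.out = 100)
theta_hat <- mle_normal(x)

# One non-parametric (n out of n) resample, centered statistic and estimator.
x_np <- sample(x, replace = TRUE)
theta_np <- md_cent(x_np, x, theta_hat, grid, start = theta_hat)
t_cent <- stat_cent(x_np, theta_np, x, theta_hat, grid)

# One parametric resample, equivalent statistic and estimator.
x_par <- rnorm(length(x), theta_hat[1], exp(theta_hat[2]))
theta_par <- md_eq(x_par, grid, start = theta_hat)
t_eq <- stat_eq(x_par, theta_par, grid)
```

In this sketch the centered estimator minimizes exactly the quantity that appears inside the centered statistic, and likewise for the equivalent pair, which illustrates why the estimator has to match the statistic.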
verbose If verbose = 0, this function is silent and does not
print anything. Increasing values of verbose print more details about
the progress of the computations.
Derumigny, A., Galanis, M., Schipper, W., & van der Vaart, A. (2025). Bootstrapping not under the null? ArXiv preprint, doi:10.48550/arXiv.2512.10546
perform_regression_test, perform_independence_test.
The print and plot methods, such as plot.bootstrapTest.
n <- 100
# Under H1
X_data <- rgamma(n, shape = 2, rate = 3)
result <- perform_GoF_test(
  X_data,
  nBootstrap = 100,
  bootstrapOptions = list(
    type_boot = "param",
    type_stat = "eq",
    type_estimator_bootstrap = "MLE"
  )
)
print(result)
plot(result)
# Under H0
X_data <- rnorm(n)
result <- perform_GoF_test(X_data, nBootstrap = 100)
print(result)
plot(result)