vs.test: Vasicek-Song goodness-of-fit test for various distributions

Description

Performs the Vasicek-Song goodness-of-fit test for the specified distribution family.

Usage

vs.test(x, densfun, param = NULL, 
        simulate.p.value = NULL, B = 5000,
        delta = NULL, extend = FALSE, relax = FALSE)

Arguments

x

(numeric, vector) the sample to be tested.

densfun

A character string specifying the fitted distribution. Possible values are "dunif", "dnorm", "dlnorm", "dexp", "dgamma", "dweibull", "dpareto", "df", "dlaplace" and "dbeta".

param

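(numeric, vector) the parameter(s) of the null distribution. If NULL (the default), the test is performed against the parametric family specified by densfun (composite null hypothesis), with the parameters estimated by maximum likelihood; otherwise, it is performed against the fully specified distribution (simple null hypothesis).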

simulate.p.value

(logical, single value) if TRUE, the p-value is estimated by Monte Carlo simulation; if FALSE, it is computed from the asymptotic distribution of the test statistic. If NULL (the default), the p-value is simulated when the sample size is smaller than 80 and computed asymptotically otherwise.

B

(numeric, single value) the number of Monte Carlo replicates used to estimate the p-value.

delta

(numeric, single value) a value smaller than $1/3$ setting the upper bound $n^{1/3-\delta}$ on the window size, where $n$ is the sample size. The default depends on densfun; see the package vignette for details.

extend

(logical, single value) if FALSE (the default), the window size is bounded by $n^{1/3-\delta}$; if TRUE, it is bounded by $n/2$.

relax

(logical, single value) if TRUE, the constraint $V_{mn} \leq -\frac{1}{n} \sum_{i=1}^n \log p_0(X_i, \widehat{\theta}_n)$ is dropped when computing the optimal window; see Details. Default is FALSE.
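The two window-size bounds described for delta and extend can be compared numerically. This is a minimal sketch: window_bound is a hypothetical helper, not a vsgoftest function, and delta = 0.1 is an illustrative value, not necessarily the package default.

```r
# Hypothetical helper (not part of vsgoftest): upper bound on the window
# size m, as described for the delta and extend arguments.
window_bound <- function(n, delta, extend = FALSE) {
  stopifnot(n >= 2, delta < 1 / 3)
  if (extend) n / 2 else n^(1 / 3 - delta)
}

# Illustrative value delta = 0.1 (assumption, not the package default).
n <- 50
window_bound(n, delta = 0.1)                 # about 2.49, so m <= 2
window_bound(n, delta = 0.1, extend = TRUE)  # 25
```

For a sample of size 50, the default-style bound $n^{1/3-\delta}$ allows only very small integer windows, whereas extend = TRUE widens the search range to $n/2$.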

Value

A list with class "htest" containing the following components:

observed

The sample under study.

data.name

The name of the R object containing the sample.

null.value

A character string specifying the name of the fitted distribution.

method

The character string "Vasicek GOF test to" followed by the name of the fitted distribution.

statistic

The Vasicek test statistic $I_{mn}$; see Details below.

parameter

The optimal window size $m$ used to compute the Vasicek test statistic.

estimate

Parameter(s) of the fitted distribution: the maximum likelihood estimates if param is NULL, or the user-supplied param otherwise.

p.value

The p-value of the test.

Details

The test statistic is $$I_{mn}=-V_{mn}-\frac{1}{n}\sum_{i=1}^{n}\log p_{0}(X_{i},\theta),$$ where $V_{mn}$ is the Vasicek estimator of the Shannon entropy, computed from the sample x with window size $m$, and $p_{0}(x,\theta)$ is the density of the distribution densfun being tested. Here $\theta$ is the parameter specified under a simple null hypothesis (param supplied), or its maximum likelihood estimate $\widehat{\theta}_n$ under a composite null hypothesis (param = NULL). $I_{mn}$ estimates the Kullback-Leibler divergence between the sampled distribution and the null distribution, so large values indicate a departure from the null. See Song (2002), Girardin and Lequesne (2017) and Lequesne and Regnault (2020).
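The statistic can be sketched directly from this formula for densfun = "dnorm". This is a minimal sketch under stated assumptions: vasicek_entropy and kl_statistic_norm are hypothetical helpers, not vsgoftest functions; the window m is fixed by hand (vs.test selects it automatically); and $V_{mn}$ uses the usual Vasicek (1976) spacing form, with order statistics below index 1 or above $n$ replaced by $X_{(1)}$ or $X_{(n)}$.

```r
# Vasicek (1976) spacing estimator of Shannon entropy with window m:
# V_mn = mean(log(n / (2m) * (X_(i+m) - X_(i-m)))), where order statistics
# with index below 1 or above n are replaced by X_(1) or X_(n).
vasicek_entropy <- function(x, m) {
  n <- length(x)
  stopifnot(m >= 1, m < n / 2)
  xs <- sort(x)
  i <- seq_len(n)
  spacings <- xs[pmin(i + m, n)] - xs[pmax(i - m, 1)]
  mean(log(n / (2 * m) * spacings))
}

# I_mn = -V_mn - (1/n) * sum(log p0(X_i, theta)) for a normal null density.
kl_statistic_norm <- function(x, m, mu, sigma) {
  -vasicek_entropy(x, m) - mean(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

set.seed(1)
samp <- rnorm(50, 2, 3)

# Simple null hypothesis: theta = (2, 3) is given; window m = 2 chosen by hand.
I_simple <- kl_statistic_norm(samp, m = 2, mu = 2, sigma = 3)

# Composite null hypothesis: theta replaced by its maximum likelihood
# estimate (sample mean and standard deviation with divisor n).
mu_hat <- mean(samp)
sigma_hat <- sqrt(mean((samp - mu_hat)^2))
I_composite <- kl_statistic_norm(samp, m = 2, mu = mu_hat, sigma = sigma_hat)
```

Because the maximum likelihood estimate maximizes the average log-density, I_composite is never larger than I_simple for the same window.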

The window size $m$ is selected automatically following Song (2002), within the upper bound set by delta and extend.

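By default (simulate.p.value = NULL), the p-value is estimated by Monte Carlo simulation for small samples and computed from the asymptotic distribution of the test statistic otherwise; see simulate.p.value. The asymptotic approximation may be inaccurate for small samples; see Lequesne and Regnault (2020).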
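A Monte Carlo p-value of the kind controlled by simulate.p.value and B can be sketched for a simple normal null. It is an assumption that vs.test proceeds this way: the sketch keeps the window m fixed, whereas vs.test selects it automatically, and the helper names are hypothetical, not vsgoftest functions.

```r
# Hypothetical helpers (not part of vsgoftest), as in the statistic sketch.
vasicek_entropy <- function(x, m) {
  n <- length(x)
  stopifnot(m >= 1, m < n / 2)
  xs <- sort(x)
  i <- seq_len(n)
  mean(log(n / (2 * m) * (xs[pmin(i + m, n)] - xs[pmax(i - m, 1)])))
}
kl_statistic_norm <- function(x, m, mu, sigma) {
  -vasicek_entropy(x, m) - mean(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# Monte Carlo p-value of I_mn under the simple null N(mu, sigma^2),
# with the window m held fixed (assumption).
mc_pvalue_norm <- function(x, m, mu, sigma, B = 500) {
  n <- length(x)
  observed <- kl_statistic_norm(x, m, mu, sigma)
  # Statistic recomputed on B samples drawn from the null distribution.
  simulated <- replicate(B, kl_statistic_norm(rnorm(n, mu, sigma), m, mu, sigma))
  # Large values of I_mn are evidence against the null: right-tail p-value.
  mean(simulated >= observed)
}

set.seed(1)
samp <- rnorm(50, 2, 3)
p_null <- mc_pvalue_norm(samp, m = 2, mu = 2, sigma = 3)
p_shift <- mc_pvalue_norm(samp + 8, m = 2, mu = 2, sigma = 3)
```

Under a composite null, each simulated sample would also need its own parameter estimate before computing the statistic; this sketch covers only the simple case.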

References

Vasicek, O. (1976). A test for normality based on sample entropy. Journal of the Royal Statistical Society: Series B, 38(1), 54-59.

Song, K. S. (2002). Goodness-of-fit tests based on Kullback-Leibler discrimination information. IEEE Transactions on Information Theory, 48(5), 1103-1117.

Girardin, V., Lequesne, J. (2017). Entropy-based goodness-of-fit tests: a unifying framework. Application to DNA replication. Communications in Statistics: Theory and Methods. https://doi.org/10.1080/03610926.2017.1401084

Lequesne, J., Regnault, P. (2020). vsgoftest: An R package for goodness-of-fit testing based on Kullback-Leibler divergence. Journal of Statistical Software, 96. https://doi.org/10.18637/jss.v096.c01

Examples


library(vsgoftest)

set.seed(1)
samp <- rnorm(50, 2, 3)
## Simple null hypothesis: N(2, 3^2) with known parameters
vs.test(x = samp, densfun = 'dnorm', param = c(2, 3), B = 500)
## Composite null hypothesis: normal family, parameters estimated by ML
vs.test(x = samp, densfun = 'dnorm', B = 500)
## Composite null hypothesis, p-value from the asymptotic distribution
vs.test(x = samp, densfun = 'dnorm', simulate.p.value = FALSE)
