CDplot: CD-plot and Deviance test

Description

Constructs the CD-plot and computes the deviance test for exhaustive goodness-of-fit.

Usage

CDplot(data,m=4,g,par0=NULL,range=NULL,lattice=NULL,selection=TRUE,criterion="BIC",
        B=1000,samplerG=NULL,h=NULL,samplerH=NULL,R=500,ylim=c(0,2),CD.plot=TRUE)

Arguments

data

A data vector. See details.

m

If selection = FALSE, the size of the polynomial basis to be used. If selection = TRUE, the size of the polynomial basis from which the terms to include in the model are selected.

g

Function corresponding to the parametric start. See details.

par0

A vector of starting values for the parameters of g when the latter is not fully known. See details.

range

Interval corresponding to the support of the continuous data distribution.

lattice

Support of the discrete data distribution.

selection

A logical argument indicating if model selection should be performed. See details.

criterion

If selection=TRUE, the selection criterion to be used. The two possibilities are "AIC" or "BIC". See details.

B

A positive integer corresponding to the number of bootstrap replicates.

samplerG

A function corresponding to the random sampler for the parametric start g. See details.

h

Instrumental probability function. If samplerG is not NULL, the argument h will not be used.

samplerH

A function corresponding to the random sampler for the instrumental probability function h. If samplerG is not NULL, the argument samplerH will not be used.

R

A positive integer corresponding to the size of the grid of equidistant points at which the comparison densities are evaluated. The default is R = 500. A larger value may be needed when the comparison densities are less smooth.

ylim

If CD.plot=TRUE, the range of the y-axis of the comparison density plot. The default is c(0, 2).

CD.plot

A logical argument indicating if the comparison density plot should be displayed or not. The default is TRUE.

Value

Deviance

Value of the deviance test statistic.

p_value

P-value of the deviance test.

Details

The argument data collects the observations whose distribution is to be tested against the postulated model specified in the argument g.

If the parametric start is fully known, g must be a function that takes x as its only argument. If the parametric start is not fully known, g must be a function that takes the arguments x and par, with par corresponding to the vector of unknown parameters. The latter are estimated numerically via maximum likelihood, and par0 specifies the initial values of the parameters to be used in the optimization.

The value m determines the smoothness of the estimated comparison density, with smaller values of m leading to smoother estimates. If selection=TRUE, the largest coefficient estimates are selected according to either the AIC or the BIC criterion, as described in Algeri and Zhang, 2020 (see also Ledwina, 1994 and Mukhopadhyay, 2017). The resulting estimator is the one in Gajek's formulation, with the orthonormal basis corresponding to LP score functions (see Algeri and Zhang, 2020 and Gajek, 1986).
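The two conventions for specifying the parametric start, and the role of par0, can be sketched in base R as follows. This is only an illustration: the normal model, the names x_obs, g_known, g_unknown and negloglik, and the use of optim are assumptions made for the example and are not taken from the package's internal code.

```r
set.seed(1)
x_obs <- rnorm(200, mean = 1, sd = 2)   # illustrative data (assumption)

# Fully known parametric start: x is the only argument.
g_known <- function(x) dnorm(x, mean = 1, sd = 2)

# Parametric start with unknown parameters: arguments x and par.
g_unknown <- function(x, par) dnorm(x, mean = par[1], sd = par[2])

# Negative log-likelihood of the data under g_unknown.
negloglik <- function(par) -sum(log(g_unknown(x_obs, par)))

# par0 supplies the starting values for the numerical optimizer.
par0 <- c(0, 1)
fit <- optim(par0, negloglik, method = "L-BFGS-B",
             lower = c(-Inf, 1e-6))     # keep the standard deviation positive
mle <- fit$par
print(mle)
```

For this model the numerical estimates should be close to the sample mean and the (maximum likelihood) sample standard deviation, which gives a quick way to check that the chosen par0 leads the optimizer to a sensible solution.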

References

Algeri, S. and Zhang, X. (2020). Exhaustive goodness-of-fit via smoothed inference and graphics. arXiv:2005.13011.

Gajek, L. (1986). On improving density estimators which are not bona fide functions. The Annals of Statistics, 14(4):1612--1618.

Ledwina, T. (1994). Data-driven version of Neyman's smooth test of fit. Journal of the American Statistical Association, 89(427):1000--1005.

Mukhopadhyay, S. (2017). Large-scale mode identification and data-driven sciences. Electronic Journal of Statistics, 11(1):215--240.

Examples

# NOT RUN {
data<-rbinom(50,size=20,prob=0.5)
# Poisson(10) pmf truncated to the lattice 0,...,20 (normalized to sum to 1 there)
g<-function(x)dpois(x,10)/ppois(20,10)
# Sampler for g: oversample from Poisson(10), keep n draws that fall in 0,...,20
samplerG<-function(n){
  xx<-rpois(n*3,10)
  sample(xx[xx<=20],n)
}
CDplot(data,m=4,g,par0=NULL,range=NULL,lattice=seq(0,20),
       selection=FALSE,criterion="BIC",B=10,samplerG=samplerG,
       R=300,ylim=c(0,2))
# }
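Before calling CDplot on discrete data, it can be useful to check that the parametric start and its sampler agree with the chosen lattice. The following base-R sketch does so for a Poisson(10) start truncated to 0,...,20; the normalizing constant ppois(20, 10) and the names total_mass, draws and in_support are assumptions made for this illustration.

```r
set.seed(1)
lattice <- seq(0, 20)

# Poisson(10) pmf truncated to the lattice (assumed normalization).
g <- function(x) dpois(x, 10) / ppois(20, 10)

# Sampler for g: oversample, then keep n draws that fall in the lattice.
samplerG <- function(n) {
  xx <- rpois(n * 3, 10)
  sample(xx[xx <= 20], n)
}

# g should sum to one over the lattice.
total_mass <- sum(g(lattice))

# Every draw from the sampler should belong to the lattice.
draws <- samplerG(1000)
in_support <- all(draws %in% lattice)

print(total_mass)
print(in_support)
```

If total_mass differs noticeably from one, or in_support is FALSE, the parametric start and the lattice describe different supports and should be reconciled before running the test.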
