gofPIOSTn: 2 and 3 dimensional gof test based on the in-and-out-of-sample approach

Description

gofPIOSTn tests a 2 or 3 dimensional dataset with the PIOS test for a copula. The possible copulae are "normal", "t", "clayton", "gumbel", "frank", "joe", "amh", "galambos", "fgm" and "plackett". The parameters are estimated with the pseudo maximum likelihood method. If the estimation fails, inversion of Kendall's tau is used instead. The approximate p-values are computed with a semiparametric bootstrap, which can be accelerated by enabling the built-in parallel computation.

Usage

gofPIOSTn(
  copula = c("normal", "t", "clayton", "gumbel", "frank", "joe", "amh", "galambos",
    "fgm", "plackett"),
  x,
  param = 0.5,
  param.est = TRUE,
  df = 4,
  df.est = TRUE,
  margins = "ranks",
  flip = 0,
  M = 1000,
  dispstr = "ex",
  m = 1,
  lower = NULL,
  upper = NULL,
  seed.active = NULL,
  processes = 1
)

Value

An object of the class gofCOP with the components

method: a character which informs about the performed analysis
copula: the copula tested for
margins: the method used to estimate the margin distribution.
param.margins: the parameters of the estimated margin distributions. Only applicable if the margins were not specified as "ranks" or NULL.
theta: dependence parameters of the copulae
df: the degrees of freedom of the copula. Only applicable for t-copula.
res.tests: a matrix with the p-values and test statistics of the hybrid and the individual tests

Arguments

copula: The copula to test for. Possible are "normal", "t", "clayton", "gumbel", "frank", "joe", "amh", "galambos", "fgm" and "plackett".
x: A matrix containing the data with rows being observations and columns being variables.
param: The copula parameter to be used. Only relevant if param.est = FALSE.
param.est: Either TRUE or FALSE. If TRUE, param is estimated by maximum likelihood.
df: Degrees of freedom, if not meant to be estimated. Only needed when testing for the "t" copula. For computational reasons the entry is limited to 60 degrees of freedom.
df.est: Indicates if df shall be estimated. Has to be either FALSE or TRUE, where TRUE means that it will be estimated. For computational reasons the estimate is limited to 60 degrees of freedom.
margins: Specifies which estimation method for the margins shall be used. The default is "ranks", which is the standard approach to convert data in such a case. Alternatively the following distributions can be specified: "beta", "cauchy", Chi-squared ("chisq"), "f", "gamma", Log normal ("lnorm"), Normal ("norm"), "t", "weibull", Exponential ("exp"). Input can be either one method, e.g. "ranks", which will be used for estimation of all data sequences. Also an individual method for each margin can be specified, e.g. c("ranks", "norm", "t") for 3 data sequences. If one does not want to estimate the margins, set it to NULL.
flip: The control parameter to flip the copula by 90, 180, 270 degrees clockwise. Only applicable for bivariate copula. Default is 0 and possible inputs are 0, 90, 180, 270 and NULL.
M: Number of bootstrapping loops.
dispstr: A character string specifying the type of the symmetric positive definite matrix characterizing the elliptical copula. Implemented structures are "ex" for exchangeable and "un" for unstructured, see package copula.
m: Length of blocks.
lower: Lower bound for the maximum likelihood estimation of the copula parameter. The constraint is also active in the bootstrapping procedure. The constraint is not active when a switch to inversion of Kendall's tau is necessary. Default NULL.
upper: Upper bound for the maximum likelihood estimation of the copula parameter. The constraint is also active in the bootstrapping procedure. The constraint is not active when a switch to inversion of Kendall's tau is necessary. Default NULL.
seed.active: Has to be either an integer or a vector of M+1 integers. If an integer, then the seeds for the bootstrapping procedure will be simulated. If M+1 seeds are provided, then these seeds are used in the bootstrapping procedure. Defaults to NULL, then R generates the seeds from the computer runtime. Controlling the seeds is useful for reproducibility of a simulation study to compare the power of the tests or for reproducibility of an empirical study.
processes: The number of parallel processes which are performed to speed up the bootstrapping. Shouldn't be higher than the number of logical processors. Please see the details.
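
The arguments above can be combined as in the following minimal sketch of a reproducible call. The values of M and the seed vector are illustrative assumptions, not recommendations, and the call is only executed if the gofCopula package is installed.

```r
# Illustrative settings (assumptions chosen for a quick run).
M <- 10                    # number of bootstrap loops, kept small for speed
seeds <- seq_len(M + 1)    # seed.active accepts a vector of M + 1 integers

if (requireNamespace("gofCopula", quietly = TRUE)) {
  library(gofCopula)
  data(IndexReturns2D)

  res <- gofPIOSTn(
    copula = "normal",
    x = IndexReturns2D,
    margins = "ranks",     # rank-based pseudo-observations (the default)
    M = M,
    seed.active = seeds,   # fixed seeds make the bootstrap reproducible
    processes = 1          # no parallelisation for such a small M
  )
  print(res)
}
```

Passing a single integer to seed.active instead lets the function simulate the bootstrap seeds itself, as described above.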

Details

The "Tn" test is introduced in Zhang et al. (2016). It tests the null hypothesis $$H_0 : C_0 \in \mathcal{C}.$$ For the test, blocks of length m are constructed from the data. For each block, the test compares the pseudo likelihood of the block's data evaluated at the parameter estimated from the full sample with the pseudo likelihood evaluated at the parameter estimated without the data in that block. This shows whether the data in a block influence the parameter estimation significantly. The test statistic sums over the blocks $b = 1, \dots, M$ (so $M$ here is the number of blocks, not the number of bootstrap loops) and is defined as $$T = \sum_{b=1}^M \sum_{k=1}^m [l\{U_k^b;\theta_n \} - l\{U_k^b;\theta_n^{-b} \}]$$

with the pseudo observations $U_{ij}$ for $i = 1, \dots,n$; $j = 1, \dots,d$ and the pseudo maximum likelihood estimates $$\theta_n = \arg \max_{\theta} \sum_{i=1}^n l(U_i; \theta)$$ and $$\theta_n^{-b} = \arg \max_{\theta} \sum_{b' \neq b}^M \sum_{i=1}^m l(U_i^{b'}; \theta), \quad b=1, \dots, M.$$
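
The statistic can be illustrated with a base R sketch. This is not the package's implementation: it assumes that l is the log copula density, that the estimates maximise the log pseudo-likelihood, and that the candidate family is the bivariate Clayton copula. The helper names clayton_loglik and fit_theta are hypothetical.

```r
set.seed(1)

# Simulate n bivariate Clayton observations with the conditional method.
n <- 60
theta_true <- 2
u1 <- runif(n)
w  <- runif(n)
u2 <- (u1^(-theta_true) * (w^(-theta_true / (1 + theta_true)) - 1) + 1)^(-1 / theta_true)
x  <- cbind(u1, u2)

# Rank-based pseudo-observations, scaled by n + 1 to stay inside (0, 1).
U <- apply(x, 2, rank) / (n + 1)

# Per-observation log-density of the bivariate Clayton copula (theta > 0).
clayton_loglik <- function(theta, U) {
  u <- U[, 1]
  v <- U[, 2]
  log1p(theta) - (1 + theta) * (log(u) + log(v)) -
    (2 + 1 / theta) * log(u^(-theta) + v^(-theta) - 1)
}

# Pseudo maximum likelihood estimate of theta on a bounded interval.
fit_theta <- function(U) {
  optimize(function(th) sum(clayton_loglik(th, U)),
           interval = c(0.01, 20), maximum = TRUE, tol = 1e-8)$maximum
}

m <- 1                                                 # block length
blocks <- split(seq_len(n), ceiling(seq_len(n) / m))   # row indices per block
theta_n <- fit_theta(U)                                # full-sample estimate

# Sum over blocks of the in-sample minus out-of-sample log-likelihood.
Tn <- sum(vapply(blocks, function(idx) {
  theta_minus_b <- fit_theta(U[-idx, , drop = FALSE])  # estimate without block b
  sum(clayton_loglik(theta_n, U[idx, , drop = FALSE]) -
      clayton_loglik(theta_minus_b, U[idx, , drop = FALSE]))
}, numeric(1)))

Tn
```

With m = 1 every block is a single observation, so the loop amounts to leave-one-out re-estimation. Larger values of m reduce the number of re-fits.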

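The approximate p-value is computed by the formula $$\sum_{b=1}^M \mathbf{I}(|T_b| \geq |T|) / M,$$ where $T_b$ is the test statistic of the $b$-th bootstrap sample and $M$ is the number of bootstrapping loops.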
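
As a small worked example of this formula, the p-value is the share of bootstrap statistics whose absolute value is at least that of the observed statistic. The numbers below are made up for illustration and do not come from any dataset.

```r
# Made-up values for illustration only.
T_obs  <- 1.5                            # observed test statistic T
T_boot <- c(0.2, 1.7, -2.0, 0.9, 1.5)    # bootstrap statistics T_b, here M = 5

# Indicator |T_b| >= |T| averaged over the M bootstrap loops.
p_value <- mean(abs(T_boot) >= abs(T_obs))
p_value   # 0.6, since three of the five bootstrap values qualify
```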

The applied estimation method is the two-step pseudo maximum likelihood approach, see Genest and Rivest (1995).

For small values of M, initializing the parallelisation via processes does not pay off, because registering the parallel processes itself adds computation time. Consider enabling parallelisation only for high values of M.
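
One way to follow this advice is sketched below. The threshold min_M_for_parallel is a hypothetical value chosen for illustration, not one recommended by the package. The sketch only uses detectCores() from the parallel package that ships with R.

```r
M <- 1000                                  # planned number of bootstrap loops

# Number of logical processors; detectCores() can return NA on some systems.
n_logical <- parallel::detectCores(logical = TRUE)
if (is.na(n_logical)) n_logical <- 1L

min_M_for_parallel <- 500                  # hypothetical cut-off (assumption)

# Stay at one process for small M; never exceed the logical processors.
processes <- if (M >= min_M_for_parallel) n_logical else 1L
processes
```

The resulting value can then be passed as the processes argument of gofPIOSTn.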

References

Zhang, S., Okhrin, O., Zhou, Q., and Song, P. (2016). Goodness-of-fit Test For Specification of Semiparametric Copula Dependence Models. Journal of Econometrics, 193:215-233. doi:10.1016/j.jeconom.2016.02.017

Genest, C., Ghoudi, K., and Rivest, L.-P. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika, 82:543-552

Examples

library(gofCopula)

data(IndexReturns2D)

# M = 10 bootstrap loops keeps the example fast; the default is M = 1000.
gofPIOSTn("normal", IndexReturns2D, M = 10)
