npcmstest: Kernel Consistent Model Specification Test with Mixed Data Types

Description

npcmstest implements a consistent test for correct specification of parametric regression models (linear or nonlinear) as described in Hsiao, Li, and Racine (2007).

Usage

npcmstest(formula,
          data = NULL,
          subset,
          xdat,
          ydat,
          model = stop(paste(sQuote("model")," has not been provided")),
          distribution = c("bootstrap", "asymptotic"),
          boot.method = c("iid","wild","wild-rademacher"),
          boot.num = 399,
          pivot = TRUE,
          density.weighted = TRUE,
          random.seed = 42,
          ...)

Value

npcmstest returns an object of type cmstest with the following components, components will contain information related to Jn or In depending on the value of pivot:

Jn: the statistic Jn
In: the statistic In
Omega.hat: as described in Hsiao, C. and Q. Li and J.S. Racine.
q.*: the various quantiles of the statistic Jn (or In if pivot=FALSE) are in components q.90, q.95, q.99 (one-sided 1%, 5%, 10% critical values)
P: the P-value of the statistic
Jn.bootstrap: if pivot=TRUE contains the bootstrap replications of Jn
In.bootstrap: if pivot=FALSE contains the bootstrap replications of In

summary supports object of type cmstest.

Arguments

formula: a symbolic description of variables on which the test is to be performed. The details of constructing a formula are described below.
data: an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called.
subset: an optional vector specifying a subset of observations to be used.
model: a model object obtained from a call to lm (or glm). Important: the call to either glm or lm must have the arguments x=TRUE and y=TRUE or npcmstest will not work. Also, the test is based on residual bootstrapping hence the outcome must be continuous (which rules out Logit, Probit, and Count models).
xdat: a \(p\)-variate data frame of explanatory data (training data) used to calculate the regression estimators.
ydat: a one (1) dimensional numeric or integer vector of dependent data, each element \(i\) corresponding to each observation (row) \(i\) of xdat.
distribution: a character string used to specify the method of estimating the distribution of the statistic to be calculated. bootstrap will conduct bootstrapping. asymptotic will use the normal distribution. Defaults to bootstrap.
boot.method: a character string used to specify the bootstrap method. iid will generate independent identically distributed draws. wild will use a wild bootstrap. wild-rademacher will use a wild bootstrap with Rademacher variables. Defaults to iid.
boot.num: an integer value specifying the number of bootstrap replications to use. Defaults to 399.
pivot: a logical value specifying whether the statistic should be normalised such that it approaches \(N(0,1)\) in distribution. Defaults to TRUE.
density.weighted: a logical value specifying whether the statistic should be weighted by the density of xdat. Defaults to TRUE.
random.seed: an integer used to seed R's random number generator. This is to ensure replicability. Defaults to 42.
...: additional arguments supplied to control bandwidth selection on the residuals. One can specify the bandwidth type, kernel types, and so on. To do this, you may specify any of bwscaling, bwtype, ckertype, ckerorder, ukertype, okertype, as described in npregbw. This is necessary if you specify bws as a \(p\)-vector and not a bandwidth object, and you do not desire the default behaviours.

Author

Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca

Usage Issues

npcmstest supports regression objects generated by lm and uses features specific to objects of type lm hence if you attempt to pass objects of a different type the function cannot be expected to work.

If you are using data of mixed types, then it is advisable to use the data.frame function to construct your input data and not cbind, since cbind will typically not work as intended on mixed data types and will coerce the data to the same type.

References

Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.

Hsiao, C. and Q. Li and J.S. Racine (2007), “A consistent model specification test with mixed categorical and continuous data,” Journal of Econometrics, 140, 802-826.

Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.

Maasoumi, E. and J.S. Racine and T. Stengos (2007), “Growth and convergence: a profile of distribution dynamics and mobility,” Journal of Econometrics, 136, 483-508.

Murphy, K. M. and F. Welch (1990), “Empirical age-earnings profiles,” Journal of Labor Economics, 8, 202-229.

Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.

Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.

Examples

Run this code

if (FALSE) {
## Not run in checks: excluded to keep MPI examples stable and check times short.
## The following example is adapted for interactive parallel execution
## in R. Here we spawn 1 slave so that there will be two compute nodes
## (master and slave).  Kindly see the batch examples in the demos
## directory (npRmpi/demos) and study them carefully. Also kindly see
## the more extensive examples in the np package itself. See the npRmpi
## vignette for further details on running parallel np programs via
## vignette("npRmpi",package="npRmpi").

## Start npRmpi for interactive execution. If slaves are already running and
## `options(npRmpi.reuse.slaves=TRUE)` (default on some systems), this will
## reuse the existing pool instead of respawning. To change the number of
## slaves, call `npRmpi.stop(force=TRUE)` then restart.
npRmpi.start(nslaves=1)

mpi.bcast.cmd(data(cps71),
              caller.execute=TRUE)

mpi.bcast.cmd(attach(cps71),
              caller.execute=TRUE)

mpi.bcast.cmd(model <- lm(logwage~age+I(age^2), x=TRUE, y=TRUE),
              caller.execute=TRUE)

mpi.bcast.cmd(npcmstest(model = model, xdat = age, ydat = logwage,
                        boot.num = 29),
              caller.execute=TRUE)

## For the interactive run only we close the slaves perhaps to proceed
## with other examples and so forth. This is redundant in batch mode.

## Note: on some systems (notably macOS+MPICH), repeatedly spawning and
## tearing down slaves in the same R session can lead to hangs/crashes.
## npRmpi may therefore keep slave daemons alive by default and
## `npRmpi.stop()` performs a "soft close". Use `force=TRUE` to
## actually shut down the slaves.
##
## You can disable reuse via `options(npRmpi.reuse.slaves=FALSE)` or by
## setting the environment variable `NP_RMPI_NO_REUSE_SLAVES=1` before
## loading the package.

npRmpi.stop()               ## soft close (may keep slaves alive)
## npRmpi.stop(force=TRUE)  ## hard close

## Note that in order to exit npRmpi properly avoid quit(), and instead
## use mpi.quit() as follows.

## mpi.bcast.cmd(mpi.quit(),
##               caller.execute=TRUE)
}

Run the code above in your browser using DataLab