npmodelcheck: Hypothesis Testing for Covariate or Group Effect in Nonparametric Regression

Description

Tests the significance of a covariate, or a group of covariates, in a nonparametric regression. The test applies a high-dimensional one-way ANOVA to residuals from a local polynomial fit on the remaining covariates.

Usage

npmodelcheck(X, Y, ind_test, p = 7, degree.pol = 0, kernel.type = "epanech", 
    bandwidth = 0, gridsize = 30, dim.red = c(1, 10))

Arguments

X

matrix with observations, rows corresponding to data points and columns corresponding to covariates.

Y

vector of observed responses.

ind_test

index or vector with indices of covariates to be tested.

p

size of the window W_i. See Details.

degree.pol

degree of the polynomial to be used in the local fit.

kernel.type

kernel type; options are "box", "trun.normal", "gaussian", "epanech", "biweight", "triweight" and "triangular". "trun.normal" is a Gaussian kernel truncated between -3 and 3.

bandwidth

bandwidth, vector or matrix of bandwidths for the local polynomial fit. If a vector of bandwidths, it must correspond to each covariate of X_{-(ind_test)}, that is, the covariates not being tested. If 0, leave-one-out cross validation with criterion of mi

gridsize

number of candidate bandwidths to be searched in cross-validation. If set to 0, gridsize is taken to be 5+as.integer(100/d^3). Ignored if cross-validation is not performed.

dim.red

vector with first element indicating 1 for Sliced Inverse Regression (SIR) and 2 for Supervised Principal Components (SPC); the second element of the vector should be number of slices (if SIR), or number of principal components (if SPC). If 0, no dimensio

Value

bandwidth

bandwidth used for the local polynomial fit.

predicted

vector with the predicted values obtained from the remaining covariates.

p-value

p-value of the test.

Details

To test the significance of a single covariate, say X_j, assume that its observations X_{ij}, i = 1,...,n, define the factor levels of a one-way ANOVA. To construct the ANOVA, each of these factor levels is augmented by including residuals from nearby covariate values. Specifically, cell "i" is augmented by the values of the residuals corresponding to observations X_{kj} for "k" in W_i, where W_i is the neighborhood (window) of observation "i" and has size "p". These residuals are obtained from a local polynomial fit on the remaining covariates X_{-(j)}. The test for the significance of X_j is then the test for no factor effects in this high-dimensional one-way ANOVA. See References for further details. When testing the significance of a group of covariates, the window W_i is defined using the first supervised principal component (SPC) of the covariates in that group, and the local polynomial fit uses the remaining covariates X_{-(ind_test)}.
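The window construction described above can be illustrated in base R. This is only a sketch under stated assumptions, not the package's internal code: the window W_i is taken to be the "p" observations nearest to observation "i" in the ordering of X_j, boundary observations without a full window are simply dropped, the local fit is a local constant fit with an Epanechnikov kernel, and the bandwidth h is a hypothetical fixed value.

```r
# Illustrative sketch only; the names below are not part of npmodelcheck.
set.seed(1)
n <- 100
p <- 7                        # window size (odd), plays the role of argument p
x_rest <- rnorm(n)            # stand-in for the remaining covariate X_{-(j)}
x_j    <- rnorm(n)            # covariate under test
y      <- x_rest^3 + rnorm(n)

# Local constant (degree 0) fit of y on x_rest with an Epanechnikov kernel.
# h is a hypothetical fixed bandwidth (no cross-validation here).
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
h <- 0.5
fitted <- sapply(x_rest, function(x0) {
  w <- epan((x_rest - x0) / h)
  sum(w * y) / sum(w)
})
res <- y - fitted

# Order the residuals by the tested covariate, then build one cell per window.
# Assumption: only interior positions with a full window of size p are kept.
res_ord <- res[order(x_j)]
half    <- (p - 1) / 2
centers <- (half + 1):(n - half)
cells   <- t(sapply(centers, function(i) res_ord[(i - half):(i + half)]))

# Raw one-way ANOVA ingredients: between-cell and within-cell mean squares.
# The actual test statistic and its null distribution (which must account for
# overlapping windows) are described in the references and not reproduced here.
mst <- p * sum((rowMeans(cells) - mean(cells))^2) / (nrow(cells) - 1)
mse <- sum((cells - rowMeans(cells))^2) / (nrow(cells) * (p - 1))
```

Each row of cells holds the "p" residuals that make up one augmented factor level, so consecutive cells share most of their entries.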

Dimension reduction (SIR or SPC) is applied to the remaining covariates (X_{-(ind_test)}), which are used in the local polynomial fit. This reduction moderates the effect of the curse of dimensionality when fitting a nonparametric regression on several covariates. For SPC, the supervision is done in the following way: only covariates whose p-value from a univariate "npmodelcheck" test against Y is below 0.3 can be selected to compose the principal components. If no covariate has a p-value below 0.3, then the most significant covariate will be the only component. For SIR, the size of the effective dimension reduction space is selected automatically through sequential testing (see References for details).
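The SPC screening rule described above can be sketched in base R as follows. This is a hedged illustration rather than the package's implementation: the screening p-values come from cor.test, used here only as a self-contained stand-in for the univariate "npmodelcheck" tests, and n_comp is a hypothetical name for the number of principal components requested (the second element of dim.red).

```r
# Illustrative sketch only; the names below are not part of npmodelcheck.
set.seed(2)
n <- 100
X <- matrix(rnorm(n * 5), n, 5)
Y <- X[, 3]^3 + rnorm(n)

# Stand-in screening p-values (assumption): one univariate test per covariate.
pvals <- apply(X, 2, function(x) cor.test(x, Y)$p.value)

# Keep covariates with p-value below 0.3; if none qualifies,
# fall back to the single most significant covariate.
threshold <- 0.3
keep <- which(pvals < threshold)
if (length(keep) == 0) keep <- which.min(pvals)

# Principal components of the screened covariates (at most n_comp of them).
n_comp <- min(2, length(keep))
if (length(keep) == 1) {
  components <- X[, keep, drop = FALSE]
} else {
  pc <- prcomp(X[, keep, drop = FALSE], center = TRUE, scale. = TRUE)
  components <- pc$x[, seq_len(n_comp), drop = FALSE]
}
```

The columns of components would then replace the screened covariates in the local polynomial fit.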

References

Zambom, A. Z. and Akritas, M. G. (2012a). Nonparametric Model Checking and Variable Selection. arXiv:1205.6761.

Zambom, A. Z. and Akritas, M. G. (2012b). Significance Testing and Group Variable Selection. arXiv:1205.6843.

Li, K. C. (1991). Sliced Inverse Regression for Dimension Reduction. Journal of the American Statistical Association, 86, 316-327.

Bair, E., Hastie, T., Paul, D. and Tibshirani, R. (2006). Prediction by Supervised Principal Components. Journal of the American Statistical Association, 101, 119-137.

Examples

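# Simulate 100 observations of 5 independent standard normal covariates
X = matrix(1,100,5)

X[,1] = rnorm(100)
X[,2] = rnorm(100)
X[,3] = rnorm(100)
X[,4] = rnorm(100)
X[,5] = rnorm(100)

# Only the third covariate enters the model for the response
Y = X[,3]^3 + rnorm(100)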

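# Test covariate 2, which does not enter the model for Y
npmodelcheck(X, Y, 2, p = 9, degree.pol = 0, kernel.type = "trun.normal", 
    bandwidth = -1,  dim.red = 0)

# Test covariate 3, using SPC with 2 principal components
npmodelcheck(X, Y, 3, p = 7, degree.pol = 0, kernel.type = "trun.normal", 
    bandwidth = 0,  dim.red = c(2,2))

# Test covariates 1 and 2 as a group, using SIR with 10 slices
npmodelcheck(X, Y, c(1,2), p = 11, degree.pol = 0, kernel.type = "box", 
    bandwidth = 0,  dim.red = c(1,10))

# Test covariates 3 and 4 as a group, using SIR with 20 slices
npmodelcheck(X, Y, c(3,4), p = 5, degree.pol = 0, kernel.type = "box", 
    bandwidth = 0,  dim.red = c(1,20))

# Single covariate unrelated to the response, local linear fit (degree.pol = 1)
npmodelcheck(rnorm(100), rnorm(100), 1, p = 5, degree.pol = 1, kernel.type = "box", 
    bandwidth = 0,  dim.red = c(1,20))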
