intRegGOF: Integrated Regression Goodness of Fit

Description

Integrated Regression Goodness of Fit test, used to check whether a given model is suitable to represent the regression function for the given data.

Usage

intRegGOF(obj, covars = NULL, B = 499, LINMOD = FALSE)
  # S3 method for intRegGOF
print(x,...)

Arguments

obj

An object of class lm, glm or nls.

covars

Names of continuous (numerical) variates used to compute Integrated Regression. They should be variables contained in the data frame used to compute the regression fit.

B

Bootstrap resampling size.

LINMOD

When TRUE and obj is an object of class lm, linear model matrix fitting equations are used.

x

An object of class intRegGOF.

…

Further parameters for print command.

Value

This function returns an object of class intRegGOF, a list which contains the following objects:

call

The call to the function

regObj

String with the lm, glm or nls object whose fit is checked.

regModel

lm, glm or nls object call.

p.value

$p$-values for the $K_n$ and $W^2_n$ statistics.

datStat

value of $K_n$ and $W^2_n$ statistics.

covars

continuous (numerical) variates used to compute Integrated Regression.

intErr

cumulated residual process at the values of covars in data.

xLT

structure with the order of covars summation.

bootSamp

Bootstrap samples for $K_n$ and $W^2_n$.
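The components listed above can be inspected with the usual list accessors. The following is only a sketch: it assumes the intRegGOF package is installed (the call is skipped otherwise), and the names `d`, `fit` and `gof` are illustrative.

```r
set.seed(1)
n <- 50
d <- data.frame(X1 = runif(n), X2 = runif(n))
d$Y <- 1 + 2 * d$X1 + rnorm(n, sd = 0.125)
fit <- lm(Y ~ X1 + X2, d)

# Run the test only when the package is available
if (requireNamespace("intRegGOF", quietly = TRUE)) {
  gof <- intRegGOF::intRegGOF(fit, B = 99)
  print(names(gof))    # component names, as listed in this section
  print(gof$datStat)   # observed K_n and W^2_n
  print(gof$p.value)   # bootstrap p-values for both statistics
  str(gof$bootSamp)    # bootstrap samples of the statistics
}
```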

Details

The Integrated Regression Goodness of Fit technique was introduced in Stute (1997). The main idea is to study the process that results from cumulating the residuals up to a given value of the covariates. Once this process is built, different functionals of it can be considered to measure the discrepancy between the true regression function and its estimate.

This function implements the test $$ H_0:m\in M \ \textrm{vs} \ H_1:m\notin M $$ where $m$ is the regression function and $M$ is a given class of functions. The statistics considered are $$ K_n=\sup_{x\in R^d}|R^w_n(x)| $$ $$ W^2_n=\int_{R^d}R^w_n(z)^2 \,dF(z), $$ where $R^w_n(x)$ is the cumulated residual process: $$ R^w_n(x)=n^{-1/2}\sum^n_{i=1}(y_i-\hat y_i)I(x_i\le x). $$
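To make the definitions concrete, the cumulated residual process and both statistics can be computed by hand in base R. This is a minimal sketch, not the package's implementation. It assumes a single covariate and unit weights, takes the supremum over the observed covariate values, and replaces $F$ with the empirical distribution of the covariate.

```r
set.seed(1)
n <- 50
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.125)

fit <- lm(y ~ x)
res <- residuals(fit)            # response residuals y_i - yhat_i

# ind[i, j] is TRUE when x_i <= x_j
ind <- outer(x, x, "<=")

# R_n(x_j) = n^{-1/2} * sum_i res_i * I(x_i <= x_j), at each observed x_j
Rn <- colSums(ind * res) / sqrt(n)

Kn  <- max(abs(Rn))              # sup over the observed covariate values
W2n <- mean(Rn^2)                # integral with respect to the empirical F
```

Because the process is a step function that only jumps at the observed covariate values, evaluating it at those values is enough to find the supremum in the one-dimensional case.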

As the stochastic behaviour of this cumulated residual process is quite complex, the implementation of the technique is based on resampling techniques. In particular the chosen implementation is based on Wild Bootstrap methods.
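A generic wild bootstrap for $K_n$ can be sketched as follows. This is an illustration only and not necessarily the scheme the package uses. It assumes a single covariate, Rademacher multipliers, and a p-value computed as the proportion of bootstrap statistics at least as large as the observed one; `kn_stat` is a hypothetical helper.

```r
set.seed(1)
n <- 50
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.125)

# Hypothetical helper: K_n for a simple linear fit of yy on x
kn_stat <- function(yy) {
  r <- residuals(lm(yy ~ x))
  ind <- outer(x, x, "<=")       # ind[i, j] is TRUE when x_i <= x_j
  max(abs(colSums(ind * r))) / sqrt(n)
}

fit  <- lm(y ~ x)
yhat <- fitted(fit)
res  <- residuals(fit)
Kn   <- kn_stat(y)               # observed statistic

B <- 99
Kn_boot <- replicate(B, {
  v <- sample(c(-1, 1), n, replace = TRUE)  # Rademacher multipliers (assumption)
  kn_stat(yhat + res * v)        # refit on the bootstrap response under H_0
})

p_value <- mean(Kn_boot >= Kn)   # bootstrap p-value for K_n
```

Each bootstrap response keeps the fitted values and perturbs the residuals with random signs, so the resampled data satisfy the null model while preserving the magnitude of each residual.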

The method also handles selection-biased data by compensation: the weights used to fit the regression function are also used when computing the cumulated residual process.

At the moment only 'response' residuals are considered, together with the wild bootstrap resampling technique, and the results for discrete responses might not be reliable.

References

Stute, W. (1997). Nonparametric model checks for regression. Ann. Statist., 25(2), pp. 613-641.

Ojeda, J. L., González-Manteiga, W. and Cristóbal, J. A. A bootstrap based Model Checking for Selection-Biased data. Reports in Statistics and Operations Research, U. de Santiago de Compostela. Report 07-05. http://eio.usc.es/eipc1/BASE/BASEMASTER/FORMULARIOS-PHP-DPTO/REPORTS/447report07_05.pdf

Ojeda, J. L., Cristóbal, J. A., and Alcalá, J. T. (2008). A bootstrap approach to model checking for linear models under length-biased data. Ann. Inst. Statist. Math., 60(3), pp. 519-543.

Examples

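# NOT RUN {
library(intRegGOF)
n <- 50
d <- data.frame(X1 = runif(n), X2 = runif(n))
d$Y <- 1 + 2 * d$X1 + rnorm(n, sd = 0.125)   # Y depends on X1 only
plot(d)
# Test a fit with both covariates, using the default covars
intRegGOF(lm(Y ~ X1 + X2, d), B = 99)
# Model without intercept: misspecified here, since the true intercept is 1
intRegGOF(a <- lm(Y ~ X1 - 1, d), B = 99)
# Select the covariates by name ...
intRegGOF(a, c("X1", "X2"), B = 99)
# ... or with a one-sided formula
intRegGOF(a, ~ X2 + X1, B = 99)
# }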
