dr (version 3.0.10)

dr: Main function for dimension reduction regression

Description

This is the main function in the dr package. It creates objects of class dr to estimate the central (mean) subspace and perform tests concerning its dimension. Several helper functions that require a dr object can then be applied to the output from this function.

Usage

dr(formula, data, subset, group = NULL, na.action = na.fail, weights, ...)

dr.compute(x, y, weights, group = NULL, method = "sir", chi2approx = "bx", ...)

Arguments

formula
a two-sided formula like y~x1+x2+x3, where the left-hand side is a vector or matrix of the response variable(s), and the right-hand side variables are the predictors. While any formula that is legal in the Rogers-Wilkinson notation is accepted, dimension reduction methods generally require numeric predictors; categorical predictors should instead be handled through the group argument.
data
an optional data frame containing the variables in the model. By default the variables are taken from the environment from which dr is called.
subset
an optional vector specifying a subset of observations to be used in the fitting process.
group
If used, this argument specifies a grouping variable so that dimension reduction is done separately for each distinct level. This is implemented only when method is one of "sir", "save", or "ire".
weights
an optional vector of weights to be used where appropriate. In the context of dimension reduction methods, weights are used to obtain elliptical symmetry, not constant variance.
na.action
a function which indicates what should happen when the data contain NAs. The default is na.fail, which stops all calculations. The option na.omit is also permitted, but it may not work correctly when weights are used.
x
The design matrix. This will be computed from the formula by dr and then passed to dr.compute, or you can create it yourself; a sketch of a direct call follows this list.
y
The response vector or matrix.
method
This character string specifies the method of fitting. The options include "sir", "save", "phdy", "phdres" and "ire". Each method may have its own additional arguments, or its own defaults; see the documentation for the individual methods for details.
chi2approx
Several dr methods compute significance levels using statistics that are asymptotically distributed as a linear combination of $\chi^2(1)$ random variables. This keyword chooses the method for computing these significance levels: either "bx", the default, which uses the approximation suggested by Bentler and Xie (2000), or "wood", which uses the approximation proposed by Wood (1989).
...
For dr, all additional arguments are passed to dr.compute. For dr.compute, additional arguments may be required for a particular dimension reduction method; for example, nslices, the number of slices, is used by "sir" and "save".
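
The x and y arguments make it possible to skip the formula interface entirely. Below is a minimal sketch (not taken from the package documentation; it assumes the dr package and its ais data set are available) of calling dr.compute directly with a hand-built design matrix. The weights are passed explicitly since dr.compute is documented to accept them:

library(dr)
data(ais)
# build the design matrix and response by hand instead of via a formula
x <- as.matrix(log(ais[, c("SSF", "Wt", "RCC")]))
y <- ais$LBM
# same fit as dr(LBM ~ log(SSF) + log(Wt) + log(RCC), data = ais)
fit <- dr.compute(x, y, weights = rep(1, nrow(x)), method = "sir")
fit$evalues   # eigenvalues of the kernel matrix M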

Value

  • dr returns an object that inherits from dr (the name of the type is the value of the method argument) with the attributes listed below; the sketch after this list shows how to extract them.
  • x: The design matrix.
  • y: The response vector.
  • weights: The weights used, normalized to add to n.
  • qr: QR factorization of x.
  • cases: Number of cases used.
  • call: The initial call to dr.
  • M: A matrix that depends on the method of computing. The column space of M should be close to the central subspace.
  • evalues: The eigenvalues of M (or squared singular values if M is not symmetric).
  • evectors: The eigenvectors of M (or of M'M if M is not square and symmetric), ordered according to the eigenvalues.
  • chi2approx: Value of the input argument of this name.
  • numdir: The maximum number of directions to be found. The output value of numdir may be smaller than the input value.
  • slice.info: Output from 'sir.slice', used by "sir" and "save".
  • method: The dimension reduction method used.
  • terms: Same as the terms attribute in lm or glm; needed to make update work correctly.
  • A: If method="save", a three-dimensional array needed to compute test statistics.
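
As a quick illustration (a sketch, assuming the ais data set that ships with this package), the components listed above can be extracted from a fitted object in the usual way:

library(dr)
data(ais)
fit <- dr(LBM ~ log(SSF) + log(Wt) + log(RCC), data = ais)
fit$method             # "sir", the default fitting method
fit$cases              # number of cases used
fit$evalues            # eigenvalues of the kernel matrix M
fit$evectors[, 1:2]    # first two estimated directions, ordered by eigenvalue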

Details

The general regression problem studies $F(y|x)$, the conditional distribution of a response $y$ given a set of predictors $x$. This function provides methods for estimating the dimension and central subspace of a general regression problem. That is, we want to find a $p \times d$ matrix $B$ of minimal rank $d$ such that $$F(y|x)=F(y|B'x)$$ Both the dimension $d$ and the subspace $R(B)$ are unknown, and these methods make few assumptions.

Many methods are based on the inverse distribution, $F(x|y)$. For the methods "sir", "save", "phdy" and "phdres", a kernel matrix $M$ is estimated such that the column space of $M$ should be close to the central subspace $R(B)$. The eigenvectors corresponding to the $d$ largest eigenvalues of $M$ provide an estimate of $R(B)$; a hand-rolled sketch of this construction for "sir" follows this section. For the method "ire", subspaces are estimated by minimizing an objective function.

Categorical predictors can be included using the group argument with the methods "sir", "save" and "ire", using the ideas from Chiaromonte, Cook and Li (2002).

The primary output from this function is (1) a set of vectors whose span estimates $R(B)$, and (2) various tests concerning the dimension $d$.

Weights can be used, essentially to specify the relative frequency of each case in the data. Empirical weights that make the contours of the weighted sample closer to elliptical can be computed using dr.weights. This will usually result in zero weight for some cases, and the function will set zero estimated weights to missing.
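
To make the kernel-matrix idea concrete, here is a hand-rolled sketch of the "sir" construction on simulated data. This illustrates the idea only and is not the package's exact implementation, which handles standardization, weights and slicing more carefully:

set.seed(1)
n <- 500; p <- 4
x <- matrix(rnorm(n * p), n, p)        # predictors, already elliptical
B <- c(1, 1, 0, 0)                     # true central subspace is span(B), d = 1
y <- as.vector(exp(x %*% B)) + 0.1 * rnorm(n)

z <- scale(x)                          # standardize (adequate here: cov(x) is near I)
h <- 8                                 # number of slices
slices <- cut(rank(y), breaks = h, labels = FALSE)
counts <- as.vector(table(slices))
means <- rowsum(z, slices) / counts    # within-slice means of z
M <- t(means) %*% (means * (counts / n))   # SIR kernel matrix: sum of f_i m_i m_i'
eigen(M)$vectors[, 1]                  # leading eigenvector: close to B up to sign/scale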

References

Bentler, P. M. and Xie, J. (2000). Corrections to test statistics in principal Hessian directions. Statistics and Probability Letters, 47, 381-389. Approximate p-values.

Chiaromonte, F., Cook, R. D. and Li, B. (2002). Sufficient dimension reduction in regressions with categorical predictors. Annals of Statistics, 30, 475-497. Introduced grouping, or conditioning on factors.

Cook, R. D. (1998). Regression Graphics. New York: Wiley. This book provides the basic results for dimension reduction methods, including detailed discussion of the methods "sir", "phdy" and "phdres".

Cook, R. D. (2004). Testing predictor contributions in sufficient dimension reduction. Annals of Statistics, 32, 1062-1092. Introduced marginal coordinate tests.

Cook, R. D. and Nachtsheim, C. (1994). Reweighting to achieve elliptically contoured predictors in regression. Journal of the American Statistical Association, 89, 592-599. Describes the weighting scheme used by dr.weights.

Cook, R. D. and Ni, L. (2005). Sufficient dimension reduction via inverse regression: A minimum discrepancy approach. Journal of the American Statistical Association, 100, 410-428. The "ire" method is described in this paper.

Cook, R. D. and Weisberg, S. (1999). Applied Regression Including Computing and Graphics. New York: Wiley, http://www.stat.umn.edu/arc. The program arc described in this book also computes most of the dimension reduction methods described here.

Shao, Y., Cook, R. D. and Weisberg, S. (2007). Marginal tests with sliced average variance estimation. Biometrika. Describes the tests used for "save".

Wen, X. and Cook, R. D. (2007). Optimal sufficient dimension reduction in regressions with categorical predictors. Journal of Statistical Planning and Inference. This paper extends the "ire" method to grouping.

Wood, A. T. A. (1989). An $F$ approximation to the distribution of a linear combination of chi-squared variables. Communications in Statistics: Simulation and Computation, 18, 1439-1456. Approximations for p-values.

Examples

data(ais)
# default fitting method is "sir"
s0 <- dr(LBM~log(SSF)+log(Wt)+log(Hg)+log(Ht)+log(WCC)+log(RCC)+
  log(Hc)+log(Ferr),data=ais) 
# Refit, using a different function for slicing to agree with arc.
summary(s1 <- update(s0,slice.function=dr.slices.arc))
# Refit again, using save, with 10 slices; the default is max(8,ncol+3)
summary(s2<-update(s1,nslices=10,method="save"))
# Refit using phdres; output is similar for phdy, but the phdy tests are
# not justifiable.
summary(s3<- update(s1,method="phdres"))
# fit using ire:
summary(s4 <- update(s1,method="ire"))
# fit using Sex as a grouping variable.  
s5 <- update(s4,group=~Sex)
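# Weights (see Details): dr.weights computes empirical weights that make
# the predictor contours more nearly elliptical.  A hedged sketch, assuming
# dr.weights accepts a formula and data as on its own help page:
wts <- dr.weights(LBM~log(SSF)+log(Wt)+log(RCC),data=ais)
s6 <- dr(LBM~log(SSF)+log(Wt)+log(RCC),data=ais,weights=wts)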
