correct.signs: Normal-Theory Maximum Likelihood Estimation of Beta Coefficients with "Correct" Signs

Description

Obenchain (1978) discussed the risk of linear generalized ridge estimators in individual directions within p-dimensional X-space. While shrinkage to ZERO is clearly optimal for all directions strictly ORTHOGONAL to the true BETA, he showed that optimal shrinkage is also possible in the UNKNOWN direction PARALLEL to the true BETA. This optimal BETA estimate is of the form k * X'y, where k is the positive scalar given in equation (4.2), page 1118, of that paper. The correct.signs() function computes this estimate, B(=), which uses GRR delta-shrinkage factors proportional to the X-eigenvalues.

Usage

correct.signs(form, data)

Arguments

form

A regression formula [y~x1+x2+...] suitable for use with lm().

data

Data frame containing observations on all variables in the formula.

Value

An output list object of class "correct.signs":

data

Name of the data.frame object specified as the second argument.

form

The regression formula specified as the first argument.

p

Number of regression predictor variables.

n

Number of complete observations after removal of all missing values.

r2

Numerical value of R-square goodness-of-fit statistic.

s2

Numerical value of the residual mean square estimate of error.

prinstat

Listing of principal statistics (p by 5) from qm.ridge().

kpb

Maximum likelihood estimate of the k-factor in equation (4.2) of Obenchain (1978).

bmf

Rescaling factor for B(=) to minimize the Residual Sum-of-Squares.

signs

Listing of five Beta coefficient statistics (p by 5): OLS, X'y, Delta, B(=) and Bfit.

loff

Lack-of-Fit statistics: Residual Sum-of-Squares for OLS, X'y, B(=) and Bfit.

sqcor

Squared Correlation between the y-vector and its predicted values. Two values are displayed: one for OLS predictions and one for predictions using Bfit, X'y or B(=). The latter three share a single value because these vectors are proportional to one another. These two values are the familiar R^2 coefficients of determination for OLS and Bfit.
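The components listed above can be pulled out of the returned list with the usual `$` operator. The following is a minimal sketch that assumes the RXshrink package is installed (hence the `requireNamespace()` guard); it uses only the component names documented in this section and makes no claim about the values they hold.

```r
# Sketch: extract individual components from a "correct.signs" object.
# Assumes RXshrink is installed; the block is skipped otherwise.
if (requireNamespace("RXshrink", quietly = TRUE)) {
  library(RXshrink)
  data(longley2)
  form <- GNP ~ GNP.deflator + Unemployed + Armed.Forces +
    Population + Year + Employed
  rxcsobj <- correct.signs(form, data = longley2)

  rxcsobj$signs   # p by 5 table: OLS, X'y, Delta, B(=) and Bfit
  rxcsobj$loff    # Residual Sum-of-Squares for OLS, X'y, B(=) and Bfit
  rxcsobj$sqcor   # squared correlations between y and its predictions
}
```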

Details

Ill-conditioned (nearly multi-collinear) regression models can produce Ordinary Least Squares estimates with numerical signs that differ from those of the X'y vector. This is disturbing because, once the X-predictor variables and the y-response variable have been "centered" by subtracting off their mean values and rescaled to vectors of length one, X'y contains the sample correlations between each predictor and the response. Besides displaying OLS estimates, the correct.signs() function also displays the "correlation form" of X'y, the estimated delta-shrinkage factors, and the k-rescaled beta-coefficients, B(=). Finally, it displays the "Bfit" vector: the estimate proportional to B(=) that minimizes the Residual Sum-of-Squares among all such multiples. This restricted RSS of Bfit cannot, of course, be less than the RSS of OLS, but it can be MUCH less than the RSS of B(=) whenever B(=) shrinkage appears excessive.
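The sign comparison described above can be illustrated in base R without calling RXshrink at all. The sketch below is only an illustration under stated assumptions: it uses the built-in `longley` data from the datasets package (not the `longley2` data shipped with RXshrink), the helper names `unit`, `wrong`, `c_hat` and `rss` are hypothetical, and the least-squares rescaling of X'y is computed directly rather than through the package's own `kpb` or `bmf` calculations.

```r
# Base-R sketch: compare OLS signs with the signs of X'y in correlation form.
# Uses the built-in 'longley' data, NOT RXshrink's 'longley2'.
data(longley)
y <- longley$GNP
X <- as.matrix(longley[, c("GNP.deflator", "Unemployed", "Armed.Forces",
                           "Population", "Year", "Employed")])

# Center a vector and rescale it to length one (hypothetical helper).
unit <- function(v) { v <- v - mean(v); v / sqrt(sum(v^2)) }
Xs <- apply(X, 2, unit)
ys <- unit(y)

# Correlation form of X'y: sample correlations between predictors and y.
xty <- setNames(drop(crossprod(Xs, ys)), colnames(X))

# OLS coefficients on the same standardized scale.
ols <- setNames(drop(solve(crossprod(Xs), xty)), colnames(X))

# Predictors whose OLS sign disagrees with the sign of X'y.
wrong <- names(ols)[sign(ols) != sign(xty)]

# Scalar multiple of X'y with the smallest Residual Sum-of-Squares:
# minimize sum((ys - c * Xs %*% xty)^2) over the scalar c.
fit_dir <- drop(Xs %*% xty)
c_hat <- sum(ys * fit_dir) / sum(fit_dir^2)

rss <- function(b) sum((ys - Xs %*% b)^2)
rss_ols <- rss(ols)          # unrestricted minimum
rss_dir <- rss(c_hat * xty)  # minimum restricted to multiples of X'y

print(wrong)
print(c(OLS = rss_ols, rescaled_Xty = rss_dir))
```

Because OLS minimizes the RSS over all coefficient vectors, `rss_ols` can never exceed `rss_dir`; the gap between the two indicates how much fit is given up by restricting the estimate to the X'y direction.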

References

Obenchain RL. (1978) Good and Optimal Ridge Estimators. Annals of Statistics 6, 1111-1121. <doi:10.1214/aos/1176344314>

Obenchain RL. (2005) Shrinkage Regression: ridge, BLUP, Bayes, spline and Stein. Electronic book-in-progress (185+ pages.) http://localcontrolstatistics.org

Obenchain RL. (2020) RXshrink_in_R.PDF RXshrink package vignette-like file. http://localcontrolstatistics.org

Examples


# NOT RUN {
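  library(RXshrink)   # provides correct.signs() and the longley2 data
  data(longley2)
  form <- GNP~GNP.deflator+Unemployed+Armed.Forces+Population+Year+Employed
  rxcsobj <- correct.signs(form, data=longley2)
  rxcsobj             # print the fitted "correct.signs" object
  str(rxcsobj)        # inspect the components listed under Value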
# }
