Diagnostics for 2SLS Regression

Nonlinearity Diagnostics

The theoretical properties of component-plus-residual plots as nonlinearity diagnostics were systematically explored by @Cook1993 and @CookCroosDabrera1998. Following these authors and focusing on the explanatory variable $x_1$, let's assume that the partial relationship of the response $y$ to $x_1$ is potentially nonlinear, as represented by the partial regression function $f(x_1)$, and that the partial relationships of $y$ to the other $x$s are linear, so that an accurate model for the data is: $$E(y) = \alpha + f(x_1) + \beta_2 x_2 + \cdots + \beta_k x_k$$

We don't know $f()$ and so instead fit the working model $$E(y) = \alpha^\prime + \beta_1^\prime x_1 + \beta_2^\prime x_2 + \cdots + \beta_k^\prime x_k$$ in our case by 2SLS regression, obtaining estimated regression coefficients $a^\prime, b_1^\prime, b_2^\prime, \ldots, b_k^\prime$. Cook and Croos-Dabrera's work shows that as long as the regression estimator is consistant and the $x$s are linearly related, the partial residuals $b^\prime_1 x_1 + e$ can be plotted and smoothed against $x_1$ to visualize an estimate of $f()$, where $e = y - (a^\prime + b_1^\prime x_1 + b_2^\prime x_2 + \cdots b_k^\prime x_k)$ are the response residuals. In practice, the component-plus-residual plot can break down as an accurate representation of $f()$ if there are strong nonlinear relationships between $x_1$ and the other $x$s or if $y$ is nonlinearly related to another $x$ that is correlated with $x_1$.

@FoxWeisberg2018 extend component-plus-residual plots to more complex regression models, which can, for example, include interactions, by adding partial residuals to predictor effect plots. These graphs also can be applied to linear models fit by 2SLS regression.

Diagnosing Nonlinearity: An Example

We turn once more to the demand equation for Kmenta's data and model to illustrate component-plus-residual plots, and once more the data are well behaved. An "ivreg" method is provided for the crPlot() function in the car package. In particular, crPlots() constructs component-plus-residual plots for all of the numeric explanatory variables in an additive regression equation. For example,

crPlots(deq, smooth=list(span=1))

We set a large span for the loess smoother [@ClevelandGrosseShyu1992] in the plot because there are only $n = 20$ cases in the Kmenta data set. The default value of the span is $2/3$. In each panel, the loess smooth, given by the magenta line, closely matches the least-squares line, given by the broken blue line, which represents the fitted regression plane viewed edge-on in the direction of the focal explanatory variable, $P$ on the left and $D$ on the right. Both partial relationships therefore appear to be linear.

CERES plots [@Cook1993], implemented in the ceresPlots() function in the car package, are a version of component-plus-residuals plots that use smoothers rather than linear regression and that therefore are more robust with respect to nonlinear relationships among the predictors. In most applications, component-plus-residuals and CERES plots produce similar results, and that's the case here:

ceresPlots(deq, smooth=list(span=1))

crPlots() and ceresPlots() work only for additive models; the predictorEffects() function in the effects package plots partial residuals for more complex models. In the current example, which is an additive model, we get essentially the same graphs as before, except for the scaling of the $y$ axis:

library("effects") plot(predictorEffects(deq, residuals=TRUE), partial.residuals=list(span=1))

The shaded blue regions in the predictor effect plots represent pointwise 95% confidence envelopes around the fitted partial-regression lines.

Suppose, however, that we fit the wrong model to the data:

deq2 <- update(deq, . ~ I((P - 85)^4/10^5) + D) crPlots(deq2, smooth=list(span=1))

Because the ratio $\max(P)/\min(P) = 113.49/86.50 = 1.3$ is not much larger than 1, we subtracted a number slightly smaller than $\min(P)$ from $P$ prior to raising the variable to the 4th power to induce substantial nonlinearity into the fitted partial regression curve. The resulting component-plus-residual plot for the transformed $P$ clearly reflects the resulting lack of fit, while the plot for $D$ is still reasonably linear.

Predictor effect plots with partial residuals show a different view of the same situation by placing P rather than the transformed P on the horizontal axis, and revealing that the fitted nonlinear partial regression function fails to capture the linear pattern of the data:

plot(predictorEffects(deq2, residuals=TRUE), partial.residuals=list(span=1))

Recall that the blue lines represent the fitted model and the magenta lines are for the smoothed partial residuals; discrepancy between the two lines is indicative of lack of fit.

Nonconstant Error Variance

Standard least-squares nonconstant variance ("heteroscedasticity") diagnostics extend straightforwardly to 2SLS regression. We can, for example, plot studentized residuals versus fitted values to discern a tendency for the variability of the former to change (typically to increase) with the level of the latter. For the demand equation in Kmenta's model,

plot(fitted(deq), rstudent(deq)) abline(h=0)

which seems unproblematic.

A variation of this graph, suggested by @Fox2016, adapts Tukey's spread-level plot [@Tukey1977] to graph the log of the absolute studentized residuals versus the log of the fitted values, assuming that the latter are positive. If a line fit to the plot has slope $b$, then a variance-stablilizing power transformation is given by $y^\lambda = y^{1 - b}$. Thus if $b > 0$, the suggested transformation is down Tukey's ladder of powers and roots, with, for example, $\lambda = 1 - b = 1/2$ representing the square-root transformation, $\lambda = 1 - b = 0$ the log transformation, and so on. For Kmenta's model, we have

spreadLevelPlot(deq, smooth=list(span=1))

which suggests a slight tendency of spread to increase with level. The transformation $\lambda = - 2.45$ seems strong, until we notice that the values of $Q$ are far from 0, and that the ratio of the largest to smallest values $Q{\mathrm{max}}/Q{\mathrm{min}} = 106.23/92.42 = 1.15$ is close to 1, so that $Q^{-2.45}$ is nearly a linear transformation of $Q$---that is, effectively no transformation at all:

with(Kmenta, plot(Q, Q^2.5)) abline(lm(Q^2.5 ~ Q, data=Kmenta))

A common score test for nonconstant error variance in least-squares regression, suggested by @BreuschPagan1979, is based on the model $$V(\varepsilon) = g(\gamma_0 + \gamma_1 z_1 + \cdots + \gamma_s z_s)$$ where the function $g()$ is unspecified and the variables $z_1, \ldots, z_s$ are predictors of the error variance. In the most common application, independently proposed by @CookWeisberg1983, there is one $z$, the fitted values $\widehat{y}$ from the regression, although it is also common to use the regressors $x$ from the primary regression as $z$s. The test is implemented by regressing the squared standardized residuals $e_i^2/\widehat{\sigma}^2$ on the $z$s, where $\widehat{\sigma}^2 = \sum e_i^2/n$. The regression sum of squares for this auxiliary regression divided by 2 is then asymptotically distributed as $\chi^2_s$ under the null hypothesis of constant error variance.

The Breusch-Pagan/Cook-Weisberg test is easily adaptable to 2SLS regression, as implemented by the ncvTest() function in the car package. For Kmenta's demand equation:

ncvTest(deq) ncvTest(deq, var = ~ P + D)

Here, the first test is against the fitted values and the second more general test is against the explanatory variables in the demand equation; the $p$-values for both tests are large, suggesting little evidence against the hypothesis of constant variance.

Remedies for nonconstant variance in 2SLS regression are similar to those in least-squares regression:

• We've already suggested that if the error variance increases (or decreases) with the level of the response, and if the response is positive, then we might be able to stabilize the error variance by power-transforming the response.

• If, alternatively, we know the variance of the errors up to a constant of proportionality, then we can use inverse-variance weights for the 2SLS estimator. The ivreg() function supports weighted 2SLS regression, and the diagnostics in the ivreg package work with weighted 2SLS fits (see the next section).

• Finally, we can employ a "sandwich" estimator of the coefficient covariance matrix in 2SLS [or the bootstrap: see, e.g., @DavisonHinkley1997] to correct standard errors for nonconstant error variance, much as in least-squares regression as proposed by @Huber1967 and @White1980 [also see @LongErvin2000].

The ivreg package supports the sandwich() function in the sandwich package [@Zeileis2006]. For the Kmenta example, where evidence of nonconstant error variance is slight, the sandwich standard errors are similar to, indeed slightly smaller than, the conventional 2SLS standard errors:

summary(deq, vcov=sandwich::sandwich) SEs <- round(cbind(sqrt(diag(sandwich::sandwich(deq))), sqrt(diag(vcov(deq)))), 4) colnames(SEs) <- c("sandwich", "conventional") SEs

We'll modify Kmenta's data to reflect nonconstant error variance, regenerating the data as Kmenta did originally from the reduced-form equations, expressing the endogenous variables $P$ and $Q$ as functions of the exogenous variables $D$, $F$, and $A$, and reduced-form errors $\nu_1$ and $\nu_2$:

Kmenta2 <- Kmenta[, c("D", "F", "A")] set.seed(492365) # for reproducibility Kmenta2 <- within(Kmenta2, { EQ <- 75.25 + 0.1125*D + 0.1250*F + 0.225*A EP <- 85.00 + 0.7500*D - 0.5000*F - 0.900*A d1 <- rnorm(20) d2 <- rnorm(20) v1 <- 2*d1 v2 <- -0.5*v1 + d2 w <- 3*(EQ - min(EQ) + 0.1)/(max(EQ) - min(EQ)) v1 <- v1*w # inducing nonconstant variance Q <- EQ + v1 P <- EP + v2 })

Plotting the sampled reduced-form errors v1 against the expectation of Q shows a clear heterscedastic pattern:

with(Kmenta2, plot(EQ, v1))

Then refitting the demand equation to the new data set, we get

deq2 <- update(deq, data=Kmenta2) summary(deq2)

and the nonconstant error variance is clearly reflected in diagnostics; for example,

spreadLevelPlot(deq2)
ncvTest(deq2)

The extreme value of the suggested power transformation of $Q$ from the spread-level plot, $\lambda =-23$, reflects (as we noted previously) the fact that $\max(Q)/\min(Q)$ isn't much larger than 1.

In our example, the sandwich standard errors are not very different from the conventional standard errors:

SEs2 <- round(cbind(sqrt(diag(sandwich::sandwich(deq2))), sqrt(diag(vcov(deq2)))), 4) colnames(SEs2) <- c("sandwich", "conventional") SEs2

As mentioned, bootstrapping provides an alternative to sandwich standard errors as a correction for nonconstant error variance, and the ivreg package supplies an "ivreg" method for the Boot() function in the car package, implementing the case-resampling bootstrap, and returning an object of class "boot" suitable for use with functions in the boot package [@DavisonHinkley1997; @boot]. By default, $R = 999$ bootstrap replications are generated. For example:

set.seed <- 869255 # for reproducibility b.deq2 <- Boot(deq2) summary(deq2, vcov.=vcov(b.deq2))

The bootstrap standard errors are larger than the conventional or sandwich standard errors for this example.

Boostrap confidence intervals can also be computed from the object returned by Boot(), by default reporting $BC_a$ (bias-corrected, accelerated) intervals (see the documentation for confint.boot() in the car package):

confint(b.deq2)

Weighted 2SLS Regession

Suppose that we modify the regression model $y = X \beta + \varepsilon$ so that now $N_n(0, \sigma^2 W^{-1})$ where $W = \mathrm{diag}{w_i}$ is an $n \times n$ diagonal matrix of known inverse-variance weights; that is $V(\varepsilon_i) = \sigma^2/w_i$. As before, some of the columns of $X$ may be correlated with the errors $\varepsilon$, but we have sufficient instrumental variables $Z$ that are independent of the errors.

Then the weighted 2SLS estimator is $$b_{\mathrm{W2SLS}} = [X^\top W Z(Z^\top W Z)^{-1} Z^\top W X]^{-1} X^\top W Z (Z^\top W Z)^{-1} Z^\top W y$$ Alternatively, we can treat the two stages of 2SLS as weighted least squares (WLS) problems, in each stage minimizing the weighted sum of squared residuals. The ivreg() function computes the weighted 2SLS estimator in this manner.

Phillips's updating formulas for 2SLS regression could also be modified for the weighted case, but a simpler approach (which is evident in the formula for $b_{\mathrm{W2SLS}}$ above) is to convert the weighted 2SLS problem into an unweighted problem, by transforming the data to constant variance using $W^{1/2} = \mathrm{diag}{\sqrt{w_i}}$, the Cholesky square root of $W$. The square root of $W$ is particularly simple because $W$ is diagonal. Then in Phillips's updating formulas, we replace $y$ with $y^ = W^{1/2}y$, $X$ with $X^ = W^{1/2}X$, and $Z$ with $Z^* = W^{1/2}Z$.

For the modified Kmenta2 data, we know that the variances of the errors in the demand equation are inversely proportional to the variable w. This is of course artificial knowledge, reflecting the manner in which the data were constructed. The weighted 2SLS estimator is therefore computed as

deqw <- update(deq, data=Kmenta2, weights=1/w) summary(deqw)

Plotting studentized residuals against fitted values and testing for nonconstant error variance don't indicate a heteroscedasticity problem, but there is a relatively large studentized residual of about $-3$ that stands out somewhat from the other values:

ncvTest(deqw) plot(fitted(deqw), rstudent(deqw))

A Bonferroni outlier test suggests that the studentized residual isn't unusually large, and once more we're in the unusual situation of knowing that the model is correct.

outlierTest(deqw)

Collinearity Diagnostics

In addition to unusual-data diagnostics, @BelsleyKuhWelsch1980 briefly extend their approach to collinearity diagnostics to 2SLS regression. We believe that this approach, which assimilates collinearity to numerical instability, is flawed, in that it takes into account "collinearity with the intercept." That is, regressors with values far from 0 have large sums of products with the constant regressor, producing a large standard error of the intercept, and simply reflecting the fact that the intercept extrapolates the fitted regression surface far beyond the range of the data.

@FoxMonette1992 describe an alternative approach to collinearity diagnostics in linear models fit by least squares based on generalized variance-inflation factors. The implementation of generalized variance inflation fators in the vif() function in the car package, which employs the estimated covariance matrix of the coefficients, applies in general to models with linear predictors, including linear models estimated by 2SLS.

For example, for the demand equation in Kmenta's model:

sqrt(vif(deq))

Taking the square-roots of the VIFs puts them on the coefficient standard-error scale. That is, the standard errors of the coefficients of $P$ and $D$ are 23% larger than they would be if the estimated coefficients were uncorrelated (which is equivalent to the columns of $\widehat{X}$ for $P$ and $D$ in the second-stage regression being uncorrelated). When, as here, each term in the model has just one coefficient, generalized and ordinary variance-inflation factors coincide. The equality of the VIFs for $P$ and $D$ is peculiar to the case of two regressors (beyond the regression constant).

Marginal/conditional plots, produced by the mcPlots() function in the car package, superimpose the added-variable plots on corresponding marginal scatterplots for the regressors. These graphs allow us to visualize the reduction in precision of estimation of each coefficient due to collinearity, which reduces the conditional variation of a regressor relative to its marginal variation. for example, for the demand equation:

mcPlots(deq)

The blue points in each panel represent the marginal scatterplot and the magenta points represent the (partial) added-variable plot, with the arrows showing the relationship between the two sets of points.

Concluding Remarks

Careful regression analysis requires methods for looking effectively at the data. Many potential problems can be addressed by examining the data prior to fitting a regression model, decreasing (if not eliminating) the necessity for post-fit diagnostics. No doubt careful data analysts employing 2SLS have always done this. Nevertheless, having methods that allow one to subject a regression model fit by 2SLS to criticism will in at least some cases suggest improvements to the model or perhaps corrections to the data.