# ivreg: Two-Stage Least-Squares Regression with Diagnostics


## Overview

The **ivreg** package provides a comprehensive implementation of instrumental variables
regression using two-stage least-squares (2SLS) estimation. The standard
regression functionality (parameter estimation, inference, robust covariances,
predictions, etc.) is derived from and supersedes the `ivreg()` function in the
**AER** package. Additionally, various regression diagnostics are supported, including
hat values; deletion diagnostics such as studentized residuals and Cook's distances;
graphical diagnostics such as component-plus-residual plots and added-variable plots;
and effect plots with partial residuals.

To provide all of this functionality, the **ivreg** package integrates
seamlessly with other packages by providing suitable S3 methods, specifically for
generic functions in the base-R **stats** package and in the **car**, **effects**,
**lmtest**, and **sandwich** packages, among others.

The package is accompanied by two online vignettes: this introduction and an article introducing the regression diagnostics and graphics.

## Installation

The stable release version of **ivreg**
is hosted on the Comprehensive R Archive Network
(CRAN) at https://CRAN.R-project.org/package=ivreg and can be installed along with all
dependencies via

`install.packages("ivreg", dependencies = TRUE)`

The development version of **ivreg** is hosted on GitHub at https://github.com/john-d-fox/ivreg/.
It can be conveniently installed via the `install_github()` function in the
**remotes** package:

`remotes::install_github("https://github.com/john-d-fox/ivreg/")`

## Instrumental variables regression

The main function in the **ivreg** package is `ivreg()`, which is a high-level
formula interface to the workhorse `ivreg.fit()` function; both functions return
a list of quantities similar to that returned by `lm()` (including coefficients,
coefficient variance-covariance matrix, residuals, etc.). In the case of `ivreg()`,
the returned list is of class `"ivreg"`, for which a wide range of standard methods
is available, including `print()`, `summary()`, `coef()`, `vcov()`, `anova()`,
`predict()`, `residuals()`, `terms()`, `model.matrix()`, `formula()`, `update()`,
`hatvalues()`, `dfbeta()`, and `rstudent()`.
Moreover, methods for generic functions from other packages are provided
and are described in more detail in a companion vignette.

Regressors and instruments for `ivreg()` are most easily specified in a
formula with two parts on the right-hand side, for example, `y ~ x1 + x2 | x1 + z1 + z2`,
where `x1` and `x2` are, respectively, exogenous and endogenous explanatory variables,
and `x1`, `z1`, and `z2` are instrumental variables. Both components on the right-hand
side of the model formula include an implied intercept, unless, as in a linear model
estimated by `lm()`, the intercept is explicitly excluded via `-1`. Exogenous explanatory
variables, such as `x1` in the example, must be included among the instruments.
A worked example is described immediately below.
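To make the roles of the two formula parts concrete, the following minimal sketch carries out the two stages by hand with `lm()`. The data are simulated purely for illustration: the sample size, coefficients, and variable names are assumptions chosen to match the example formula, not anything shipped with the package. The standard errors reported by the second-stage `lm()` are not valid 2SLS standard errors, so only the point estimates are compared.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(123)
n  <- 5000
z1 <- rnorm(n)   # instrument
z2 <- rnorm(n)   # instrument
x1 <- rnorm(n)   # exogenous explanatory variable
e  <- rnorm(n)   # structural error
# x2 is endogenous because it depends on the error e
x2 <- 0.5 * x1 + z1 + z2 + e + rnorm(n)
y  <- 1 + 1 * x1 + 0.5 * x2 + e
d  <- data.frame(y, x1, x2, z1, z2)

# OLS ignores the endogeneity of x2
b_ols <- coef(lm(y ~ x1 + x2, data = d))

# Stage 1: regress the endogenous variable on all instruments (including x1)
d$x2_hat <- fitted(lm(x2 ~ x1 + z1 + z2, data = d))
# Stage 2: replace x2 by its first-stage fitted values
b_2sls <- coef(lm(y ~ x1 + x2_hat, data = d))

# Compare point estimates; the true coefficient of x2 is 0.5 by construction
print(round(rbind(OLS = b_ols, `2SLS` = b_2sls), 3))

# If ivreg is installed, the same point estimates should be obtained
# (up to numerical error) from the two-part formula
if (requireNamespace("ivreg", quietly = TRUE)) {
  m <- ivreg::ivreg(y ~ x1 + x2 | x1 + z1 + z2, data = d)
  print(all.equal(unname(coef(m)), unname(b_2sls)))
}
```

In practice `ivreg()` should be preferred over the manual two-step fit, because it also computes the coefficient covariance matrix appropriate for 2SLS.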

## Illustration: Returns to schooling

As an initial demonstration of the **ivreg** package, we investigate
the effect of schooling on earnings in a classical model for wage determination.
The data are from the United States, and are provided in the package as
`SchoolingReturns`. This data set was originally studied by David Card, and was subsequently
employed, as here, to illustrate 2SLS estimation in introductory econometrics textbooks.
The relevant variables for this illustration are:

```
data("SchoolingReturns", package = "ivreg")
summary(SchoolingReturns[, 1:8])
```

A standard wage equation uses a semi-logarithmic linear regression for `wage`, estimated by
ordinary least squares (OLS), with years of `education` as the primary explanatory variable,
adjusting for a quadratic term in labor-market `experience`, as well as for factors
coding `ethnicity`, residence in a city (`smsa`), and residence in the U.S. `south`:

```
m_ols <- lm(log(wage) ~ education + poly(experience, 2) + ethnicity + smsa + south,
  data = SchoolingReturns)
summary(m_ols)
```

Thus, OLS estimation yields an estimate of `r round(100 * coef(m_ols)["education"], digits = 1)`%
per year for returns to schooling. This estimate is problematic, however, because it can be argued
that `education` is endogenous (and hence also `experience`, which is taken to be `age` minus
`education` minus 6). We therefore use geographical proximity to a college when growing
up as an exogenous instrument for `education`. Additionally, `age` is the natural
exogenous instrument for `experience`, while the remaining explanatory variables can be considered
exogenous and are thus used as instruments for themselves.
Although it's a useful strategy to select an effective instrument or instruments for each endogenous
explanatory variable, in 2SLS regression all of the instrumental variables are used to estimate all
of the regression coefficients in the model.
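The last point can be seen from the standard textbook form of the estimator. The notation is introduced here for illustration and is not taken from the package documentation: $X$ is the model matrix of all explanatory variables, $Z$ is the model matrix of all instruments (including the exogenous explanatory variables), and $y$ is the response.

$$
P_Z = Z (Z^\top Z)^{-1} Z^\top, \qquad
\widehat{X} = P_Z X, \qquad
\widehat{\beta}_{\mathrm{2SLS}}
  = (\widehat{X}^\top \widehat{X})^{-1} \widehat{X}^\top y
  = (X^\top P_Z X)^{-1} X^\top P_Z y .
$$

Every column of $X$ is projected onto the full set of instruments in $Z$, so no instrument is tied to a single endogenous variable. Columns of $X$ that also appear in $Z$ are reproduced unchanged by the projection.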

To fit this model with `ivreg()` we can simply extend the formula from `lm()` above,
adding a second part after the `|` separator to specify the instrumental variables:

```
library("ivreg")
m_iv <- ivreg(log(wage) ~ education + poly(experience, 2) + ethnicity + smsa + south |
    nearcollege + poly(age, 2) + ethnicity + smsa + south,
  data = SchoolingReturns)
summary(m_iv)
```

Thus, using two-stage least squares to estimate the regression yields a much larger
coefficient for the returns to schooling, namely `r round(100 * coef(m_iv)["education"], digits = 1)`% per year.
Notice as well that the standard errors of the coefficients are larger for 2SLS estimation
than for OLS, and that, partly as a consequence, evidence for the effects of `ethnicity`
and the quadratic component of `experience` is now weak. These differences are brought
out more clearly when using the `compareCoefs()` function from the **car** package:

`car::compareCoefs(m_ols, m_iv)`
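The standard methods listed earlier can also be applied directly to the fitted `"ivreg"` object. The following sketch refits the model so that it is self-contained, and is guarded so that it only runs when **ivreg** is installed. It uses only methods named above (`coef()`, `vcov()`, `hatvalues()`, `rstudent()`); the choice to show the three largest values is arbitrary.

```r
if (requireNamespace("ivreg", quietly = TRUE)) {
  data("SchoolingReturns", package = "ivreg")
  m_iv <- ivreg::ivreg(
    log(wage) ~ education + poly(experience, 2) + ethnicity + smsa + south |
      nearcollege + poly(age, 2) + ethnicity + smsa + south,
    data = SchoolingReturns)

  # Coefficients with standard errors taken from the covariance matrix
  est <- coef(m_iv)
  se  <- sqrt(diag(vcov(m_iv)))
  print(round(cbind(Estimate = est, `Std. Error` = se), 4))

  # Case-wise diagnostics: hat values and studentized residuals
  h  <- hatvalues(m_iv)
  rs <- rstudent(m_iv)
  print(head(sort(h, decreasing = TRUE), 3))        # largest hat values
  print(head(sort(abs(rs), decreasing = TRUE), 3))  # largest |studentized residuals|
}
```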