# Estimation of error components models with the plm function

```
library("knitr")
opts_chunk$set(message = FALSE, warning = FALSE)
```

```
library("texreg")
extract.plm <- function(model, include.rsquared = TRUE, include.adjrs = TRUE,
include.nobs = TRUE, include.ercomp = TRUE, ...) {
s <- summary(model, ...)
coefficient.names <- rownames(coef(s))
coefficients <- coef(s)[, 1]
standard.errors <- coef(s)[, 2]
significance <- coef(s)[, 4]
rs <- s$r.squared[1]
adj <- s$r.squared[2]
n <- length(model$residuals)
gof <- numeric()
gof.names <- character()
gof.decimal <- logical()
if (include.ercomp == TRUE){
if (model$args$model == "random"){
se <- sqrt(ercomp(model)$sigma)
gof <- c(gof, se)
gof.names <- c(gof.names, paste("s_", names(se), sep = ""))
gof.decimal <- c(gof.decimal, rep(TRUE, length(se)))
}
}
if (include.rsquared == TRUE) {
gof <- c(gof, rs)
gof.names <- c(gof.names, "R$^2$")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.adjrs == TRUE) {
gof <- c(gof, adj)
gof.names <- c(gof.names, "Adj.\ R$^2$")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.nobs == TRUE) {
gof <- c(gof, n)
gof.names <- c(gof.names, "Num.\ obs.")
gof.decimal <- c(gof.decimal, FALSE)
}
tr <- createTexreg(
coef.names = coefficient.names,
coef = coefficients,
se = standard.errors,
pvalues = significance,
gof.names = gof.names,
gof = gof,
gof.decimal = gof.decimal
)
return(tr)
}
setMethod("extract", signature = className("plm", "plm"),
definition = extract.plm)
```

`plm`

is a very versatile function which enable the estimation of a
wide range of error component models. Those models can be written as
follow :

$$
y*{nt}=\alpha + \beta^\top x*{nt} + \epsilon*{nt} = \alpha + \beta^\top x*{nt} + \eta_n + \mu*t + \nu*{nt}
$$

where $n$ and $t$ are the individual and time indexes, $y$ the response, $x$ a vector of covariates, $\alpha$ the overall intercept and $\beta$ the vector of parameters of interest that we are willing to estimate. The error term is composed of three elements :

- $\eta_n$ is the individual effect,
- $\mu_t$ is the time effect,
- $\epsilon_{nt}$ is the idiosyncratic error.

# Basic use of `plm`

The first two arguments of `plm`

are, like for most of the estimation
functions of `R`

a `formula`

which describes the model to be estimated
and a `data.frame`

. `subset`

, `weights`

and `na.action`

are also
available and have the same behavior as in the `lm`

function. Three
more main arguments can be set :

`index`

helps`plm`

to understand the structure of the data : if`NULL`

, the first two columns of the data are assumed to contain the individual or the time index. Otherwise, supply the column names of the individual and time index as a character, e.g., use something like`c("firm", "year")`

or just`"firm"`

if there is no explicit time index.`effect`

indicates the effects that should be taken into account ; this is one of`"indiviudal"`

,`"time"`

and`"twoways"`

.`model`

indicates the model to be estimated :`"pooling"`

is just the OLS estimation (equivalent to a call to`lm`

),`"between"`

performs the estimation on the individual or time means,`"within"`

on the deviations from the individual or/and time mean,`"fd"`

on the first differences and`"random"`

perform a feasible generalized least squares estimation which takes into account the correlation induced by the presence of individual and/or time effects.

The estimation of all but the last model is straightforward, as it
requires only the estimation by *OLS* of obvious transformations of
the data. The *GLS* model requires more explanation. In most of the
cases, the estimation is obtained by quasi-differencing the data from
the individual and/or the time means. The coefficients used to perform
this quasi-difference depends on estimators of the variance of the
components of the error, namely $\sigma^2*\nu$, $\sigma^2*\eta$ in
case of individual effects and $\sigma^2_\mu$ in case of time effects.

The most common technique used to estimate these variance is to use the following result :

$$ \frac{\mbox{E}(\epsilon^\top W \epsilon)}{N(T-1)} = \sigma_\nu^2 $$

and

$$
\frac{\mbox{E}(\epsilon^\top B \epsilon)}{N} = T \sigma*\eta^2 + \sigma*\nu^2
$$

where $B$ and $W$ are respectively the matrices that performs the
individual (or time) means and the deviations from these
means. Consistent estimators can be obtained by replacing the unknown
errors by the residuals of a consistent preliminary estimation and by
dropping the expecting value operator. Some degree of freedom
correction can also be introduced. `plm`

calls the general function
`ercomp`

to estimate the variances. Important arguments to `ercomp`

are:

`models`

indicates which models are estimated in order to calculate the two quadratic forms ; for example`c("within", "Between")`

.`dfcor`

indicates what kind of degrees of freedom correction is used : if`0`

, the quadratic forms are divided by the number of observations, respectively $N\times T$ and $N$ ; if`1`

, the numerators of the previous expressions are used ($N\times (T-1)$ and $N$) ; if`2`

, the number of estimated parameters in the preliminary estimate $K$ is deduced. Finally, if`3`

, the unbiased version is computed, which is based on much more complex computations, which relies on the calculus of the trace of different cross-products which depends on the preliminary models used.`method`

is an alternative to the`models`

argument ; it is one of :`"walhus"`

- which is equivalent to`models = c("pooling")`

, @WALL:HUSS:69,`"swar"`

-`models = c("within", "Between")`

, @SWAM:AROR:72,`"amemiya"`

-`models = c("within")`

, @AMEM:71,`"ht"`

is an slightly modified version of`amemiya`

; when there are time-invariant covariates, the @AMEM:71 estimator of the individual component of the variance is under-estimated as the time-invariant covariates disappear in the within regression. In this case @HAUS:TAYL:81 proposed to regress the estimation of the individual effects on the time-invariant covariates and use the residuals in order to estimate the components of the variance,`"nerlove"`

, which is a specific method which doesn't fit to the methodology described above (@NERLO:71).

Note that when only one model is provided in `models`

, this means that the same
residuals are used to compute the two quadratic forms.

To illustrate the use of `plm`

, we use examples reproduced in @BALT:13.
Table 2.1 on page 21 presents EViews' results of the estimation on the
`Grunfeld`

data set :

```
library("plm")
data("Grunfeld", package = "plm")
ols <- plm(inv ~ value + capital, Grunfeld, model = "pooling")
between <- update(ols, model = "between")
within <- update(ols, model = "within")
walhus <- update(ols, model = "random", random.method = "walhus", random.dfcor = 3)
amemiya <- update(walhus, random.method = "amemiya")
swar <- update(amemiya, random.method = "swar")
```

Note that the `random.dfcor`

argument is set to `3`

, which means that
the unbiased version of the estimation of the error components is
used. We use the `texreg`

package to present the results :

```
library("texreg")
screenreg(list(ols = ols, between = between, within = within,
walhus = walhus, amemiya = amemiya, swar = swar),
digits = 5, omit.coef = "(Intercept)")
```

The estimated variance can be extracted using the `ercomp`

function. For
example, for the `amemiya`

model :

`ercomp(amemiya)`

@BALT:13, p. 27, presents the Stata estimation of the Swamy-Arora
estimator ; the Swamy-Arora estimator is the same if `random.dfcor`

is
set to `3`

or `2`

(the quadratic forms are divided by $\sum_n T_n - K - N$
and by $N - K - 1$), so I don't know what is the behaviour of Stata for the
other estimators for which the unbiased estimators differs from the
simple one.

```
data("Produc", package = "plm")
PrSwar <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc,
model = "random", random.method = "swar", random.dfcor = 3)
summary(PrSwar)
```

# The twoways effect model

The two-ways effect model is obtained by setting the `effect`

argument
to `"twoways"`

. @BALT:13 p. 44-46, presents EViews' output for the Grunfeld
data set.

```
Grw <- plm(inv ~ value + capital, Grunfeld, model = "random", effect = "twoways",
random.method = "walhus", random.dfcor = 3)
Grs <- update(Grw, random.method = "swar")
Gra <- update(Grw, random.method = "amemiya")
screenreg(list("Wallace-Hussain" = Grw, "Swamy-Arora" = Grs, "Amemiya" = Gra), digits = 5)
```

The estimated variance of the time component is negative for the
Wallace-Hussain as well as the Swamy-Arora models and `plm`

sets it to 0.

@BALT:09 p. 60-62, presents EViews' output for the `Produc`

data.

```
data("Produc", package = "plm")
Prw <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc,
model = "random", random.method = "walhus",
effect = "twoways", random.dfcor = 3)
Prs <- update(Prw, random.method = "swar")
Pra <- update(Prw, random.method = "amemiya")
screenreg(list("Wallace-Hussain" = Prw, "Swamy-Arora" = Prs, "Amemiya" = Pra), digits = 5)
```

# Unbalanced panels

Two difficulties arise with unbalanced panels :

- There are no obvious denominators for the quadratic forms of the residuals that are used to estimate the components of the variance. The strategy is then to compute the expected value and equate it to the actual quadratic forms. Detailed formula are omitted here, they depend on the preliminary estimator.
- For the one-way effect model, the estimator is still obtained by
applying
*OLS*on demeaned data (the individual**and**the time means are now deduced) for the within model and on quasi-demeaned data for the random effects model ; this is not the case for the two-ways effects model.

@BALT:13 and @BALT:09 present results of the estimation of the
@SWAM:AROR:72 model with the `Hedonic`

data set. @BALT:13, p. 174,
presents the Stata output as @BALT:09, p. 211 presents EViews'
output. EViews' Wallace-Hussain estimator is reported in @BALT:09,
p. 210.

```
data("Hedonic", package = "plm")
form <- mv ~ crim + zn + indus + chas + nox + rm +
age + dis + rad + tax + ptratio + blacks + lstat
HedStata <- plm(form, Hedonic, model = "random", index = "townid",
random.models = c("within", "between"))
HedEviews <- plm(form, Hedonic, model = "random", index = "townid",
random.models = c("within", "Between"))
HedEviewsWH <- update(HedEviews, random.models = "pooling")
screenreg(list(EViews = HedEviews, Stata = HedStata, "Wallace-Hussain" = HedEviewsWH),
digits = 5, single.row = TRUE)
```

The difference is due to the fact that Stata uses a between regression
on $N$ observations while EViews uses a between regression on $\sum_n
T_n$ observations, which are not the same on unbalanced panels. Note
the use of between with or without the B capitalized in the
`random.models`

argument. `plm`

's default is to use the between
regression with $\sum_n T_n$ observations when setting ```
model =
"random", random.method = "swar"
```

. The default employed is what the
original paper for the unbalanced one-way Swamy-Arora estimator
defined (in @BALT:CHAN:94, p. 73). A more detailed analysis of Stata's
Swamy-Arora estimation procedure is given by @COTT:2017.

# Instrumental variable estimators

All of the models presented above may be estimated using instrumental
variables (IV). The instruments are specified using two- or three-part
formulas, each part being separated by a `|`

sign :

- the first part contains the covariates,
- the second part contains the "double-exogenous" instruments,
*i.e.*variables that can be used twice as instruments, using their within and the between transformation, - the third part contains the "single-exogenous" instruments,
*i.e.*variables for which only the within transformation can be used as instruments, those variables being correlated with the individual effects.

The instrumental variables estimator used is indicated with the
`inst.method`

argument:

`"bvk"`

, from @BALE:VARA:87, the default value : in this case, all the instruments are introduced in quasi-differences, using the same transformation as for the response and the covariates,`"baltagi"`

, from @BALT:81, the instruments of the second part are introduced twice using the between and the within transformation as those of the third are only introduced with the within transformation,`"am"`

, from @AMEM:MACU:86, the within transformation of the variables of the second parts for each period are also included as instruments,`"bmc"`

, from @BREU:MIZO:SCHM:89, the within transformation of the variable of the second and of the third parts are included as instruments.

The various possible values of the `inst.method`

argument are not relevant
for fixed effect IV models as there is only one method for this type of IV
models but many for random effect IV models.

The instrumental variable estimators are illustrated in the following example from @BALT:2005, p. 120/ @BALT:13, p. 137.

```
data("Crime", package = "plm")
crbalt <- plm(lcrmrte ~ lprbarr + lpolpc + lprbconv + lprbpris + lavgsen +
ldensity + lwcon + lwtuc + lwtrd + lwfir + lwser + lwmfg + lwfed +
lwsta + lwloc + lpctymle + lpctmin + region + smsa + factor(year)
| . - lprbarr - lpolpc + ltaxpc + lmix,
data = Crime, model = "random", inst.method = "baltagi")
crbvk <- update(crbalt, inst.method = "bvk")
crwth <- update(crbalt, model = "within")
screenreg(list(FE2SLS = crwth, EC2SLS = crbalt, G2SLS = crbvk),
single.row = TRUE, digits = 5, omit.coef = "(region)|(year)",
reorder.coef = c(1:16, 19, 18, 17))
```

The Hausman-Taylor model (@HAUS:TAYL:81) may be estimated
with the `plm`

function by setting argument `random.method = "ht"`

.
The following example is from @BALT:2005, p. 130 and @BALT:13, p. 146.

```
data("Wages", package = "plm")
ht <- plm(lwage ~ wks + south + smsa + married + exp + I(exp^2) +
bluecol + ind + union + sex + black + ed |
bluecol + south + smsa + ind + sex + black |
wks + married + exp + I(exp^2) + union,
data = Wages, index = 595,
inst.method = "baltagi", model = "random",
random.method = "ht")
am <- update(ht, inst.method = "am")
screenreg(list("Hausman-Taylor" = ht, "Amemiya-MaCurdy" = am),
digits = 5, single.row = TRUE)
```

# Nested error component model

@BALT:SONG:JUNG:01

```
data("Produc", package = "plm")
swar <- plm(form <- log(gsp) ~ log(pc) + log(emp) + log(hwy) + log(water) + log(util) + unemp,
Produc, index = c("state", "year", "region"), effect = "nested", random.method = "swar")
walhus <- update(swar, random.method = "walhus")
amem <- update(swar, random.method = "amemiya")
screenreg(list("Swamy-Arora" = swar, "Wallace-Hussain" = walhus, "Amemiya" = amem), digits = 5)
```