# Model components for fitted models with plm

plm tries to follow as close as possible the way models are fitted
using `lm`

. This relies on the following steps, using the
`formula`

-`data`

:

- compute internally the
`model.frame`

by getting the relevant arguments (`formula`

,`data`

,`subset`

,`weights`

,`na.action`

and`offset`

) and the supplementary argument, - extract from the
`model.frame`

the response (with`model.response`

) and the model matrix (with`model.matrix`

), - call the estimation function
`plm.fit`

with`X`

and`y`

as arguments.

with some modifications.

Firstly, panel data has a special structure which is describe by an
`index`

argument. This can be used in the `pdata.frame`

function which
returns a `pdata.frame`

object which can be used as the `data`

argument of `plm`

. If the `data`

argument of `plm`

is an ordinary
`data.frame`

, the `index`

argument can also be used as an argument of
`plm`

. In this case, the `pdata.frame`

function is used internally to
transform the data.

Next, the `formula`

, which is the first and mandatory argument of
`plm`

is coerced to a `Formula`

object.

`model.frame`

is then called, but with the `data`

argument in the
first position (a `pdata.frame`

object) and the `formula`

in the
second position. This unusual order of the arguments enables to use a
specific `model.frame.pdata.frame`

method defined in `plm`

.

As for the `model.frame.formula`

method, a `data.frame`

is returned,
with a `terms`

attribute.

Next, the `X`

matrix is extracted using `model.matrix`

. The usual way
to do so is to feed the function with to arguments, a `formula`

or a
`terms`

object and a `data.frame`

created with `model.frame`

. `lm`

use
something like `model.matrix(terms(mf), mf)`

where `mf`

is a
`data.frame`

created with a `model.frame`

. Therefore, `model.matrix`

needs actually one argument and not two and we therefore wrote a
`model.matrix.pdata.frame`

which does the job ; the method first check
that the argument has a `term`

attribute, extract the `terms`

(actually the `formula`

) and then compute the `model.matrix`

.

The response `y`

is usually extracted using `model.response`

, with a
`data.frame`

created with `model.frame`

as first argument, but it is
not generic. We therefore create a generic called `pmodel.response`

and provide a `pmodel.response.pdata.frame`

method. We illustrate
these features using a simplified (in terms of covariates) example
with the `SeatBelt`

data set:

```
library("plm")
data("SeatBelt", package = "pder")
SeatBelt$occfat <- with(SeatBelt, log(farsocc / (vmtrural + vmturban)))
pSB <- pdata.frame(SeatBelt)
```

We start with an OLS (`pooling`

) specification:

```
formols <- occfat ~ log(usage) + log(percapin)
mfols <- model.frame(pSB, formols)
Xols <- model.matrix(mfols)
y <- pmodel.response(mfols)
coef(lm.fit(Xols, y))
```

which is equivalent to:

`coef(plm(formols, SeatBelt, model = "pooling"))`

Next we use an instrumental variables specification. `usage`

is
endogenous and instrumented by three variables indicating the law
context: `ds`

, `dp`

and `dsp`

.

The model is described using a two-parts formula, the first part of the RHS describing the covariates and the second part the instruments. The following two formulations can be used:

```
formiv1 <- occfat ~ log(usage) + log(percapin) | log(percapin) + ds + dp + dsp
formiv2 <- occfat ~ log(usage) + log(percapin) | . - log(usage) + ds + dp + dsp
```

The second formulation has two advantages:

- in the common case when a lot of covariates are instruments, these covariates don't need to be indicated in the second RHS part of the formula,
- the endogenous variables clearly appear as they are proceeded by a
`-`

sign in the second RHS part of the formula.

The formula is coerced to a `Formula`

, using the `Formula`

package. `model.matrix.pdata.frame`

then internally calls
`model.matrix.Formula`

in order to extract the covariates and
instruments model matrices:

```
mfSB1 <- model.frame(pSB, formiv1)
X1 <- model.matrix(mfSB1, rhs = 1)
W1 <- model.matrix(mfSB1, rhs = 2)
head(X1, 3) ; head(W1, 3)
```

For the second (and preferred formulation), the `dot`

argument should
be set and is passed to the `Formula`

methods. `.`

has actually two
meanings:

- all available covariates,
- the previous covariates used while updating a formula.

which correspond respectively to `dot = "seperate"`

(the default) and
`dot = "previous"`

. See the difference between the following two examples:

```
library("Formula")
head(model.frame(Formula(formiv2), SeatBelt), 3)
head(model.frame(Formula(formiv2), SeatBelt, dot = "previous"), 3)
```

In the first case, all the covariates are returned by `model.frame`

as
the `.`

is understood by default as "everything".

In `plm`

, the `dot`

argument is internally set to
`previous`

so that the end-user doesn't have to worry about these
subtleties.

```
mfSB2 <- model.frame(pSB, formiv2)
X2 <- model.matrix(mfSB2, rhs = 1)
W2 <- model.matrix(mfSB2, rhs = 2)
head(X2, 3) ; head(W2, 3)
```

The IV estimator can then be obtained as a 2SLS estimator: first regress the covariates on the instruments and get the fitted values:

```
HX1 <- lm.fit(W1, X1)$fitted.values
head(HX1, 3)
```

Next regress the response on these fitted values:

`coef(lm.fit(HX1, y))`

Or using the `formula`

-`data`

interface with the `ivreg`

function:

`coef(AER::ivreg(formiv1, data = SeatBelt))`

or `plm`

:

`coef(plm(formiv1, SeatBelt, model = "pooling"))`

```
X2 <- model.matrix(Formula(form1), mfSB, rhs = 2, dot = "previous")
formols <- occfat ~ log(usage) + log(percapin) | . - log(usage) + ds + dp + dsp
form1 <- occfat ~ log(usage) + log(percapin) + log(unemp) + log(meanage) +
log(precentb) + log(precenth) + log(densrur) + log(densurb) +
log(viopcap) + log(proppcap) + log(vmtrural) + log(vmturban) +
log(fueltax) + lim65 + lim70p + mlda21 + bac08
form2 <- . ~ . | . - log(usage) + ds + dp +dsp
jorm1 <- occfat ~ log(usage) + log(percapin) + log(unemp) + log(meanage) +
log(precentb) + log(precenth) + log(densrur) + log(densurb) +
log(viopcap) + log(proppcap) + log(vmtrural) + log(vmturban) +
log(fueltax) + lim65 + lim70p + mlda21 + bac08 | . - log(usage) +
ds + dp + dsp
jorm2 <- noccfat ~ . | .
```