# felm

##### Fit a linear model with multiple group fixed effects

'felm' is used to fit linear models with multiple group fixed effects, similarly to lm. It uses the Method of Alternating projections to sweep out multiple group effects from the normal equations before estimating the remaining coefficients with OLS.

##### Usage

```
felm(formula, data, exactDOF = FALSE, subset, na.action,
contrasts = NULL, weights = NULL, ...)
```

##### Arguments

- formula
an object of class '"formula"' (or one that can be coerced to that class): a symbolic description of the model to be fitted. Similarly to 'lm'. See Details.

- data
a data frame containing the variables of the model.

- exactDOF
logical. If more than two factors, the degrees of freedom used to scale the covariance matrix (and the standard errors) is normally estimated. Setting

`exactDOF=TRUE`

causes`felm`

to attempt to compute it, but this may fail if there are too many levels in the factors.`exactDOF='rM'`

will use the exact method in`Matrix::rankMatrix()`

, but this is slower. If neither of these methods works, it is possible to specify`exactDOF='mc'`

, which utilizes a Monte-Carlo method to estimate the expectation E(x' P x) = tr(P), the trace of a certain projection, a method which may be more accurate than the default guess.If the degrees of freedom for some reason are known, they can be specified like

`exactDOF=342772`

.- subset
an optional vector specifying a subset of observations to be used in the fitting process.

- na.action
a function which indicates what should happen when the data contain

`NA`

s. The default is set by the`na.action`

setting of`options`

, and is`na.fail`

if that is unset. The 'factory-fresh' default is`na.omit`

. Another possible value is`NULL`

, no action.`na.exclude`

is currently not supported.- contrasts
an optional list. See the

`contrasts.arg`

of`model.matrix.default`

.- weights
an optional vector of weights to be used in the fitting process. Should be 'NULL' or a numeric vector. If non-NULL, weighted least squares is used with weights

`weights`

(that is, minimizing`sum(w*e^2)`

); otherwise ordinary least squares is used.- ...
other arguments.

`keepX`

logical. To include a copy of the expanded data matrix in the return value, as needed by`bccorr`

and`fevcov`

for proper limited mobility bias correction.`keepCX`

logical. Keep a copy of the centred expanded data matrix in the return value. As list elements`cX`

for the explanatory variables, and`cY`

for the outcome.`keepModel`

logical. Keep a copy of the model frame.`nostats`

logical. Don't include covariance matrices in the output, just the estimated coefficients and various descriptive information. For IV,`nostats`

can be a logical vector of length 2, with the last value being used for the 1st stages.`psdef`

logical. In case of multiway clustering, the method of Cameron, Gelbach and Miller may yield a non-definite variance matrix. Ordinarily this is forced to be semidefinite by setting negative eigenvalues to zero. Setting`psdef=FALSE`

will switch off this adjustment. Since the variance estimator is asymptotically correct, this should only have an effect when the clustering factors have very few levels.`kclass`

character. For use with instrumental variables. Use a k-class estimator rather than 2SLS/IV. Currently, the values`'nagar', 'b2sls', 'mb2sls', 'liml'`

are accepted, where the names are from Kolesar et al (2014), as well as a numeric value for the 'k' in k-class. With`kclass='liml'`

,`felm`

also accepts the argument`fuller=<numeric>`

, for using a Fuller adjustment of the liml-estimator.`Nboot, bootexpr, bootcluster`

Since`felm`

has quite a bit of overhead in the creation of the model matrix, if one wants confidence intervals for some function of the estimated parameters, it is possible to bootstrap internally in`felm`

. That is, the model matrix is resampled`Nboot`

times and estimated, and the`bootexpr`

is evaluated inside an`sapply`

. The estimated coefficients and the left hand side(s) are available by name. Any right hand side variable`x`

is available by the name`var.x`

. The`"felm"`

-object for each estimation is available as`est`

. If a`bootcluster`

is specified as a factor, entire levels are resampled.`bootcluster`

can also be a function with no arguments, it should return a vector of integers, the rows to use in the sample. It can also be the string 'model', in which case the cluster is taken from the model.`bootexpr`

should be an expression, e.g. like`quote(x/x2 * abs(x3)/mean(y))`

. It could be wise to specify`nostats=TRUE`

when bootstrapping, unless the covariance matrices are needed in the bootstrap. If you need the covariance matrices in the full estimate, but not in the bootstrap, you can specify it in an attribute`"boot"`

as`nostats=structure(FALSE, boot=TRUE)`

.`iv, clustervar`

deprecated. These arguments will be removed at a later time, but are still supported in this field. Users are*STRONGLY*encouraged to use multipart formulas instead. In particular, not all functionality is supported with the deprecated syntax; iv-estimations actually run a lot faster if multipart formulas are used, due to new algorithms which I didn't bother to shoehorn in place for the deprecated syntax.

##### Details

This function is intended for use with large datasets with multiple group
effects of large cardinality. If dummy-encoding the group effects results
in a manageable number of coefficients, you are probably better off by using
`lm`

.

The formula specification is a response variable followed by a four part
formula. The first part consists of ordinary covariates, the second part
consists of factors to be projected out. The third part is an
IV-specification. The fourth part is a cluster specification for the
standard errors. I.e. something like ```
y ~ x1 + x2 | f1 + f2 | (Q|W ~
x3+x4) | clu1 + clu2
```

where `y`

is the response, `x1,x2`

are
ordinary covariates, `f1,f2`

are factors to be projected out, `Q`

and `W`

are covariates which are instrumented by `x3`

and
`x4`

, and `clu1,clu2`

are factors to be used for computing cluster
robust standard errors. Parts that are not used should be specified as
`0`

, except if it's at the end of the formula, where they can be
omitted. The parentheses are needed in the third part since `|`

has
higher precedence than `~`

. Multiple left hand sides like ```
y|w|x ~
x1 + x2 |f1+f2|...
```

are allowed.

Interactions between a covariate `x`

and a factor `f`

can be
projected out with the syntax `x:f`

. The terms in the second and
fourth parts are not treated as ordinary formulas, in particular it is not
possible with things like `y ~ x1 | x*f`

, rather one would specify
`y ~ x1 + x | x:f + f`

. Note that `f:x`

also works, since R's
parser does not keep the order. This means that in interactions, the factor
*must* be a factor, whereas a non-interacted factor will be coerced to
a factor. I.e. in `y ~ x1 | x:f1 + f2`

, the `f1`

must be a factor,
whereas it will work as expected if `f2`

is an integer vector.

In older versions of lfe the syntax was ```
felm(y ~ x1 + x2 + G(f1)
+ G(f2), iv=list(Q ~ x3+x4, W ~ x3+x4), clustervar=c('clu1','clu2'))
```

. This
syntax still works, but yields a warning. Users are *strongly*
encouraged to change to the new multipart formula syntax. The old syntax
will be removed at a later time.

The standard errors are adjusted for the reduced degrees of freedom coming
from the dummies which are implicitly present. In the case of two factors,
the exact number of implicit dummies is easy to compute. If there are more
factors, the number of dummies is estimated by assuming there's one
reference-level for each factor, this may be a slight over-estimation,
leading to slightly too large standard errors. Setting `exactDOF='rM'`

computes the exact degrees of freedom with `rankMatrix()`

in package
Matrix.

For the iv-part of the formula, it is only necessary to include the
instruments on the right hand side. The other explanatory covariates, from
the first and second part of `formula`

, are added automatically in the
first stage regression. See the examples.

The `contrasts`

argument is similar to the one in `lm()`

, it is
used for factors in the first part of the formula. The factors in the second
part are analyzed as part of a possible subsequent `getfe()`

call.

The old syntax with a single part formula with the `G()`

syntax for the
factors to transform away is still supported, as well as the
`clustervar`

and `iv`

arguments, but users are encouraged to move
to the new multi part formulas as described here. The `clustervar`

and
`iv`

arguments have been moved to the `...`

argument list. They
will be removed in some future update.

##### Value

`felm`

returns an object of `class`

`"felm"`

. It is
quite similar to an `"lm"`

object, but not entirely compatible.

The generic `summary`

-method will yield a summary which may be
`print`

'ed. The object has some resemblance to an `'lm'`

object,
and some postprocessing methods designed for `lm`

may happen to work.
It may however be necessary to coerce the object to succeed with this.

The `"felm"`

object is a list containing the following fields:

a numerical vector. The estimated coefficients.

an integer. The number of observations

an integer. The total number of coefficients, including those projected out.

a numerical vector. The response vector.

a numerical vector. The fitted values.

a numerical vector. The residuals of the full system, with dummies. For IV-estimations, this is the residuals when the original endogenous variables are used, not their predictions from the 1st stage.

a numerical vector. Reduced residuals, i.e. the residuals
resulting from predicting *without* the dummies.

numerical vector. When using instrumental variables, residuals from 2. stage, i.e. when predicting with the predicted endogenous variables from the 1st stage.

numeric. The square root of the argument `weights`

.

factor of length N. The factor describing the connected components of the two first terms in the second part of the model formula.

a matrix. The variance-covariance matrix.

list of factors. A list of the terms in the second part of the model formula.

The '`felm`

' objects for the IV 1st stage, if used. The
1st stage has multiple left hand sides if there are more than one
instrumented variable.

list of numerical vectors. For IV 1st stage, F-value for excluded instruments, the number of parameters in restricted model and in the unrestricted model.

matrix. The expanded data matrix, i.e. from the first part of the
formula. To save memory with large datasets, it is only included if
`felm(keepX=TRUE)`

is specified. Must be included if
`bccorr`

or `fevcov`

is to be used for correcting
limited mobility bias.

matrix. The centred expanded data matrix. Only included if
`felm(keepCX=TRUE)`

.

The result of a `replicate`

applied to the `bootexpr`

(if used).

##### Note

Side effect: If `data`

is an object of class `"pdata.frame"`

(from
the plm package), the plm namespace is loaded if available, and
`data`

is coerced to a `"data.frame"`

with `as.data.frame`

which dispatches to a plm method. This ensures that transformations
like `diff`

and `lag`

from plm works as expected, but it
also incurs an additional copy of the `data`

, and the plm
namespace remains loaded after `felm`

returns. When working with
`"pdata.frame"`

s, this is what is usually wanted anyway.

For technical reasons, when running IV-estimations, the data frame supplied
in the `data`

argument to `felm`

, should *not* contain
variables with names ending in `'(fit)'`

. Variables with such names
are used internally by `felm`

, and may then accidentally be looked up
in the data frame instead of the local environment where they are defined.

##### References

Cameron, A.C., J.B. Gelbach and D.L. Miller (2011) Robust inference with multiway clustering, Journal of Business & Economic Statistics 29 (2011), no. 2, 238--249. http://dx.doi.org/10.1198/jbes.2010.07136

Kolesar, M., R. Chetty, J. Friedman, E. Glaeser, and G.W. Imbens (2014) Identification and Inference with Many Invalid Instruments, Journal of Business & Economic Statistics (to appear). http://dx.doi.org/10.1080/07350015.2014.978175

##### See Also

##### Examples

```
# NOT RUN {
oldopts <- options(lfe.threads=1)
## create covariates
x <- rnorm(1000)
x2 <- rnorm(length(x))
## individual and firm
id <- factor(sample(20,length(x),replace=TRUE))
firm <- factor(sample(13,length(x),replace=TRUE))
## effects for them
id.eff <- rnorm(nlevels(id))
firm.eff <- rnorm(nlevels(firm))
## left hand side
u <- rnorm(length(x))
y <- x + 0.5*x2 + id.eff[id] + firm.eff[firm] + u
## estimate and print result
est <- felm(y ~ x+x2| id + firm)
summary(est)
# }
# NOT RUN {
## compare with lm
summary(lm(y ~ x + x2 + id + firm-1))
# }
# NOT RUN {
# make an example with 'reverse causation'
# Q and W are instrumented by x3 and the factor x4. Report robust s.e.
x3 <- rnorm(length(x))
x4 <- sample(12,length(x),replace=TRUE)
Q <- 0.3*x3 + x + 0.2*x2 + id.eff[id] + 0.3*log(x4) - 0.3*y + rnorm(length(x),sd=0.3)
W <- 0.7*x3 - 2*x + 0.1*x2 - 0.7*id.eff[id] + 0.8*cos(x4) - 0.2*y+ rnorm(length(x),sd=0.6)
# add them to the outcome
y <- y + Q + W
ivest <- felm(y ~ x + x2 | id+firm | (Q|W ~x3+factor(x4)))
summary(ivest,robust=TRUE)
condfstat(ivest)
# }
# NOT RUN {
# compare with the not instrumented fit:
summary(felm(y ~ x + x2 +Q + W |id+firm))
# }
# NOT RUN {
options(oldopts)
# }
```

*Documentation reproduced from package lfe, version 2.8-2, License: Artistic-2.0*