felm: Fitting linear models with multiple group fixed effects

Description

'felm' is used to fit linear models with multiple group fixed effects, similarly to lm. It uses the method of alternating projections to sweep out multiple group effects from the normal equations before estimating the remaining coefficients with OLS.

This function is intended for use with large datasets with multiple group effects of large cardinality. If dummy-encoding the group effects results in a manageable number of coefficients, you are probably better off using lm.
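
The sweeping step described above can be illustrated in base R. This is only a minimal sketch of alternating projections for two factors, not the code felm runs; demean_ap is a hypothetical helper introduced for illustration, and the simulated data are assumptions.

```r
set.seed(1)
n    <- 1000
x    <- rnorm(n)
id   <- factor(sample(20, n, replace = TRUE))
firm <- factor(sample(13, n, replace = TRUE))
y    <- 2 * x + rnorm(nlevels(id))[id] + rnorm(nlevels(firm))[firm] + rnorm(n)

# Hypothetical helper (not part of any package): subtract the group means
# of each factor in turn, and repeat until the vector stops changing.
demean_ap <- function(v, factors, tol = 1e-10, maxit = 1000) {
  for (i in seq_len(maxit)) {
    old <- v
    for (j in seq_along(factors)) v <- v - ave(v, factors[[j]])
    if (sqrt(sum((v - old)^2)) < tol) break
  }
  v
}

fl  <- list(id, firm)
y_c <- demean_ap(y, fl)
x_c <- demean_ap(x, fl)

# OLS on the centred data gives the coefficient on x ...
b_ap <- coef(lm(y_c ~ x_c - 1))[["x_c"]]
# ... which agrees with the dummy-variable regression
b_lm <- coef(lm(y ~ x + id + firm))[["x"]]
all.equal(b_ap, b_lm, tolerance = 1e-6)
```

With 20 and 13 levels the dummy regression is cheap, so the comparison is only a check; the centring approach pays off when the factors have many thousands of levels.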

Usage

felm(formula, data, iv=NULL, clustervar=NULL, exactDOF=FALSE,
subset, na.action, contrasts=NULL)

Arguments

formula

an object of class '"formula"' (or one that can be coerced to that class): a symbolic description of the model to be fitted, similar to 'lm'. Grouping factors f are coded as G(f). Interactions between a covariate x and a

data

a data frame containing the variables of the model

iv

a formula describing an instrumented variable. Estimated via two-step OLS.

clustervar

a string or factor. Either the name of a variable or a factor. Used for computing clustered standard errors.

exactDOF

logical. If more than two factors, the degrees of freedom used to scale the covariance matrix (and the standard errors) is normally estimated. Setting exactDOF=TRUE causes felm to attempt to compute it, bu

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fr

contrasts

an optional list. See the contrasts.arg of model.matrix.default

Value

felm returns an object of class "felm". It is quite similar to an "lm" object, but not entirely compatible.
The generic summary method yields a summary which may be printed. The object has some resemblance to an lm object, and some postprocessing methods designed for lm may happen to work. It may, however, be necessary to coerce the object for them to succeed.
The "felm" object is a list containing the following fields:

coefficients

a numerical vector. The estimated coefficients.

N

an integer. The number of observations.

p

an integer. The total number of coefficients, including those projected out.

response

a numerical vector. The response vector.

fitted.values

a numerical vector. The fitted values.

residuals

a numerical vector. The residuals of the full system, with dummies.

r.residuals

a numerical vector. Reduced residuals, i.e. the residuals resulting from predicting without the dummies.

cfactor

factor of length N. The factor describing the connected components of the first two G() terms in the model.

vcv

a matrix. The variance-covariance matrix.

fe

list of factors. A list of the G() terms in the model.

step1

list of 'felm' objects for the IV first step(s), if used.

Details

G() is not a function in itself; it is just syntax to distinguish the grouping factors. It does, however, translate to as.factor() inside felm(). For the G(x:f) syntax, x must be a numeric vector, matrix or factor, and as.factor() is applied to f. The entity inside G() is not treated as an ordinary formula; in particular, constructs like G(x*f) are not possible.
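
To make the x:f notation concrete, here is a small base R sketch. It assumes that an interaction between a numeric x and a factor f stands for one slope on x per level of f; it shows only the dummy columns such a term corresponds to, not how felm handles them internally. The data are made up for illustration.

```r
x <- c(1.5, -0.2, 0.7, 2.0, -1.1, 0.4)
f <- factor(c("a", "b", "a", "c", "b", "c"))

# One column per level of f: it holds x for the rows in that level
# and 0 for all other rows.
M <- model.matrix(~ x:f - 1)
colnames(M)
M
```

A term such as G(x*f), by contrast, would expand to several terms in an ordinary formula, which is presumably why it is not accepted inside G().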

The standard errors are adjusted for the reduced degrees of freedom coming from the dummies which are implicitly present. In the case of two factors, the exact number of implicit dummies is easy to compute. If there are more factors, the number of dummies is estimated by assuming there is one reference level for each factor. This may be a slight overestimate, leading to slightly too large standard errors. Setting exactDOF=TRUE computes the exact degrees of freedom with rankMatrix() in package Matrix. Note that version 1.1-0 of Matrix has a bug in rankMatrix() for sparse matrices which may cause it to return the wrong value. A fix is underway.
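
The difference between the estimated and the exact count can be seen in a small base R sketch with three factors. The formula used for the estimate (total number of levels minus one per factor after the first) is an assumption about what "one reference level for each factor" means here, and base qr() is swapped in for rankMatrix() to keep the example free of dependencies.

```r
n  <- 300
# Deterministic factors, so that every level occurs
f1 <- factor(rep(1:6, length.out = n))
f2 <- factor(rep(1:5, length.out = n))
# f3 merges pairs of f1 levels, so its dummies are sums of f1 dummies
f3 <- factor((as.integer(f1) + 1) %/% 2)

# All implicit dummies side by side, with no reference levels removed
D <- cbind(model.matrix(~ f1 - 1),
           model.matrix(~ f2 - 1),
           model.matrix(~ f3 - 1))

# Assumed form of the estimate: one redundant column per factor after the first
estimated <- ncol(D) - (3 - 1)
# Exact count: the rank of the dummy matrix
exact <- qr(D)$rank
c(estimated = estimated, exact = exact)
```

Because f3 is nested in f1, the dummy matrix has extra linear dependencies that the estimate does not see. The estimate therefore counts too many coefficients, which is the overestimation described above.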

For the iv argument, it is only necessary to include the instruments on the right-hand side. The other covariates, from formula, are added automatically in the first step. See the examples. iv can also be a list of formulas if more than one variable is instrumented. However, all instruments should then be specified in all the formulas. A more concise syntax for multiple instruments will probably be implemented in the future.
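
The two-step idea can be sketched with plain lm(), without group effects. This is an illustration under simulated data, not felm's implementation; the variable names and coefficients are assumptions chosen to echo the examples.

```r
set.seed(42)
n  <- 5000
x  <- rnorm(n)          # ordinary covariate
x3 <- rnorm(n)          # instrument
e  <- rnorm(n)          # error term of the outcome equation
# Q is endogenous: it shares the error e with y
Q  <- x3 + 0.5 * x + e + rnorm(n)
y  <- x + 2 * Q + e

# First step: instrumented variable on the instrument AND the other covariates
step1 <- lm(Q ~ x3 + x)
Qhat  <- fitted(step1)

# Second step: replace Q by its first-step prediction
step2 <- lm(y ~ x + Qhat)
naive <- lm(y ~ x + Q)

# The naive coefficient on Q is biased away from 2; the two-step one is not
c(two_step = coef(step2)[["Qhat"]], naive = coef(naive)[["Q"]])
```

Note that the standard errors reported by a hand-made second step like this are not valid as they stand, since they are based on the wrong residuals.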

The contrasts argument is similar to the one in lm(), it is used for the factors outside the G() terms. The factors inside the G() terms are analyzed as part of a possible subsequent getfe() call.

Ideally, the clustervar should have been an option to the summary-function instead. However, this would require keeping a copy of the model matrix in the returned structure. Since this function is intended for very large datasets, we discard the model matrix to save memory, keeping only residuals and other summary statistics.

Note that the syntax of the felm function has changed: it no longer allows a separate specification of the group factors; they must be specified with the G() syntax. The old felm is still available as lfe:::felm.old, but it will no longer be maintained.

Examples

## load the package that provides felm() and getfe()
library(lfe)

## create covariates
x <- rnorm(1000)
x2 <- rnorm(length(x))

## individual and firm
id <- factor(sample(20,length(x),replace=TRUE))
firm <- factor(sample(13,length(x),replace=TRUE))

## effects for them
id.eff <- rnorm(nlevels(id))
firm.eff <- rnorm(nlevels(firm))

## left hand side
u <- rnorm(length(x))
y <- x + 0.5*x2 + id.eff[id] + firm.eff[firm] + u

## estimate and print result
est <- felm(y ~ x+x2+G(id)+G(firm))
summary(est)
## compare with lm
summary(lm(y ~ x + x2 + id + firm-1))
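## the old separate specification, felm(y ~ x + x2, fl=list(id=id,firm=firm)),
## is no longer allowed; group factors must be given with the G() syntax
## extract the estimated group effects
getfe(est)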


# make a weird example with 'reverse causation'
# Q and W are instrumented by x3 and G(x4), report robust s.e.
x3 <- rnorm(length(x))
x4 <- sample(12,length(x),replace=TRUE)

Q <- 0.3*x3 + x + 0.2*x2 + id.eff[id] + 0.3*log(x4) - 0.3*y + rnorm(length(x),sd=0.3)
W <- 0.7*x3 - 2*x + 0.1*x2 - 0.7*id.eff[id] + 0.8*cos(x4) - 0.2*y+ rnorm(length(x),sd=0.6)

# add them to the outcome
y <- y + Q + W

ivest <- felm(y ~ x + x2 + G(id)+G(firm) + Q + W, iv=list(Q ~ x3+G(x4), W ~x3+G(x4)))
summary(ivest,robust=TRUE)
# compare with the not instrumented fit:
summary(felm(y ~ x + x2 + G(id)+G(firm) + Q + W))