lm, but it uses a
special method for projecting out multiple group fixed effects from the
normal equations, hence it is faster. It is a generalization of the
within estimator. This may be required if the groups have high
cardinality (many levels), resulting in tens or hundreds of thousands of
dummy variables. It is also useful if one only wants to control for the
group effects, without actually estimating them. The package may
optionally compute standard errors for the group effects by
bootstrapping, but this is a very time- and memory-consuming process
compared to finding the point estimates. If you only have a single huge
factor, the package plm is probably better suited. If your factors do not
have thousands of levels, lm or other packages are probably better suited.
lfe is designed to produce the same results as lm will do if run with the
full set of dummies.
As of version 1.6, projecting out interactions between continuous covariates and factors is supported, i.e. individual slopes, not only individual intercepts. As of version 2.0, multiple left hand sides are supported.
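The two formula features mentioned above can be sketched as follows. This is an illustration only: the x:f and y1 | y2 syntax is as I understand it from the felm documentation, and the simulated data and all variable names (slope, w, est_slopes, est_multi) are made up for the example.

```r
library(lfe)
set.seed(42)
n <- 1000
id   <- factor(sample(20, n, replace = TRUE))
firm <- factor(sample(5, n, replace = TRUE))
x  <- rnorm(n)
x2 <- rnorm(n)
slope <- rnorm(nlevels(firm))            # hypothetical firm-specific slopes on x2
y <- x + slope[firm] * x2 + rnorm(nlevels(id))[id] + rnorm(n)
w <- 0.5 * x + rnorm(nlevels(id))[id] + rnorm(n)

# Individual intercepts for id, individual slopes on x2 for each firm
est_slopes <- felm(y ~ x | id + x2:firm)
coef(est_slopes)

# Two left hand sides sharing the same right hand side
est_multi <- felm(y | w ~ x | id)
coef(est_multi)
```

Only the coefficient on x is reported in both fits; the intercepts and slopes on the right of the bar are projected out rather than estimated.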
The estimation is done in two steps. First the other coefficients are
estimated with the function felm by centering on all the
group means, followed by an OLS (similar to lm). Then the group effects
are extracted (if needed) with the function getfe. This
method is described by Gaure (2013), but also appears in Guimaraes and
Portugal (2010), disguised as the Gauss-Seidel algorithm.
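The first step can be illustrated in base R. The sketch below is not the package's implementation; center_on_groups is a hypothetical helper that sweeps out the group means of each factor in turn until the vector stops changing, after which an OLS on the centered data is compared with an lm fit on the full set of dummies.

```r
set.seed(1)
n <- 500
id   <- factor(sample(10, n, replace = TRUE))
firm <- factor(sample(5, n, replace = TRUE))
x <- rnorm(n)
y <- 2 * x + rnorm(nlevels(id))[id] + rnorm(nlevels(firm))[firm] + rnorm(n)

# Hypothetical helper: subtract the group means of each factor in turn,
# repeating until one full sweep changes the vector by less than eps.
center_on_groups <- function(v, factors, eps = 1e-8, maxit = 1000) {
  for (i in seq_len(maxit)) {
    old <- v
    for (f in factors) v <- v - ave(v, f)
    if (sqrt(sum((v - old)^2)) < eps) break
  }
  v
}

fl <- list(id, firm)
yc <- center_on_groups(y, fl)
xc <- center_on_groups(x, fl)

# OLS on the centered data gives the coefficient on x
b_within <- coef(lm(yc ~ xc - 1))[["xc"]]

# Reference: OLS with the full set of dummies
b_dummy <- coef(lm(y ~ x + id + firm))[["x"]]

c(within = b_within, dummies = b_dummy)
```

The two estimates should agree up to the centering tolerance. The group effects themselves are not produced by this step.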
There's also a function demeanlist which just does the
centering on an arbitrary matrix or data frame, and a function
compfactor which computes the connected components
used for interpreting the group effects when there are only two
factors (see the Abowd et al. references). The components are also
returned by getfe.
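A minimal sketch of the two helper functions, assuming the calls demeanlist(mtx, fl) and compfactor(fl) with fl a list of factors. The data are simulated and the object names (mtx, centered, cf) are invented for the example.

```r
library(lfe)
set.seed(1)
n <- 200
id   <- factor(sample(10, n, replace = TRUE))
firm <- factor(sample(4, n, replace = TRUE))
mtx <- cbind(x = rnorm(n), y = rnorm(n))

# Center both columns on the id and firm means simultaneously
centered <- demeanlist(mtx, list(id = id, firm = firm))

# After centering, every id mean of the first column should be near zero
max_id_mean <- max(abs(tapply(centered[, 1], id, mean)))
max_id_mean

# Connected components of the id/firm structure, one label per observation
cf <- compfactor(list(id = id, firm = firm))
nlevels(cf)
```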
For those who study the correlation between the fixed effects, like in
Abowd et al. (1999), there are functions bccorr and
fevcov for computing limited mobility bias corrected
correlations and variances with the method described in Gaure (2014b).
Instrumental variable estimations are supported with 2SLS. Conditional
F statistics for testing reduced rank weak instruments as in Sanderson
and Windmeijer (2014) are available in condfstat. Joint
significance testing of coefficients is available in waldtest.
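The following sketch shows how these pieces might fit together. It assumes that the third part of the felm formula holds the instrumental variable specification and that waldtest accepts a one-sided formula naming the coefficients to test; the simulated data, the instrument z and the confounder u are invented for the example.

```r
library(lfe)
set.seed(7)
n <- 2000
id <- factor(sample(30, n, replace = TRUE))
x  <- rnorm(n)
x2 <- rnorm(n)
z  <- rnorm(n)                          # excluded instrument (hypothetical)
u  <- rnorm(n)                          # unobserved confounder
Q  <- z + 0.5 * x + u + rnorm(n)        # endogenous regressor
y  <- x + 0.25 * x2 + 0.5 * Q + rnorm(nlevels(id))[id] + u + rnorm(n)

# Third part of the formula: endogenous variable ~ excluded instruments
ivest <- felm(y ~ x + x2 | id | (Q ~ z))
summary(ivest)

# Conditional F statistic for the first stage
cfs <- condfstat(ivest)
cfs

# Joint test of the hypotheses x = 0 and x2 = 0
wt <- waldtest(ivest, ~ x | x2)
wt
```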
The centering on the means is done with a tolerance which is
set by options(lfe.eps=1e-8) (the default). This is a somewhat
conservative tolerance; in many cases I'd guess
1e-6 may be sufficient, which may speed up the
centering. In the other direction, setting options(lfe.eps=0)
will provide maximum accuracy at the cost of computing time and
warnings about convergence failure.
The package is threaded, that is, it may use more than one cpu. The
number of threads is fetched from the environment upon loading the package
and stored by options(lfe.threads=n). This option can be changed prior to
calling felm, if so desired.
Threading is only done for the centering; the extraction of the group
effects is not threaded. The default method for extracting the
group coefficients is the iterative Kaczmarz method; its tolerance
is also set by the lfe.eps option.
For some datasets the Kaczmarz method converges very slowly. In this
case it may be replaced with a conjugate gradient method by setting
options(lfe.usecg=TRUE).
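The options discussed above can be set together and restored afterwards. The values below are arbitrary illustrations, not recommendations, and the names oldopts and eps_now are made up for the example.

```r
# Set the options and keep the previous values for later restoration
oldopts <- options(lfe.eps = 1e-6,      # looser centering tolerance
                   lfe.threads = 2,     # threads used for the centering
                   lfe.usecg = TRUE)    # conjugate gradient in getfe

eps_now <- getOption("lfe.eps")
eps_now

# ... calls to felm() and getfe() would go here ...

# Restore the previous settings
options(oldopts)
```

Saving the return value of options() keeps the change local to the analysis at hand, in the same way as the example at the end of this page does for lfe.threads.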
The package has been tested on datasets with approx 20,000,000
observations with 15 covariates and approx 2,300,000 and 270,000 group
levels (felm took about 50 minutes on 8 cpus, and
getfe took about 5 minutes). Beware, though, that not
only the size of the dataset matters, but also its structure, as
demonstrated by Gaure (2014a).
The package will work with any number of grouping factors, but if
there are more than two, their interpretation is in general not well
understood, so one should make sure that the group coefficients are estimable.
A discussion of estimability, the algorithm used, and the convergence rate
is available in the vignettes, as well as in the published papers in the
citation list (citation('lfe')).
In the exec-directory there is a perl-script lfescript which
is used at the author's site for automated creation of R-scripts from
a simple specification file. The format is documented in
doc/lfeguide.txt.
The package is similar in function, though not in method, to the Stata
modules a2reg and felsdvreg. The method is very
similar to the one in the Stata module reghdfe.
Andrews, M., L. Gill, T. Schank and R. Upward (2008)
High wage workers and low wage firms: negative assortative
matching or limited mobility bias?
J.R. Stat. Soc.(A), 171(3):673-697.
Cornelissen, T. (2008)
The Stata command felsdvreg to fit a linear model with two
high-dimensional fixed effects.
Stata Journal, 8(2):170-189.
Correia, S. (2014)
REGHDFE: Stata module to perform linear or instrumental-variable
regression absorbing any number of high-dimensional fixed effects.
Statistical Software Components, Boston College Department of Economics.
Croissant, Y. and G. Millo (2008)
Panel Data Econometrics in R: The plm Package.
Journal of Statistical Software, 27(2).
Gaure, S. (2013)
OLS with Multiple High Dimensional Category Variables.
Computational Statistics and Data Analysis, 66:8-18.
Gaure, S. (2014a)
lfe: Linear Group Fixed Effects.
The R Journal, 5(2):104-117, Dec 2013.
Gaure, S. (2014b)
Correlation bias correction in two-way fixed-effects linear regression.
Stat, 3(1):379-390.
Guimaraes, P. and P. Portugal (2010)
A simple feasible procedure to fit models with high-dimensional fixed effects.
The Stata Journal, 10(4):629-649.
Sanderson, E. and F. Windmeijer (2014)
A weak instrument F-test in linear IV models with multiple
endogenous variables.
Disc. Paper 14/644, Univ. of Bristol.
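library(lfe)
# use a single thread for this small example
oldopts <- options(lfe.threads=1)
# simulate two covariates
x <- rnorm(1000)
x2 <- rnorm(length(x))
# grouping factors: individual, firm and year
id <- factor(sample(10,length(x),replace=TRUE))
firm <- factor(sample(3,length(x),replace=TRUE,prob=c(2,1.5,1)))
year <- factor(sample(10,length(x),replace=TRUE,prob=c(2,1.5,rep(1,8))))
# one effect per level of each factor
id.eff <- rnorm(nlevels(id))
firm.eff <- rnorm(nlevels(firm))
year.eff <- rnorm(nlevels(year))
# left hand side
y <- x + 0.25*x2 + id.eff[id] + firm.eff[firm] +
  year.eff[year] + rnorm(length(x))
# estimate the coefficients on x and x2, projecting out the three factors
est <- felm(y ~ x+x2 | id + firm + year)
summary(est)
# extract the group effects, with standard errors
getfe(est,se=TRUE)
# compare with an ordinary lm
summary(lm(y ~ x+x2+id+firm+year-1))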
options(oldopts)