lm,
but it uses a special method for projecting out multiple group fixed
effects from the normal equations, hence it is faster. It is a
generalization of the within estimator. This may be required if
the groups have high cardinality (many levels), resulting in tens
or hundreds of thousands of dummy-variables. It is also useful if one
only wants to control for the group effects, without actually computing
them. The package may optionally compute standard errors for the group
effects, but this is a very time- and memory-consuming process compared
to finding the point estimates.
The estimation is done in two steps. First the other coefficients are
estimated with the function felm by centering on all the
group means, followed by an OLS (similar to lm). Then the group effects
are extracted (if needed) with the function getfe. This
method is described in Gaure (2011).
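The idea behind the first step can be illustrated with a single grouping factor, where centering on the group means is the ordinary within estimator. The following is a minimal sketch in base R only; it does not call felm, and the simulated data and variable names are made up for illustration.

```r
set.seed(1)
n <- 200
g <- factor(sample(5, n, replace = TRUE))      # one grouping factor
x <- rnorm(n)
y <- 2 * x + rnorm(nlevels(g))[g] + rnorm(n)   # group effect plus noise

# Centre y and x on their group means (the within transformation).
yc <- y - ave(y, g)
xc <- x - ave(x, g)

# OLS on the centred data, without an intercept.
b_within <- unname(coef(lm(yc ~ xc - 1)))

# OLS with one dummy per group level, for comparison.
b_dummy <- unname(coef(lm(y ~ x + g - 1))["x"])

all.equal(b_within, b_dummy)
```

The two slope estimates agree, but the centred regression never builds the dummy columns, which is what matters when a factor has very many levels.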
There's also a function demeanlist which just does the
centering on an arbitrary matrix, and there's a function
compfactor which computes the connected components. These
are used for interpreting the group effects when there are only two
factors (see the Abowd et al references); they are also returned by
getfe.
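To make the notion of connected components concrete: with two factors, two levels are connected if they occur together in some observation, directly or through a chain of such observations. The sketch below labels the components with a simple union-find in base R. The function components is hypothetical and written only for illustration; it is not the package's compfactor and makes no claim about that function's interface or return value.

```r
# Hypothetical helper: label each observation with the connected component
# of the bipartite graph whose nodes are the levels of f1 and f2.
components <- function(f1, f2) {
  a <- as.integer(f1)
  b <- as.integer(f2) + nlevels(f1)   # offset so node ids do not clash
  parent <- seq_len(nlevels(f1) + nlevels(f2))
  find <- function(i) {               # follow parent links to the root
    while (parent[i] != i) i <- parent[i]
    i
  }
  for (k in seq_along(a)) {           # merge the two nodes of each observation
    ra <- find(a[k])
    rb <- find(b[k])
    if (ra != rb) parent[rb] <- ra
  }
  roots <- vapply(a, find, integer(1))
  factor(match(roots, unique(roots)))
}

f1 <- factor(c("a", "a", "b", "c", "c", "d"))
f2 <- factor(c("x", "y", "y", "z", "w", "w"))
cf <- components(f1, f2)
as.integer(cf)   # 1 1 1 2 2 2
```

Here levels a and b are linked through y, and c and d through w, but nothing links the two clusters, so group effects can only be compared within each component.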
The centering on the means is done with a tolerance which is
set by options(lfe.eps=1e-8) (the default). This is a somewhat
conservative tolerance, in many cases I'd guess
1e-6 may be sufficient. This may speed up the
centering. In the other direction, setting options(lfe.eps=0)
will provide maximum accuracy at the cost of computing time and
warnings about convergence failure.
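The role of the tolerance can be sketched with two factors by alternately centering on each factor until the change between sweeps drops below eps. This is an illustration under my own assumptions, not the package's code: the function center_two, its stopping rule and its maxit argument are all hypothetical.

```r
# Hypothetical helper: alternate between centring on f1 and on f2 until
# the Euclidean change between sweeps is below eps.
center_two <- function(v, f1, f2, eps = 1e-8, maxit = 10000L) {
  for (i in seq_len(maxit)) {
    prev <- v
    v <- v - ave(v, f1)   # remove the f1 group means
    v <- v - ave(v, f2)   # remove the f2 group means
    if (sqrt(sum((v - prev)^2)) < eps) return(v)
  }
  warning("centering did not converge within maxit sweeps")
  v
}

set.seed(2)
n  <- 500
f1 <- factor(sample(20, n, replace = TRUE))
f2 <- factor(sample(8, n, replace = TRUE))
x  <- rnorm(n)

xc <- center_two(x, f1, f2, eps = 1e-8)

# After convergence both sets of group means are close to zero.
max(abs(tapply(xc, f1, mean)))
max(abs(tapply(xc, f2, mean)))
```

A larger eps stops the loop after fewer sweeps at the cost of a less exact centering, which is the trade-off described above.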
The package is threaded, that is, it may use more than one cpu. The
number of threads is fetched from the environment upon loading the
package and stored in options(lfe.threads=n). This option may be changed prior to
calling felm, if so desired.
Threading is only done for the centering; the extraction of the group
effects is not threaded. The default method for extracting the
group coefficients is the iterative Kaczmarz-method; its tolerance
is also set by the lfe.eps option.
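Both settings are ordinary R options, so they can be changed and restored with options(). A small sketch follows; the option names are the ones given in the text, while the values 1e-6 and 2 are arbitrary choices for illustration.

```r
# Loosen the tolerance and limit the number of threads before estimating.
old <- options(lfe.eps = 1e-6, lfe.threads = 2)

getOption("lfe.eps")
getOption("lfe.threads")

# ... call felm and getfe here ...

# Restore the previous settings when done.
options(old)
```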
The package has been tested on datasets with approx 20,000,000
observations with 15 covariates and approx 2,300,000 and 270,000 group
levels (felm took about 50 minutes on 8 cpus, and
getfe took 5 minutes). Beware, though, that not
only the size of the dataset matters, but also its structure.
The package will work with any positive number of grouping factors, but with more than two, the interpretation of the group effects is in general not well understood, so one should make sure that the coefficients are estimable.
In the exec-directory there is a Perl script lfescript which
is used at the author's site for creating R-scripts from
a simple specification file. The format is documented in
doc/lfeguide.txt.
a2reg and felsdvreg.
Andrews, M., L. Gill, T. Schank and R. Upward (2008)
High wage workers and low wage firms: negative assortative
matching or limited mobility bias?
J.R. Stat. Soc.(A) 171(3), 673--697.
Cornelissen, T. (2008)
The stata command felsdvreg to fit a linear model with two
high-dimensional fixed effects.
Stata Journal 8(2), 170--189.
Gaure, S. (2011) OLS with Multiple High Dimensional Category Variables (to appear)
Ouazad, A. (2008)
A2REG: Stata module to estimate models with two fixed effects.
Statistical Software Components S456942, Boston College Department of Economics.
library(lfe)
x <- rnorm(1000)
x2 <- rnorm(length(x))
id <- factor(sample(10,length(x),replace=TRUE))
firm <- factor(sample(3,length(x),replace=TRUE,prob=c(2,1.5,1)))
year <- factor(sample(10,length(x),replace=TRUE,prob=c(2,1.5,rep(1,8))))
id.eff <- rnorm(nlevels(id))
firm.eff <- rnorm(nlevels(firm))
year.eff <- rnorm(nlevels(year))
y <- x + 0.25*x2 + id.eff[id] + firm.eff[firm] +
year.eff[year] + rnorm(length(x))
est <- felm(y ~ x+x2+G(id)+G(firm)+G(year))
summary(est)
getfe(est,se=TRUE)
# compare with an ordinary lm
summary(lm(y ~ x+x2+id+firm+year-1))