This package is intended for linear models with multiple group fixed effects.
It does nothing that lm or package lme4 cannot do, but it uses a special
method for projecting out multiple group fixed effects from the normal
equations, hence it is faster. The method is a generalization of the
within-groups estimator. Such a method may be required if the groups have
high cardinality (many levels), resulting in tens or hundreds of thousands
of dummy variables. It is also useful if one only wants to control for the
group effects, without actually computing
them. The package is not able to compute standard errors for the group
effects.
The estimation is done in two steps. First the other coefficients are
estimated with the function felm by centering on all the group means. Then
the group effects are extracted (if needed) with the function getfe. There
is also a function demeanlist, which just does the centering on an arbitrary
matrix, and a function compfactor, which computes the connected components.
These components are used for interpreting the group effects when there are
only two factors (see the Abowd et al references); they are also returned
by getfe.
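As an illustration of the two steps and the helper functions named above, here is a minimal sketch on simulated data. It assumes that the lfe package is installed and that the installed release accepts the formula form y ~ x | id + firm, with the factors to project out after the bar; the formula syntax has changed between releases, so check ?felm for your version. The data and the names id, firm and d are made up for the example.

```r
library(lfe)   # assumption: the lfe package is installed

set.seed(42)
n    <- 1000
id   <- factor(sample(50, n, replace = TRUE))   # hypothetical first grouping factor
firm <- factor(sample(20, n, replace = TRUE))   # hypothetical second grouping factor
x    <- rnorm(n)
y    <- 0.5 * x + rnorm(50)[as.integer(id)] + rnorm(20)[as.integer(firm)] + rnorm(n)
d    <- data.frame(y, x, id, firm)

# Step 1: estimate the coefficient on x with both factors projected out.
# Assumption: the installed release uses the 'y ~ x | factors' formula form.
est <- felm(y ~ x | id + firm, data = d)
print(coef(est))

# Step 2 (only if needed): extract the group effects.
fe <- getfe(est)
print(head(fe))

# Just the centering, applied to an arbitrary matrix.
xc <- demeanlist(cbind(x = x), list(id = id, firm = firm))

# Connected components of the two factors.
comp <- compfactor(list(id = id, firm = firm))
print(nlevels(comp))
```

The coefficient on x should agree with the one from lm(y ~ x + id + firm, data = d) up to the centering tolerance; the difference is that no dummy variables are created.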
The centering on the means is done with a tolerance. This tolerance is
set with an option, e.g. options(lfe.eps=1e-7); its default is
sqrt(.Machine$double.eps) (which is 1.49e-08). This is a somewhat
conservative tolerance; I'd guess that options(lfe.eps=1e-4) may be
sufficient in many cases. A looser tolerance will speed up the centering.
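To make the role of the tolerance concrete, the following base R sketch centers on two factors by repeatedly subtracting group means until a full sweep changes the vector by less than eps. It is not the package's implementation: the helper center, its stopping rule and the simulated data are assumptions made for illustration only.

```r
set.seed(1)
n  <- 2000
f1 <- factor(sample(30, n, replace = TRUE))
f2 <- factor(sample(20, n, replace = TRUE))
x  <- rnorm(n) + as.integer(f1) / 10            # regressor correlated with f1
y  <- 2 * x + rnorm(30)[as.integer(f1)] + rnorm(20)[as.integer(f2)] + rnorm(n)

# Hypothetical helper: subtract the group means of each factor in turn and
# stop when one full sweep moves the vector by less than eps (Euclidean norm).
center <- function(v, fl, eps = sqrt(.Machine$double.eps), max_sweeps = 1000) {
  for (sweeps in seq_len(max_sweeps)) {
    old <- v
    for (f in fl) v <- v - ave(v, f)            # ave() returns group means
    if (sqrt(sum((v - old)^2)) < eps) break
  }
  list(v = v, sweeps = sweeps)
}

fl    <- list(f1, f2)
tight <- lapply(list(y = y, x = x), center, fl = fl)              # default tolerance
loose <- lapply(list(y = y, x = x), center, fl = fl, eps = 1e-4)  # looser tolerance

# Regress centered y on centered x (single regressor, no intercept).
b_tight <- sum(tight$x$v * tight$y$v) / sum(tight$x$v^2)
b_loose <- sum(loose$x$v * loose$y$v) / sum(loose$x$v^2)

# Reference: the same coefficient from the full dummy-variable regression.
b_dummy <- unname(coef(lm(y ~ x + f1 + f2))["x"])

print(c(tight = b_tight, loose = b_loose, dummy = b_dummy))
print(c(sweeps_tight = tight$x$sweeps, sweeps_loose = loose$x$sweeps))
```

The looser tolerance can never need more sweeps than the tighter one; how much accuracy it costs depends on the data, which is why the tolerance is left as an option.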
The package is threaded, that is, it may use more than one cpu. The
number of threads is fetched upon loading the package, from the
environment variable LFE_THREADS (or OMP_NUM_THREADS), and
stored by options(lfe.threads=n). This option may be changed prior to
calling felm, if so desired.
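The two ways of setting the thread count described above might look like this. The values 4 and 2 are arbitrary; the environment variable only has an effect if it is set before the package is loaded.

```r
# Option 1: set the environment variable before the package is loaded,
# so that it is picked up at load time.
Sys.setenv(LFE_THREADS = "4")

# Option 2: change the stored option afterwards, prior to calling felm.
options(lfe.threads = 2)
print(getOption("lfe.threads"))
```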
Threading is only done for the centering. The extraction of the group
effects is not threaded by the package itself, but it benefits from any
threading in the underlying BLAS library (which is usually controlled by
the OMP_NUM_THREADS environment variable).
The package has been tested on datasets with approx 20,000,000
observations and approx 2,300,000 and 270,000 group levels (felm takes
1-2 hours on 8 cpus, getfe takes a couple of days). It uses the sparse
Cholesky solver of package Matrix, which relies heavily on the BLAS library.
It is thus strongly recommended to link an optimized BLAS into R (such as
'goto', 'atlas', 'acml' or 'mkl').
The package will work with any positive number of grouping factors, but
with more than two factors the interpretation of the group effects is in
general not well understood.