
fHDbetween is a generalization of fbetween to efficiently predict with multiple factors and linear models (i.e. predict with vectors/factors, matrices, or data.frames/lists where the latter may contain multiple factor variables). Similarly, fHDwithin is a generalization of fwithin to center on multiple factors and partial out linear models.
The corresponding operators HDB and HDW also exist and additionally allow predicting / partialling out full lm() formulas with interactions between variables.
fHDbetween(x, …)
fHDwithin(x, …)
HDB(x, …)
HDW(x, …)
# S3 method for default
fHDbetween(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for default
fHDwithin(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for default
HDB(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for default
HDW(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for matrix
fHDbetween(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for matrix
fHDwithin(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for matrix
HDB(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, stub = "HDB.", …)
# S3 method for matrix
HDW(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, stub = "HDW.", …)
# S3 method for data.frame
fHDbetween(x, fl, w = NULL, na.rm = TRUE, fill = FALSE,
variable.wise = FALSE, …)
# S3 method for data.frame
fHDwithin(x, fl, w = NULL, na.rm = TRUE, fill = FALSE,
variable.wise = FALSE, …)
# S3 method for data.frame
HDB(x, fl, w = NULL, cols = is.numeric, na.rm = TRUE, fill = FALSE,
variable.wise = FALSE, stub = "HDB.", …)
# S3 method for data.frame
HDW(x, fl, w = NULL, cols = is.numeric, na.rm = TRUE, fill = FALSE,
variable.wise = FALSE, stub = "HDW.", …)
# Methods for compatibility with plm:
# S3 method for pseries
fHDbetween(x, w = NULL, na.rm = TRUE, fill = TRUE, …)
# S3 method for pseries
fHDwithin(x, w = NULL, na.rm = TRUE, fill = TRUE, …)
# S3 method for pseries
HDB(x, w = NULL, na.rm = TRUE, fill = TRUE, …)
# S3 method for pseries
HDW(x, w = NULL, na.rm = TRUE, fill = TRUE, …)
# S3 method for pdata.frame
fHDbetween(x, w = NULL, na.rm = TRUE, fill = TRUE,
variable.wise = TRUE, …)
# S3 method for pdata.frame
fHDwithin(x, w = NULL, na.rm = TRUE, fill = TRUE,
variable.wise = TRUE, …)
# S3 method for pdata.frame
HDB(x, w = NULL, cols = is.numeric, na.rm = TRUE, fill = TRUE,
variable.wise = TRUE, stub = "HDB.", …)
# S3 method for pdata.frame
HDW(x, w = NULL, cols = is.numeric, na.rm = TRUE, fill = TRUE,
variable.wise = TRUE, stub = "HDW.", …)
x
a numeric vector, matrix, data.frame, panel-series (plm::pseries) or panel-data.frame (plm::pdata.frame).
fl
a numeric vector, factor, matrix, data.frame or list (which may or may not contain factors). In the data.frame method fl can also be a one- or two-sided lm() formula with variables contained in x. Interactions (:) and full interactions (*) are supported. See Examples.
w
a vector of (non-negative) weights. Currently only weighted centering on multiple factors is supported, not weighted linear models.
cols
data.frame methods: Select columns to center (partial out) or predict using column names, indices or a function. Unless specified otherwise, all numeric columns are selected. If NULL, all variables are selected.
na.rm
remove missing values from both x and fl. By default, rows with missing values in x or fl are removed. In that case an attribute "na.rm" is attached containing the rows removed.
fill
If na.rm = TRUE, fill = TRUE will not remove rows with missing values in x or fl, but fill them with NA's.
variable.wise
data.frame methods: Setting variable.wise = TRUE will process each column individually, i.e. use all non-missing cases in each column and in fl (fl is only checked for missing values if na.rm = TRUE). This is a lot less efficient but uses all data available in each column.
stub
a prefix / stub to rename all transformed columns. FALSE will not rename columns.
…
further arguments passed to lfe::demeanlist (if fl contains factors), or to / from other methods.
HDB returns fitted values of regressing x on fl. HDW returns residuals. See Details and Examples.
fHDbetween/HDB and fHDwithin/HDW can be understood as generalizations of lfe::demeanlist to continuous data and formula input, with more choices for dealing with missing values. They are powerful tools for complex high-dimensional linear prediction problems involving large factors and datasets, but can just as well handle ordinary regression problems. Intended areas of use are to efficiently obtain residuals and predicted values from data, and to prepare data for complex linear models involving multiple levels of fixed effects. Such models can now be fitted using lm() on data prepared with fHDwithin / HDW (relying on bootstrapped SE's for inference, or implementing the appropriate corrections). See Examples.
If fl is a vector or matrix, the result is identical to that of lm, i.e. fHDbetween / HDB returns fitted(lm(x ~ fl)) and fHDwithin / HDW returns residuals(lm(x ~ fl)). If fl is a list containing factors, all variables in x and all non-factor variables in fl are centered on these factors using the method of alternating projections implemented by lfe::demeanlist. Afterwards the centered data is regressed on the centered predictors. If fl is just a list of factors, fHDwithin/HDW returns the centered data and fHDbetween/HDB the corresponding means. As the most general example, take a list fl = list(fct1, fct2, ..., var1, var2, ...) where fcti are factors and vari are continuous variables. The output of fHDwithin/HDW | fHDbetween/HDB will then be identical to calling resid | fitted on lm(x ~ fct1 + fct2 + ... + var1 + var2 + ...). The computations performed by fHDwithin/HDW and fHDbetween/HDB are, however, much faster and more memory efficient than lm because factors are not passed to stats::model.matrix and expanded to matrices of dummies, but projected out using lfe::demeanlist.
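The two-step logic described above (center on the factors, then regress the centered data on the centered predictors) can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: center_two is a hypothetical helper that alternates group-wise demeaning on two factors until the values stop changing, standing in for lfe::demeanlist, and mtcars is used as toy data.

```r
# Illustrative base R sketch; center_two is a hypothetical helper,
# not a function from collapse or lfe.
center_two <- function(x, f1, f2, tol = 1e-12, max_iter = 1000L) {
  for (i in seq_len(max_iter)) {
    x_old <- x
    x <- x - ave(x, f1)                    # remove group means of the first factor
    x <- x - ave(x, f2)                    # remove group means of the second factor
    if (max(abs(x - x_old)) < tol) break   # stop once a full sweep changes nothing
  }
  x
}

x    <- mtcars$mpg
var1 <- mtcars$wt
fct1 <- factor(mtcars$cyl)
fct2 <- factor(mtcars$gear)

# Step 1: center the data and the continuous predictor on both factors
x_c    <- center_two(x, fct1, fct2)
var1_c <- center_two(var1, fct1, fct2)

# Step 2: regress the centered data on the centered predictor
res_fwl <- unname(residuals(lm(x_c ~ var1_c - 1)))

# Reference: residuals from the full dummy-variable regression
res_lm <- unname(residuals(lm(x ~ fct1 + fct2 + var1)))

all.equal(res_fwl, res_lm)
```

The reference call expands both factors into dummy columns via the model matrix, which is the step the text above says is avoided by projecting the factors out instead.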
The formula interface to the data.frame method (only supported by the operators HDW | HDB) provides ease of use and allows for additional modelling complexity. For example, it is possible to project out formulas like HDW(data, ~ fct1*var1 + fct2:fct3 + var2:fct2:fct3 + var1:var2:var3 + poly(var5,3)*fct5) containing simple (:) or full (*) interactions of factors with continuous variables or polynomials of continuous variables, and two- or three-way interactions of factors and continuous variables. If the formula is one-sided as in the example above (the space to the left of ~ is left empty), the formula is applied to all variables selected through cols. The specification provided in cols (default: all numeric variables not used in the formula) can be overridden by supplying one or more dependent variables. For example, HDW(data, var1 + var2 ~ fct1 + fct2) will return a data.frame with var1 and var2 centered on fct1 and fct2.
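As a point of comparison, a two-sided formula with several dependent variables can be mimicked in base R with a multi-response lm. The sketch below computes what, going by the description above, a call such as HDW(mtcars, mpg + hp ~ factor(cyl) + factor(gear)) would be expected to reproduce. The choice of mtcars and of these variables is an assumption made for illustration, and the column renaming only imitates the default stub.

```r
# Base R reference computation (a sketch; it does not call collapse itself)
fit <- lm(cbind(mpg, hp) ~ factor(cyl) + factor(gear), data = mtcars)

# residuals() of a multi-response fit is a matrix with one column per response
res <- as.data.frame(residuals(fit))

# Imitate the default stub = "HDW." used by the operator
names(res) <- paste0("HDW.", names(res))

head(res)
```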
The special methods for plm::pseries and plm::pdata.frame center a panel-series or the variables in a panel-data.frame on all panel-identifiers. By default in these methods fill = TRUE and variable.wise = TRUE, so missing values are kept. These defaults were chosen to ensure a coherent framework of functions and operators applied to plm panel-data classes.
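The difference between removing rows with missing values and keeping them as NA can be illustrated with a loose base R analogy: na.omit drops incomplete cases from the residuals, whereas na.exclude pads them back with NA so the output lines up with the input. This is only an analogy for the fill argument, not a description of how these functions are implemented, and the missing values are introduced artificially.

```r
# Toy data with two artificially introduced missing values
x <- mtcars$mpg
x[c(3, 10)] <- NA
g <- factor(mtcars$cyl)

# Rows with missing values removed: the result is shorter than the input
res_drop <- residuals(lm(x ~ g, na.action = na.omit))

# Rows with missing values kept and filled with NA: same length as the input
res_fill <- residuals(lm(x ~ g, na.action = na.exclude))

length(res_drop)
length(res_fill)
unname(which(is.na(res_fill)))
```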
fbetween/B and fwithin/W, fscale/STD, TRA, fFtest, Data Transformations, Collapse Overview
# NOT RUN {
HDW(mtcars$mpg, mtcars$carb) # Simple regression problems..
HDW(mtcars$mpg, mtcars[-1])
HDW(mtcars$mpg, qM(mtcars[-1]))
HDW(qM(mtcars[3:4]), mtcars[1:2])
HDW(iris[1:2], iris[3:4]) # Partialling columns 3 and 4 out of columns 1 and 2
HDW(iris[1:2], iris[3:5]) # Adding the Species factor -> fixed effect
HDW(wlddev, PCGDP + LIFEEX ~ iso3c + qF(year)) # Partialling out 2 fixed effects (iso3c is factor)
HDW(wlddev, PCGDP + LIFEEX ~ iso3c + qF(year), variable.wise = TRUE) # Variable-wise computations
HDW(wlddev, PCGDP + LIFEEX ~ iso3c + qF(year) + ODA) # Adding ODA as a continuous regressor
HDW(wlddev, PCGDP + LIFEEX ~ iso3c:qF(decade) + qF(year) + ODA) # Country-decade and year FE's
# More complex examples (Currently only recommended for smaller data)
lm(HDW.mpg ~ HDW.hp, data = HDW(mtcars, ~ factor(cyl)*carb + vs + wt:gear + wt:gear:carb))
lm(mpg ~ hp + factor(cyl)*carb + vs + wt:gear + wt:gear:carb, data = mtcars)
lm(HDW.mpg ~ HDW.hp, data = HDW(mtcars, ~ factor(cyl)*carb + vs + wt:gear))
lm(mpg ~ hp + factor(cyl)*carb + vs + wt:gear, data = mtcars)
lm(HDW.mpg ~ HDW.hp, data = HDW(mtcars, ~ cyl*carb + vs + wt:gear))
lm(mpg ~ hp + cyl*carb + vs + wt:gear, data = mtcars)
lm(HDW.mpg ~ HDW.hp, data = HDW(mtcars, mpg + hp ~ cyl*carb + factor(cyl)*poly(drat,2)))
lm(mpg ~ hp + cyl*carb + factor(cyl)*poly(drat,2), data = mtcars)
# }