fbetween and fwithin are S3 generics to efficiently obtain between-transformed (averaged) or (quasi-)within-transformed (demeaned) data. These operations can be performed groupwise and/or weighted. B and W are wrappers around fbetween and fwithin representing the 'between-operator' and the 'within-operator'.
(B / W provide more flexibility than fbetween / fwithin when applied to data frames (i.e. column subsetting, formula input, auto-renaming and id-variable-preservation capabilities…), but are otherwise identical.)
fbetween(x, …)
fwithin(x, …)
B(x, …)
W(x, …)
# S3 method for default
fbetween(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for default
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, mean = 0, theta = 1, …)
# S3 method for default
B(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for default
W(x, g = NULL, w = NULL, na.rm = TRUE, mean = 0, theta = 1, …)
# S3 method for matrix
fbetween(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for matrix
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, mean = 0, theta = 1, …)
# S3 method for matrix
B(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, stub = "B.", …)
# S3 method for matrix
W(x, g = NULL, w = NULL, na.rm = TRUE, mean = 0, theta = 1, stub = "W.", …)
# S3 method for data.frame
fbetween(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for data.frame
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, mean = 0, theta = 1, …)
# S3 method for data.frame
B(x, by = NULL, w = NULL, cols = is.numeric, na.rm = TRUE,
fill = FALSE, stub = "B.", keep.by = TRUE, keep.w = TRUE, …)
# S3 method for data.frame
W(x, by = NULL, w = NULL, cols = is.numeric, na.rm = TRUE,
mean = 0, theta = 1, stub = "W.", keep.by = TRUE, keep.w = TRUE, …)
# Methods for compatibility with plm:
# S3 method for pseries
fbetween(x, effect = 1L, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for pseries
fwithin(x, effect = 1L, w = NULL, na.rm = TRUE, mean = 0, theta = 1, …)
# S3 method for pseries
B(x, effect = 1L, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for pseries
W(x, effect = 1L, w = NULL, na.rm = TRUE, mean = 0, theta = 1, …)
# S3 method for pdata.frame
fbetween(x, effect = 1L, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for pdata.frame
fwithin(x, effect = 1L, w = NULL, na.rm = TRUE, mean = 0, theta = 1, …)
# S3 method for pdata.frame
B(x, effect = 1L, w = NULL, cols = is.numeric, na.rm = TRUE,
fill = FALSE, stub = "B.", keep.ids = TRUE, keep.w = TRUE, …)
# S3 method for pdata.frame
W(x, effect = 1L, w = NULL, cols = is.numeric, na.rm = TRUE,
mean = 0, theta = 1, stub = "W.", keep.ids = TRUE, keep.w = TRUE, …)
# Methods for grouped data frame / compatibility with dplyr:
# S3 method for grouped_df
fbetween(x, w = NULL, na.rm = TRUE, fill = FALSE,
keep.group_vars = TRUE, keep.w = TRUE, …)
# S3 method for grouped_df
fwithin(x, w = NULL, na.rm = TRUE, mean = 0, theta = 1,
keep.group_vars = TRUE, keep.w = TRUE, …)
# S3 method for grouped_df
B(x, w = NULL, na.rm = TRUE, fill = FALSE,
stub = "B.", keep.group_vars = TRUE, keep.w = TRUE, …)
# S3 method for grouped_df
W(x, w = NULL, na.rm = TRUE, mean = 0, theta = 1,
stub = "W.", keep.group_vars = TRUE, keep.w = TRUE, …)
x
a numeric vector, matrix, data frame, panel series (class pseries of package plm), panel data frame (plm::pdata.frame) or grouped data frame (class 'grouped_df').
by
B and W data.frame method: Same as g, but also allows one- or two-sided formulas, i.e. ~ group1 or var1 + var2 ~ group1 + group2. See Examples.
w
a numeric vector of (non-negative) weights. B/W data frame and pdata.frame methods also allow a one-sided formula, i.e. ~ weightcol. The grouped_df (dplyr) method supports lazy evaluation. See Examples.
cols
data.frame method: Select columns to center/average using a function, column names, indices or a logical vector. Default: All numeric variables. Note: cols is ignored if a two-sided formula is passed to by.
na.rm
logical. Skip missing values in x and w when computing averages. If na.rm = FALSE and an NA or NaN is encountered, the average for that group will be NA, and all data points belonging to that group in the output vector will also be NA.
effect
plm methods: Select which panel identifier should be used as grouping variable. 1L takes the first variable in the plm::index, 2L the second etc. Index variables can also be called by name using a character string. If more than one variable is supplied, the corresponding index-factors are interacted.
stub
a prefix or stub to rename all transformed columns. FALSE will not rename columns.
fill
option to fbetween/B: Logical. TRUE will overwrite missing values in x with the respective average. By default missing values in x are preserved.
mean
option to fwithin/W: The mean to center on. The default is 0, but a different mean can be supplied and will be added to the data after the centering is performed. A special option when performing grouped centering is mean = "overall.mean". In that case the overall mean of the data will be added after subtracting out group means.
theta
option to fwithin/W: Double. An optional scalar parameter for quasi-demeaning, i.e. x - theta * xi. is computed. This is useful for variance components ('random-effects') estimators. See Details.
keep.by, keep.ids, keep.group_vars
B and W data.frame, pdata.frame and grouped_df methods: Logical. Retain grouping / panel-identifier columns in the output. For data frames this only works if grouping variables were passed in a formula.
keep.w
B and W data.frame, pdata.frame and grouped_df methods: Logical. Retain column containing the weights in the output. Only works if w is passed as formula / lazy-expression.
…
arguments to be passed to or from other methods.
fbetween/B returns x with every element replaced by its (groupwise) mean (xi.). Missing values are preserved if fill = FALSE (the default). fwithin/W returns x with its (groupwise) mean subtracted from every element (x - theta * xi. + mean or, if mean = "overall.mean", x - theta * xi. + theta * x..). See Details.
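The handling of missing values described above can be illustrated with a small base-R sketch. It does not use collapse and is not its implementation; the toy data and the variable names (between_keep, between_fill, between_strict) are assumptions made for illustration only.

```r
# Base-R sketch of the described semantics (not collapse's implementation).
x <- c(1, NA, 3, 10, 20, NA)
g <- c("a", "a", "a", "b", "b", "b")

# Group means that skip missing values, analogous to na.rm = TRUE
group_mean <- ave(x, g, FUN = function(v) mean(v, na.rm = TRUE))

between_keep <- ifelse(is.na(x), NA, group_mean)  # missing values preserved, like fill = FALSE
between_fill <- group_mean                        # missing values overwritten, like fill = TRUE
within_x     <- x - group_mean                    # centered data; NA stays NA

# Analogous to na.rm = FALSE: one NA turns the whole group into NA
between_strict <- ave(x, g, FUN = mean)

print(between_keep)
print(between_fill)
print(within_x)
print(between_strict)
```

Both groups in this toy vector contain a missing value, so between_strict is NA everywhere, which mirrors the behaviour documented for na.rm = FALSE.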
Without groups, fbetween/B replaces all data points in x with their mean or weighted mean (if w is supplied). Similarly fwithin/W subtracts the (weighted) mean from all data points, i.e. centers the data on the mean.
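As a rough illustration of the ungrouped case, the following base-R sketch reproduces the arithmetic by hand. It is not collapse code, and the data, weights and variable names are assumptions chosen for the example.

```r
# Base-R sketch of ungrouped (weighted) averaging and centering.
x <- c(2, 4, 6, 8)
w <- c(1, 1, 1, 5)                 # non-negative weights

m  <- mean(x)                      # simple mean
wm <- weighted.mean(x, w)          # weighted mean: sum(w * x) / sum(w)

between_x  <- rep(m, length(x))    # every point replaced by the mean
within_x   <- x - m                # data centered on the mean
within_x_w <- x - wm               # data centered on the weighted mean

# Centered data sum to zero; with weights, the weighted sum is zero
print(sum(within_x))
print(sum(w * within_x_w))
```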
With groups supplied to g, the replacement / centering performed by fbetween/B and fwithin/W becomes groupwise. In terms of panel data notation: If x is a vector in a panel dataset, xit denotes a single data-point belonging to group i in time-period t (t need not be a time-period). Then xi. denotes x, averaged over t. fbetween/B now returns xi. and fwithin/W returns x - xi. It follows that for any data x and any grouping vector g: B(x,g) + W(x,g) = xi. + x - xi. = x. In terms of variance, fbetween/B only retains the variance between group averages, while fwithin/W, by subtracting out group means, only retains the variance within those groups.
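The identity B(x,g) + W(x,g) = x and the split into between and within variation can be checked by hand. The sketch below uses base R only (ave() for group means) and is not collapse's implementation; the toy data and the names b, w_dev and the *_ss variables are assumptions.

```r
# Base-R sketch: groupwise averaging and centering recombine to x, and the
# total sum of squares splits into a between part and a within part.
x <- c(1, 2, 3, 10, 20, 30)
g <- factor(c("a", "a", "a", "b", "b", "b"))

b     <- ave(x, g)   # xi.     : group means repeated for every observation
w_dev <- x - b       # x - xi. : deviations from the group means

print(all.equal(b + w_dev, x))

total_ss   <- sum((x - mean(x))^2)
between_ss <- sum((b - mean(x))^2)   # variation of the group means
within_ss  <- sum(w_dev^2)           # variation around the group means

print(all.equal(total_ss, between_ss + within_ss))
```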
The data replacement performed by fbetween/B can keep (default) or overwrite missing values (option fill = TRUE) in x. fwithin/W can center data simply (default), or add back a mean after centering (option mean = value), or add the overall mean in groupwise computations (option mean = "overall.mean"). Let x.. denote the overall mean of x, then fwithin/W with mean = "overall.mean" returns x - xi. + x.. instead of x - xi. This is useful to get rid of group-differences but preserve the overall level of the data. In regression analysis, centering with mean = "overall.mean" will only change the constant term. See Examples.
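The claim that adding back the overall mean only changes the constant term can be checked on simulated data. The sketch below is base R only; the helper center() is hypothetical and merely mimics x - xi. + mean, and the simulated variables are assumptions made for the example.

```r
# Base-R sketch: centering with or without the overall mean added back
# gives the same slope; only the intercept differs.
set.seed(1)
g <- rep(1:3, each = 10)            # three groups of ten observations
x <- rnorm(30) + g
y <- 2 * x + 5 * g + rnorm(30)      # group-specific levels via 5 * g

# Hypothetical helper: subtract group means, then add back `add`
center <- function(v, g, add = 0) v - ave(v, g) + add

d_plain   <- data.frame(y = center(y, g), x = center(x, g))
d_overall <- data.frame(y = center(y, g, mean(y)), x = center(x, g, mean(x)))

fit_plain   <- lm(y ~ x, data = d_plain)
fit_overall <- lm(y ~ x, data = d_overall)

print(coef(fit_plain))     # intercept is numerically zero
print(coef(fit_overall))   # same slope, intercept shifted
```

In the second fit the data keep the overall level of x and y, which is why the constant term moves while the slope stays the same.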
If theta != 1, fwithin/W performs quasi-demeaning x - theta * xi. If mean = "overall.mean", x - theta * xi. + theta * x.. is returned, so that the mean of the partially demeaned data is still equal to the overall data mean x.. A numeric value passed to mean will simply be added back to the quasi-demeaned data, i.e. x - theta * xi. + mean.
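A hand-computed version of the three quasi-demeaning variants may make the formulas more concrete. This is a base-R sketch under assumed toy data, not collapse's implementation; theta = 0.4 and the shift of 100 are arbitrary illustrative values.

```r
# Base-R sketch of quasi-demeaning with an arbitrary theta.
x     <- c(1, 2, 3, 10, 20, 30)
g     <- c(1, 1, 1, 2, 2, 2)
theta <- 0.4                        # illustrative value, not estimated

xi   <- ave(x, g)                   # xi. : group means
xbar <- mean(x)                     # x.. : overall mean

quasi         <- x - theta * xi                  # mean = 0 (default)
quasi_overall <- x - theta * xi + theta * xbar   # mean = "overall.mean"
quasi_shift   <- x - theta * xi + 100            # a numeric mean is simply added

# The overall-mean variant keeps the overall level of the data
print(all.equal(mean(quasi_overall), xbar))
```

Setting theta to 1 in this sketch reproduces full demeaning, while theta = 0 leaves x unchanged.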
Now in the case of a linear panel model
Mundlak, Yair. 1978. On the Pooling of Time Series and Cross Section Data. Econometrica 46 (1): 69-85.
fhdbetween/HDB and fhdwithin/HDW, fscale/STD, TRA, Data Transformations, Collapse Overview
# NOT RUN {
## Simple centering and averaging
head(fbetween(mtcars))
head(B(mtcars))
head(fwithin(mtcars))
head(W(mtcars))
all.equal(fbetween(mtcars) + fwithin(mtcars), mtcars)
## Groupwise centering and averaging
head(fbetween(mtcars, mtcars$cyl))
head(fwithin(mtcars, mtcars$cyl))
all.equal(fbetween(mtcars, mtcars$cyl) + fwithin(mtcars, mtcars$cyl), mtcars)
head(W(wlddev, ~ iso3c, cols = 9:13)) # Center the 5 series in this dataset by country
head(cbind(get_vars(wlddev,"iso3c"), # Same thing done manually using fwithin..
add_stub(fwithin(get_vars(wlddev,9:13), wlddev$iso3c), "W.")))
## Using B() and W() for fixed-effects regressions:
# Several ways of running the same regression with cyl-fixed effects
lm(W(mpg,cyl) ~ W(carb,cyl), data = mtcars) # Centering each individually
lm(mpg ~ carb, data = W(mtcars, ~ cyl, stub = FALSE)) # Centering the entire data
lm(mpg ~ carb, data = W(mtcars, ~ cyl, stub = FALSE, # Here only the intercept changes
mean = "overall.mean"))
lm(mpg ~ carb + B(carb,cyl), data = mtcars) # Procedure suggested by
# ..Mundlak (1978) - partialling out group averages amounts to the same as demeaning the data
# }
# NOT RUN {
plm::plm(mpg ~ carb, mtcars, index = "cyl", model = "within") # "Proof"..
# }
# NOT RUN {
# This takes the interaction of cyl, vs and am as fixed effects
lm(W(mpg,list(cyl,vs,am)) ~ W(carb,list(cyl,vs,am)), data = mtcars)
lm(mpg ~ carb, data = W(mtcars, ~ cyl + vs + am, stub = FALSE))
lm(mpg ~ carb + B(carb,list(cyl,vs,am)), data = mtcars)
# Now with cyl fixed effects weighted by hp:
lm(W(mpg,cyl,hp) ~ W(carb,cyl,hp), data = mtcars)
lm(mpg ~ carb, data = W(mtcars, ~ cyl, ~ hp, stub = FALSE))
lm(mpg ~ carb + B(carb,cyl,hp), data = mtcars) # WRONG ! Gives a different coefficient!!
## Manual variance components (random-effects) estimation
res <- HDW(mtcars, mpg ~ carb)[[1]] # Get residuals from pooled OLS
sig2_u <- fvar(res)
sig2_e <- fvar(fwithin(res, mtcars$cyl))
T <- length(res) / fndistinct(mtcars$cyl)
sig2_alpha <- sig2_u - sig2_e
theta <- 1 - sqrt(sig2_e) / sqrt(sig2_e + T * sig2_alpha) # Textbook form: idiosyncratic variance in the numerator
lm(mpg ~ carb, data = W(mtcars, ~ cyl, theta = theta, mean = "overall.mean", stub = FALSE))
# }
# NOT RUN {
# A slightly different method to obtain theta...
plm::plm(mpg ~ carb, mtcars, index = "cyl", model = "random")
# }