
fbetween and fwithin are S3 generics to efficiently obtain between-transformed (averaged) or within-transformed (demeaned) data. These operations can be performed groupwise and/or weighted. B and W are wrappers around fbetween and fwithin representing the 'between-operator' and the 'within-operator'. B / W provide more flexibility than fbetween / fwithin when applied to data frames (i.e. column subsetting, formula input, auto-renaming and id-variable-preservation capabilities), but are otherwise identical.
(fbetween and fwithin are simple programmer's functions in the style of the Fast Statistical Functions, while B and W are more practical to use in regression formulas or for ad-hoc computations on data frames.)
fbetween(x, …)
fwithin(x, …)
B(x, …)
W(x, …)
# S3 method for default
fbetween(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for default
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)
# S3 method for default
B(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for default
W(x, g = NULL, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)
# S3 method for matrix
fbetween(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for matrix
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)
# S3 method for matrix
B(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, stub = "B.", …)
# S3 method for matrix
W(x, g = NULL, w = NULL, na.rm = TRUE, add.global.mean = FALSE, stub = "W.", …)
# S3 method for data.frame
fbetween(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for data.frame
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)
# S3 method for data.frame
B(x, by = NULL, w = NULL, cols = is.numeric, na.rm = TRUE,
fill = FALSE, stub = "B.", keep.by = TRUE, keep.w = TRUE, …)
# S3 method for data.frame
W(x, by = NULL, w = NULL, cols = is.numeric, na.rm = TRUE,
add.global.mean = FALSE, stub = "W.", keep.by = TRUE, keep.w = TRUE, …)
# Methods for compatibility with plm:
# S3 method for pseries
fbetween(x, effect = 1L, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for pseries
fwithin(x, effect = 1L, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)
# S3 method for pseries
B(x, effect = 1L, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for pseries
W(x, effect = 1L, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)
# S3 method for pdata.frame
fbetween(x, effect = 1L, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for pdata.frame
fwithin(x, effect = 1L, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)
# S3 method for pdata.frame
B(x, effect = 1L, w = NULL, cols = is.numeric, na.rm = TRUE,
fill = FALSE, stub = "B.", keep.ids = TRUE, keep.w = TRUE, …)
# S3 method for pdata.frame
W(x, effect = 1L, w = NULL, cols = is.numeric, na.rm = TRUE,
add.global.mean = FALSE, stub = "W.", keep.ids = TRUE, keep.w = TRUE, …)
# Methods for compatibility with dplyr:
# S3 method for grouped_df
fbetween(x, w = NULL, na.rm = TRUE, fill = FALSE,
keep.group_vars = TRUE, keep.w = TRUE, …)
# S3 method for grouped_df
fwithin(x, w = NULL, na.rm = TRUE, add.global.mean = FALSE,
keep.group_vars = TRUE, keep.w = TRUE, …)
# S3 method for grouped_df
B(x, w = NULL, na.rm = TRUE, fill = FALSE,
stub = "B.", keep.group_vars = TRUE, keep.w = TRUE, …)
# S3 method for grouped_df
W(x, w = NULL, na.rm = TRUE, add.global.mean = FALSE,
stub = "W.", keep.group_vars = TRUE, keep.w = TRUE, …)
x: a numeric vector, matrix, data.frame, panel-series (plm::pseries), panel-data.frame (plm::pdata.frame) or grouped tibble (dplyr::grouped_df).
by: B and W data.frame method: Same as g, but also allows one- or two-sided formulas, i.e. ~ group1 or var1 + var2 ~ group1 + group2. See Examples.
w: a numeric vector of (non-negative) weights. B/W data.frame and pdata.frame methods also allow a one-sided formula, i.e. ~ weightcol. The grouped_df (dplyr) method supports lazy-evaluation. See Examples.
cols: data.frame method: Select columns to center/average using a function, column names or indices. Default: All numeric variables. Note: cols is ignored if a two-sided formula is passed to by.
na.rm: logical. Skip missing values in x when computing averages. If na.rm = FALSE and an NA or NaN is encountered, the average for that group will be NA, and all data points belonging to that group will also be NA.
effect: plm methods: Select which panel identifier should be used as grouping variable. 1L means the first variable in the plm::index, 2L the second, etc. If more than one integer is supplied, the corresponding index variables are interacted.
stub: a prefix or stub to rename all transformed columns. FALSE will not rename columns.
fill: option to fbetween/B: Logical. TRUE will overwrite missing values in x with the respective average. By default missing values in x are preserved.
add.global.mean: option to fwithin/W: Logical. TRUE will add back the global mean to all data values after subtracting out group means.
keep.by, keep.ids, keep.group_vars: B and W data.frame, pdata.frame and grouped_df methods: Logical. Retain grouping / panel-identifier columns in the output. For data frames this only works if grouping variables were passed in a formula.
keep.w: B and W data.frame, pdata.frame and grouped_df methods: Logical. Retain the column containing the weights in the output. Only works if w is passed as a formula / lazy-expression.
…: arguments to be passed to or from other methods.
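The missing-value behaviour described for na.rm and fill can be mimicked in a few lines of base R. This is only a sketch of the semantics as documented above, not collapse's implementation; between_sketch is a hypothetical helper name introduced for illustration.

```r
# Base-R sketch of the documented na.rm / fill semantics (assumption: this
# mirrors the description above; it is not how collapse computes the result).
x <- c(1, 2, NA, 4, 6, 8)
g <- c("a", "a", "a", "b", "b", "b")

# Hypothetical helper: groupwise mean broadcast back to the length of x.
between_sketch <- function(x, g, na.rm = TRUE, fill = FALSE) {
  m <- ave(x, g, FUN = function(v) mean(v, na.rm = na.rm))
  if (!fill) m[is.na(x)] <- NA  # default: missing values in x are preserved
  m
}

b_default <- between_sketch(x, g)                 # NA kept, others get the group mean
b_fill    <- between_sketch(x, g, fill = TRUE)    # NA overwritten with the group mean
b_strict  <- between_sketch(x, g, na.rm = FALSE)  # every value in group "a" becomes NA
```

Comparing b_default and b_fill shows that fill only affects positions where x itself is missing, while na.rm = FALSE propagates a single missing value to the whole group.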
fbetween/B returns x with every element replaced by its (groupwise) mean (xi.). fwithin/W returns x with its (groupwise) mean subtracted from every element (x - xi. or x - xi. + x..). See Details.
Without groups, fbetween/B replaces all data points in x with their mean or weighted mean (if w is supplied). Similarly, fwithin/W subtracts the mean from all data points, i.e. centers the data on the mean.
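For intuition, the ungrouped case can be written out with base R's mean and weighted.mean. The snippet below is a sketch of the arithmetic only; the variable names are illustrative and no collapse function is called.

```r
# Base-R sketch of the ungrouped between / within transformation.
x <- c(2, 4, 6, 8)
w <- c(1, 1, 1, 5)  # illustrative non-negative weights

b_plain <- rep(mean(x), length(x))  # every point replaced by the mean
w_plain <- x - mean(x)              # data centered on the mean

wm         <- weighted.mean(x, w)   # weighted mean, used when w is supplied
b_weighted <- rep(wm, length(x))
w_weighted <- x - wm
```

After centering, the plain mean of w_plain is zero, and the weighted mean of w_weighted (with weights w) is zero as well.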
With groups supplied to g, the replacement / centering performed by fbetween/B and fwithin/W becomes groupwise. I like to think of this in terms of panel data: If x is a vector in such a dataset, xit denotes a single data point belonging to group i in time-period t (t need not be a time-period). Then xi. denotes x, averaged over t. fbetween/B now returns xi. and fwithin/W returns x - xi. Thus for any data x and any grouping vector g: B(x,g) + W(x,g) = xi. + x - xi. = x. In terms of variance, fbetween/B only retains the variance between group averages, while fwithin/W, by subtracting out group means, only retains the variance within those groups.
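The identity B(x,g) + W(x,g) = x and the split into between and within variation can be checked with a small base-R sketch. It uses ave() to compute group means and is meant only to illustrate the arithmetic, not collapse's internals; the data are made up.

```r
# Base-R sketch of the groupwise between / within decomposition.
x <- c(1, 3, 5, 10, 12, 20)
g <- c("a", "a", "a", "b", "b", "c")

xb <- ave(x, g)  # xi.: group mean repeated for every member of the group
xw <- x - xb     # x - xi.: deviation from the group mean

# Sums of squares around the overall mean split into the two parts.
ss_total   <- sum((x  - mean(x))^2)
ss_between <- sum((xb - mean(x))^2)
ss_within  <- sum(xw^2)
```

Here xb + xw reproduces x exactly, the within part has mean zero in every group, and ss_total equals ss_between + ss_within.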
The data replacement performed by fbetween/B can keep (default) or overwrite missing values (option fill) in x. fwithin/W can center data simply (default), or add back the global / overall mean in groupwise computations (option add.global.mean). Let x.. denote the global mean of x, then fwithin/W with add.global.mean = TRUE returns x - xi. + x.. instead of x - xi. This is useful to get rid of group differences but preserve the overall level of the data (as simple groupwise centering will set the overall mean of the data to 0). In regression analysis, centering with add.global.mean = TRUE will only change the constant term. See Examples.
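The effect of adding back the global mean can be illustrated with base R on mtcars. This is a sketch under the assumption of unweighted means; within_sketch is a hypothetical helper that mimics x - xi. and x - xi. + x.., and is not part of collapse.

```r
# Base-R sketch of groupwise centering with and without the global mean.
# within_sketch is a hypothetical helper, unweighted case only.
within_sketch <- function(x, g, add.global.mean = FALSE) {
  out <- x - ave(x, g)                       # x - xi.
  if (add.global.mean) out <- out + mean(x)  # x - xi. + x..
  out
}

plain <- data.frame(mpg  = within_sketch(mtcars$mpg,  mtcars$cyl),
                    carb = within_sketch(mtcars$carb, mtcars$cyl))
level <- data.frame(mpg  = within_sketch(mtcars$mpg,  mtcars$cyl, TRUE),
                    carb = within_sketch(mtcars$carb, mtcars$cyl, TRUE))

fit_plain <- lm(mpg ~ carb, data = plain)  # intercept is numerically zero
fit_level <- lm(mpg ~ carb, data = level)  # same slope, shifted intercept
```

The level data keep the overall means of mpg and carb, the two fits share the same slope, and only the constant term differs.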
fHDbetween/HDB and fHDwithin/HDW, fscale/STD, TRA, Data Transformations, Collapse Overview
# NOT RUN {
## Simple centering and averaging
fbetween(mtcars)
B(mtcars)
fwithin(mtcars)
W(mtcars)
fbetween(mtcars) + fwithin(mtcars) == mtcars # This should be true apart from rounding errors
## Groupwise centering and averaging
fbetween(mtcars, mtcars$cyl)
fwithin(mtcars, mtcars$cyl)
fbetween(mtcars, mtcars$cyl) + fwithin(mtcars, mtcars$cyl) == mtcars
W(wlddev, ~ iso3c, cols = 9:12) # Center the 4 series in this dataset by country
cbind(get_vars(wlddev,"iso3c"), # Same thing done manually using fwithin...
add_stub(fwithin(get_vars(wlddev,9:12), wlddev$iso3c), "W."))
## Using B() and W() in regressions:
# Several ways of running the same regression with cyl-fixed effects
lm(W(mpg,cyl) ~ W(carb,cyl), data = mtcars) # Centering each individually
lm(mpg ~ carb, data = W(mtcars, ~ cyl, stub = FALSE)) # Centering the entire data
lm(mpg ~ carb, data = W(mtcars, ~ cyl, stub = FALSE, # Here only the intercept changes
add.global.mean = TRUE))
lm(mpg ~ carb + B(carb,cyl), data = mtcars) # Procedure suggested by
# ...Mundlak (1978) - partialling out group averages amounts to the same as demeaning the data
# Now with cyl, vs and am fixed effects
lm(W(mpg,list(cyl,vs,am)) ~ W(carb,list(cyl,vs,am)), data = mtcars)
lm(mpg ~ carb, data = W(mtcars, ~ cyl + vs + am, stub = FALSE))
lm(mpg ~ carb + B(carb,list(cyl,vs,am)), data = mtcars)
# Now with cyl, vs and am fixed effects weighted by hp:
lm(W(mpg,list(cyl,vs,am),hp) ~ W(carb,list(cyl,vs,am),hp), data = mtcars)
lm(mpg ~ carb, data = W(mtcars, ~ cyl + vs + am, ~ hp, stub = FALSE))
lm(mpg ~ carb + B(carb,list(cyl,vs,am),hp), data = mtcars) # Gives a different coefficient!!
# }
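The claim in the examples that demeaning, group dummies and the Mundlak-style regression give the same slope can be verified without collapse. The following base-R sketch builds the between and within columns with ave(); the column names mpg_w, carb_w and carb_b are illustrative and only the unweighted case is covered.

```r
# Base-R check that three ways of absorbing cyl effects give the same slope.
d <- transform(mtcars,
               mpg_w  = mpg  - ave(mpg,  cyl),  # within-transformed mpg
               carb_w = carb - ave(carb, cyl),  # within-transformed carb
               carb_b = ave(carb, cyl))         # between-transformed carb

b_within  <- coef(lm(mpg_w ~ carb_w, data = d))[["carb_w"]]          # demeaned data
b_dummies <- coef(lm(mpg ~ carb + factor(cyl), data = d))[["carb"]]  # group dummies
b_mundlak <- coef(lm(mpg ~ carb + carb_b, data = d))[["carb"]]       # group averages added
```

All three coefficients agree up to floating-point error, because the within-transformed regressor is orthogonal to anything that is constant within groups.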