fbetween-fwithin-B-W: Fast Between (Averaging) and Within (Centering) Transformations

Description

fbetween and fwithin are S3 generics to efficiently obtain between-transformed (averaged) or within-transformed (demeaned) data. These operations can be performed groupwise and/or weighted. B and W are wrappers around fbetween and fwithin representing the 'between-operator' and the 'within-operator'. B / W provide more flexibility than fbetween / fwithin when applied to data frames (i.e. column subsetting, formula input, auto-renaming and id-variable-preservation capabilities...), but are otherwise identical.

(fbetween and fwithin are simple programmers' functions in the style of the Fast Statistical Functions, while B and W are more practical to use in regression formulas or for ad-hoc computations on data frames.)

Usage

fbetween(x, …)
 fwithin(x, …)
       B(x, …)
       W(x, …)

# S3 method for default
fbetween(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for default
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)
# S3 method for default
B(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for default
W(x, g = NULL, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)

# S3 method for matrix
fbetween(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for matrix
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)
# S3 method for matrix
B(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, stub = "B.", …)
# S3 method for matrix
W(x, g = NULL, w = NULL, na.rm = TRUE, add.global.mean = FALSE, stub = "W.", …)

# S3 method for data.frame
fbetween(x, g = NULL, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for data.frame
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)
# S3 method for data.frame
B(x, by = NULL, w = NULL, cols = is.numeric, na.rm = TRUE,
  fill = FALSE, stub = "B.", keep.by = TRUE, keep.w = TRUE, …)
# S3 method for data.frame
W(x, by = NULL, w = NULL, cols = is.numeric, na.rm = TRUE,
  add.global.mean = FALSE, stub = "W.", keep.by = TRUE, keep.w = TRUE, …)

# Methods for compatibility with plm:

# S3 method for pseries
fbetween(x, effect = 1L, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for pseries
fwithin(x, effect = 1L, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)
# S3 method for pseries
B(x, effect = 1L, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for pseries
W(x, effect = 1L, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)

# S3 method for pdata.frame
fbetween(x, effect = 1L, w = NULL, na.rm = TRUE, fill = FALSE, …)
# S3 method for pdata.frame
fwithin(x, effect = 1L, w = NULL, na.rm = TRUE, add.global.mean = FALSE, …)
# S3 method for pdata.frame
B(x, effect = 1L, w = NULL, cols = is.numeric, na.rm = TRUE,
  fill = FALSE, stub = "B.", keep.ids = TRUE, keep.w = TRUE, …)
# S3 method for pdata.frame
W(x, effect = 1L, w = NULL, cols = is.numeric, na.rm = TRUE,
  add.global.mean = FALSE, stub = "W.", keep.ids = TRUE, keep.w = TRUE, …)

# Methods for compatibility with dplyr:

# S3 method for grouped_df
fbetween(x, w = NULL, na.rm = TRUE, fill = FALSE,
         keep.group_vars = TRUE, keep.w = TRUE, …)
# S3 method for grouped_df
fwithin(x, w = NULL, na.rm = TRUE, add.global.mean = FALSE,
        keep.group_vars = TRUE, keep.w = TRUE, …)
# S3 method for grouped_df
B(x, w = NULL, na.rm = TRUE, fill = FALSE,
  stub = "B.", keep.group_vars = TRUE, keep.w = TRUE, …)
# S3 method for grouped_df
W(x, w = NULL, na.rm = TRUE, add.global.mean = FALSE,
  stub = "W.", keep.group_vars = TRUE, keep.w = TRUE, …)

Arguments

x

a numeric vector, matrix, data.frame, panel-series (plm::pseries), panel-data.frame (plm::pdata.frame) or grouped tibble (dplyr::grouped_df).

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

by

B and W data.frame method: Same as g, but also allows one- or two-sided formulas, i.e. ~ group1 or var1 + var2 ~ group1 + group2. See Examples.

w

a numeric vector of (non-negative) weights. B/W data.frame and pdata.frame methods also allow a one-sided formula, i.e. ~ weightcol. The grouped_df (dplyr) method supports lazy-evaluation. See Examples.

cols

data.frame method: Select columns to center/average using a function, column names or indices. Default: All numeric variables. Note: cols is ignored if a two-sided formula is passed to by.

na.rm

logical. Skip missing values in x when computing averages. If na.rm = FALSE and an NA or NaN is encountered, the average for that group will be NA, and all data points belonging to that group will also be NA.

effect

plm methods: Select which panel identifier should be used as grouping variable. 1L means the first variable in the plm::index, 2L the second, etc. If more than one integer is supplied, the corresponding index-variables are interacted.

stub

a prefix or stub to rename all transformed columns. FALSE will not rename columns.

fill

option to fbetween/B: Logical. TRUE will overwrite missing values in x with the respective average. By default missing values in x are preserved.

add.global.mean

option to fwithin/W: Logical. TRUE will add back the global mean to all data values after subtracting out group-means.

keep.by, keep.ids, keep.group_vars

B and W data.frame, pdata.frame and grouped_df methods: Logical. Retain grouping / panel-identifier columns in the output. For data frames this only works if grouping variables were passed in a formula.

keep.w

B and W data.frame, pdata.frame and grouped_df methods: Logical. Retain column containing the weights in the output. Only works if w is passed as formula / lazy-expression.

…

arguments to be passed to or from other methods.
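
The na.rm rule described above can be illustrated with a minimal base R sketch. It uses tapply rather than fbetween itself, so it only mirrors the stated behaviour and is not the package's implementation; the vectors x and g are made-up example data.

```r
# Hypothetical example data: group "a" contains one missing value.
x <- c(1, 2, NA, 4, 5, 6)
g <- c("a", "a", "a", "b", "b", "b")

# Group averages with missing values skipped (analogue of na.rm = TRUE).
avg_skip <- tapply(x, g, mean, na.rm = TRUE)

# Group averages without skipping (analogue of na.rm = FALSE):
# the average of group "a" becomes NA.
avg_keep <- tapply(x, g, mean)

# Broadcasting the na.rm = FALSE averages back to the data:
# every observation in group "a" is now NA, group "b" is unaffected.
between_keep <- as.vector(avg_keep[g])

print(avg_skip)
print(between_keep)
```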

Value

fbetween/B returns x with every element replaced by its (groupwise) mean (xi.). fwithin/W returns x with its (groupwise) mean subtracted from every element (x - xi. or x - xi. + x..). See Details.

Details

Without groups, fbetween/B replaces all data points in x with their mean or weighted mean (if w is supplied). Similarly fwithin/W subtracts the mean from all data points i.e. centers the data on the mean.
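
As a rough illustration of the ungrouped case, the following base R sketch computes the same quantities by hand with mean and weighted.mean. It is an assumption-laden stand-in for what the text describes, not a call into the package; x and w are made-up values.

```r
# Hypothetical data and non-negative weights.
x <- c(2, 4, 6, 8)
w <- c(1, 1, 1, 5)

# Unweighted: every point replaced by the mean / centered on the mean.
between_simple <- rep(mean(x), length(x))
within_simple  <- x - mean(x)

# Weighted: the weighted mean takes the place of the simple mean.
wm        <- weighted.mean(x, w)
between_w <- rep(wm, length(x))
within_w  <- x - wm

print(within_simple)
print(within_w)
```

After centering, the (weighted) mean of the centered data is zero, which is what "centers the data on the mean" amounts to.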

With groups supplied to g, the replacement / centering performed by fbetween/B | fwithin/W becomes groupwise. I like to think of this in terms of panel data: If x is a vector in such a dataset, xit denotes a single data-point belonging to group i in time-period t (t need not be a time-period). Then xi. denotes x, averaged over t. fbetween/B now returns xi. and fwithin/W returns x - xi.. Thus for any data x and any grouping vector g: B(x,g) + W(x,g) = xi. + x - xi. = x. In terms of variance, fbetween/B only retains the variance between group averages, while fwithin/W, by subtracting out group means, only retains the variance within those groups.
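
The identity B(x,g) + W(x,g) = x and the variance statement can be checked numerically. The sketch below uses base R's ave in place of fbetween/fwithin, so it demonstrates the arithmetic only; the data are invented for the example.

```r
# Hypothetical data: two groups of three observations each.
x <- c(1, 3, 5, 10, 12, 14)
g <- factor(c("a", "a", "a", "b", "b", "b"))

# Between transformation: each value replaced by its group mean (xi.).
x_between <- ave(x, g, FUN = mean)

# Within transformation: each value minus its group mean (x - xi.).
x_within <- x - x_between

# The two parts add back up to the original data.
stopifnot(isTRUE(all.equal(x_between + x_within, x)))

# Sum-of-squares decomposition: total = between + within.
total_ss   <- sum((x - mean(x))^2)
between_ss <- sum((x_between - mean(x))^2)
within_ss  <- sum(x_within^2)

print(c(total = total_ss, between = between_ss, within = within_ss))
```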

The data replacement performed by fbetween/B can keep (default) or overwrite missing values (option fill) in x. fwithin/W can center data simply (default), or add back the global / overall mean in groupwise computations (option add.global.mean). Let x.. denote the global mean of x, then fwithin/W with add.global.mean = TRUE returns x - xi. + x.. instead of x - xi.. This is useful to get rid of group-differences but preserve the overall level of the data (as simple groupwise centering will set the overall mean of the data to 0). In regression analysis, centering with add.global.mean = TRUE will only change the constant term. See Examples.
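
The two options can be mimicked in base R as follows. This is a sketch of the described semantics under the assumption that the global mean is the overall mean of the non-missing values; it does not call the package, and the names used are illustrative only.

```r
# Hypothetical data with one missing value in group "a".
x <- c(1, NA, 3, 10, 12, 14)
g <- factor(c("a", "a", "a", "b", "b", "b"))

# Group means, skipping missing values.
grp_mean <- ave(x, g, FUN = function(v) mean(v, na.rm = TRUE))

# Analogue of fill = FALSE (default): missing values in x stay missing.
between_keep <- replace(grp_mean, is.na(x), NA)

# Analogue of fill = TRUE: missing values are overwritten by the group mean.
between_fill <- grp_mean

# Analogue of add.global.mean: x - xi. versus x - xi. + x..
global_mean   <- mean(x, na.rm = TRUE)
within_plain  <- x - grp_mean
within_global <- x - grp_mean + global_mean

print(between_keep)
print(between_fill)
print(within_global)
```

Plain centering leaves the data with an overall mean of 0, whereas the second variant keeps the overall mean at its original level while still removing the group differences.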

Examples

# NOT RUN {
## Simple centering and averaging
fbetween(mtcars)
B(mtcars)
fwithin(mtcars)
W(mtcars)
fbetween(mtcars) + fwithin(mtcars) == mtcars # This should be true apart from rounding errors

## Groupwise centering and averaging
fbetween(mtcars, mtcars$cyl)
 fwithin(mtcars, mtcars$cyl)
fbetween(mtcars, mtcars$cyl) + fwithin(mtcars, mtcars$cyl) == mtcars

W(wlddev, ~ iso3c, cols = 9:12)    # Center the 4 series in this dataset by country
cbind(get_vars(wlddev,"iso3c"),    # Same thing done manually using fwithin...
      add_stub(fwithin(get_vars(wlddev,9:12), wlddev$iso3c), "W."))

## Using B() and W() in regressions:

# Several ways of running the same regression with cyl-fixed effects
lm(W(mpg,cyl) ~ W(carb,cyl), data = mtcars)                     # Centering each individually
lm(mpg ~ carb, data = W(mtcars, ~ cyl, stub = FALSE))           # Centering the entire data
lm(mpg ~ carb, data = W(mtcars, ~ cyl, stub = FALSE,            # Here only the intercept changes
                        add.global.mean = TRUE))
lm(mpg ~ carb + B(carb,cyl), data = mtcars)                     # Procedure suggested by
# ...Mundlak (1978) - partialling out group averages amounts to the same as demeaning the data

# Now with cyl, vs and am fixed effects
lm(W(mpg,list(cyl,vs,am)) ~ W(carb,list(cyl,vs,am)), data = mtcars)
lm(mpg ~ carb, data = W(mtcars, ~ cyl + vs + am, stub = FALSE))
lm(mpg ~ carb + B(carb,list(cyl,vs,am)), data = mtcars)

# Now with cyl, vs and am fixed effects weighted by hp:
lm(W(mpg,list(cyl,vs,am),hp) ~ W(carb,list(cyl,vs,am),hp), data = mtcars)
lm(mpg ~ carb, data = W(mtcars, ~ cyl + vs + am, ~ hp, stub = FALSE))
lm(mpg ~ carb + B(carb,list(cyl,vs,am),hp), data = mtcars)       # Gives a different coefficient!!

# }
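
The unweighted equivalences claimed in the regression examples can be verified without the package. The sketch below rebuilds the centered and averaged variables with base R's ave (the helper cyl_mean and the data frame d are hypothetical names introduced here) and compares the slope on carb across four specifications.

```r
# Helper (hypothetical): broadcast the cyl-group mean of a variable.
cyl_mean <- function(v) ave(v, mtcars$cyl, FUN = mean)

d <- data.frame(
  mpg    = mtcars$mpg,
  carb   = mtcars$carb,
  mpg_w  = mtcars$mpg  - cyl_mean(mtcars$mpg),   # within-transformed mpg
  carb_w = mtcars$carb - cyl_mean(mtcars$carb),  # within-transformed carb
  carb_b = cyl_mean(mtcars$carb)                 # between-transformed carb
)
# Centered data with the global mean added back.
d$mpg_wg  <- d$mpg_w  + mean(mtcars$mpg)
d$carb_wg <- d$carb_w + mean(mtcars$carb)

fit_within  <- lm(mpg_w ~ carb_w, data = d)                  # demeaned data
fit_global  <- lm(mpg_wg ~ carb_wg, data = d)                # global mean added back
fit_mundlak <- lm(mpg ~ carb + carb_b, data = d)             # group average as regressor
fit_dummies <- lm(mpg ~ carb + factor(cyl), data = mtcars)   # explicit cyl dummies

slopes <- c(
  within  = coef(fit_within)[["carb_w"]],
  global  = coef(fit_global)[["carb_wg"]],
  mundlak = coef(fit_mundlak)[["carb"]],
  dummies = coef(fit_dummies)[["carb"]]
)
print(slopes)
```

All four slopes agree up to rounding; only the intercept differs between the plain centered fit and the fit with the global mean added back. The weighted case is not covered by this sketch.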
