demean() computes group-meaned and de-meaned versions of a
   variable that can be used in regression analysis to model the between-
   and within-subject effects.
demean(
  x,
  select,
  group,
  suffix_demean = "_within",
  suffix_groupmean = "_between"
)
x: A data frame.
select: Character vector with names of variables to select that should be group- and de-meaned.
group: Name of the variable that indicates the group- or cluster-ID.
suffix_demean, suffix_groupmean: String value, will be appended to the names of the
group-meaned and de-meaned variables of x. By default, de-meaned
variables will be suffixed with "_within" and group-meaned variables
with "_between".
A data frame with the group-/de-meaned variables, which get the suffix
  "_between" (for the group-meaned variable) and "_within" (for
  the de-meaned variable) by default.
demean() is intended to create group- and de-meaned variables
    for panel regression models (fixed effects models), or for complex
    random-effect-within-between models (see Bell et al. 2018),
    where group-effects (random effects) and fixed effects correlate (see
    Bafumi and Gelman 2006). Such a correlation violates one of the
    Gauss-Markov assumptions and can occur, for instance, when analysing panel
    data. To control for correlating predictors and group effects, it is
    recommended to include the group-meaned and de-meaned version of
    time-varying covariates in the model. This makes it possible to fit
    complex multilevel models for panel data, including time-varying predictors,
    time-invariant predictors and random effects. This approach is superior to
    classic fixed-effects models, which lack information on variation in the
    group-effects or between-subject effects.
  
The group-meaned variable is simply the mean of an independent variable
    within each group (or id-level or cluster) represented by group,
    i.e. the cluster-mean of that variable. The de-meaned variable is the
    original variable centered around this group mean, i.e. each observation
    minus the mean of its group. De-meaning
    is sometimes also called person-mean centering or centering within clusters.
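
The two definitions above can be sketched in base R without calling demean(). This is only a minimal illustration on made-up data; the column names x_between and x_within mimic the default suffixes, and the use of ave() is an assumption for illustration, not a statement about how demean() is implemented.

```r
# Toy panel: three observations for each of two subjects (made-up values).
d <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  x  = c(2, 4, 6, 10, 11, 15)
)

# Group mean: ave() returns the mean of x within each id, repeated per row.
d$x_between <- ave(d$x, d$id, FUN = mean)

# De-meaned value: deviation of each observation from its own group mean.
d$x_within <- d$x - d$x_between

d
```

By construction, x_within sums to zero within each group, and x_between plus x_within reproduces the original x.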
  
For continuous time-varying predictors, the recommendation is to include both their de-meaned and group-meaned versions as fixed effects, but not the raw (untransformed) time-varying predictors themselves. The de-meaned predictor should also be included as random effect (random slope). In regression models, the coefficient of the de-meaned predictors indicates the within-subject effect, while the coefficient of the group-meaned predictor indicates the between-subject effect.
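
To illustrate why the coefficient of the de-meaned predictor is read as the within-subject effect, the following sketch compares it with a classic fixed-effects (dummy variable) model. The data are simulated and all variable names are hypothetical. For simplicity it uses lm() from base R rather than a mixed model, so it shows only the fixed-effects part of the recommendation, not the random slope.

```r
set.seed(123)

# Simulated panel (hypothetical data): 30 subjects, 5 occasions each.
n_id <- 30
n_t <- 5
id <- rep(seq_len(n_id), each = n_t)
u <- rnorm(n_id)[id]                 # subject-level effect
x <- 0.8 * u + rnorm(n_id * n_t)     # predictor correlated with the subject effect
y <- 1 + 0.5 * x + 2 * u + rnorm(n_id * n_t)
d <- data.frame(id, x, y)

# Group-meaned and de-meaned versions of x (computed by hand here).
d$x_between <- ave(d$x, d$id)
d$x_within <- d$x - d$x_between

# Within-between specification: both versions enter, the raw x does not.
m_wb <- lm(y ~ x_within + x_between, data = d)

# Classic fixed-effects model with one dummy per subject.
m_fe <- lm(y ~ x + factor(id), data = d)

# The within coefficient matches the fixed-effects estimate of x.
coef(m_wb)[["x_within"]]
coef(m_fe)[["x"]]

# With lme4 (not loaded here), a mixed-model version might look like:
# lmer(y ~ x_within + x_between + (1 + x_within | id), data = d)
```

The within-between model additionally returns a coefficient for x_between, which the dummy-variable model cannot estimate.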
For binary time-varying predictors, the recommendation is to include
    the raw (untransformed) binary predictor as fixed effect only and the
    de-meaned variable as random effect (random slope)
    (Hoffman 2015, chapter 8-2.I). To make this possible, demean() coerces
    categorical time-varying predictors to numeric in order to compute the de- and
    group-meaned versions of these variables.
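
As a rough illustration of what de-meaning a binary predictor amounts to, the sketch below converts a 0/1 factor to numeric by hand and then centers it. The data and the variable name smoker are made up, and the conversion via as.character() is an assumption for this example; the exact coercion that demean() applies internally is not described here.

```r
# Made-up panel: four occasions for each of two subjects.
d <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  smoker = factor(c(0, 0, 1, 1, 0, 1, 1, 1))
)

# Convert the 0/1 factor back to numbers (via character, so the labels are
# used and not the internal level codes 1/2).
d$smoker_num <- as.numeric(as.character(d$smoker))

# Group mean = share of occasions on which the subject scored 1.
d$smoker_between <- ave(d$smoker_num, d$id)

# De-meaned value = deviation from the subject's own share.
d$smoker_within <- d$smoker_num - d$smoker_between

d
```

Here the group mean is a proportion between 0 and 1, and the de-meaned variable is no longer binary.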
  
There are multiple ways to deal with interaction terms of within- and
    between-effects. A classical approach is to simply use the product
    term of the de-meaned variables (i.e. introducing the de-meaned variables
    as interaction term in the model formula, e.g. y ~ x_within * time_within).
    This approach, however, might be subject to bias (see Giesselmann & Schmidt-Catran 2018).
Another option is to first calculate the product term and then apply the
    de-meaning to it. This approach produces an estimator “that reflects
    unit-level differences of interacted variables whose moderators vary
    within units”, which is desirable if no within interaction of
    two time-dependent variables is required.  
    A third option, when the interaction should result in a genuine within
    estimator, is to "double de-mean" the interaction terms
    (Giesselmann & Schmidt-Catran 2018). However, this is currently
    not supported by demean(). If this is required, the wbm()
    function from the panelr package should be used.  
    To de-mean interaction terms for within-between models, simply specify
    the term as interaction for the select-argument, e.g.
    select = "a*b" (see 'Examples').
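
The three ways of building an interaction term described above can be written out by hand in base R. This is a sketch under assumptions: center_within() is a hypothetical helper, not part of any package, and the double de-meaned term follows one reading of Giesselmann & Schmidt-Catran (2018), namely de-meaning the product of the already de-meaned variables. It is not meant to reproduce the internals of demean().

```r
dat <- data.frame(
  x = c(4, 3, 3, 4, 1, 2, 1, 2),
  y = c(1, 2, 1, 2, 4, 3, 2, 1),
  ID = c(1, 2, 3, 1, 2, 3, 1, 2)
)

# Hypothetical helper: subtract the group mean from each value.
center_within <- function(v, g) v - ave(v, g)

x_within <- center_within(dat$x, dat$ID)
y_within <- center_within(dat$y, dat$ID)

# 1) Classical approach: product of the de-meaned variables.
int_classic <- x_within * y_within

# 2) Product first, then de-mean the product.
int_product <- center_within(dat$x * dat$y, dat$ID)

# 3) "Double de-meaning": de-mean the product of the de-meaned variables.
int_double <- center_within(int_classic, dat$ID)

cbind(ID = dat$ID, int_classic, int_product, int_double)
```

Unlike the other two terms, the classical product term generally does not sum to zero within groups, so it is not itself a within-centered variable.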
  
A description of how to translate the
    formulas described in Bell et al. 2018 into R using lmer()
    from lme4 or glmmTMB() from glmmTMB can be found here:
    for lmer()
    and for glmmTMB().
Bafumi J, Gelman A. 2006. Fitting Multilevel Models When Predictors and Group Effects Correlate. Philadelphia, PA: Annual meeting of the American Political Science Association.
Bell A, Fairbrother M, Jones K. 2018. Fixed and Random Effects Models: Making an Informed Choice. Quality & Quantity.
Giesselmann M, Schmidt-Catran A. 2018. Interactions in fixed effects regression models (Discussion Papers of DIW Berlin No. 1748). DIW Berlin, German Institute for Economic Research. Retrieved from https://ideas.repec.org/p/diw/diwwpp/dp1748.html
Hoffman L. 2015. Longitudinal analysis: modeling within-person fluctuation and change. New York: Routledge.
# NOT RUN {
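data(iris)
set.seed(123) # make the randomly generated columns reproducible
iris$ID <- sample(1:4, nrow(iris), replace = TRUE) # fake-ID
iris$binary <- as.factor(rbinom(150, 1, .35)) # binary variable
# de-mean two continuous variables
x <- demean(iris, select = c("Sepal.Length", "Petal.Length"), group = "ID")
head(x)
# de-mean a continuous, a binary and a categorical variable
x <- demean(iris, select = c("Sepal.Length", "binary", "Species"), group = "ID")
head(x)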
# demean interaction term x*y
dat <- data.frame(
  a = c(1, 2, 3, 4, 1, 2, 3, 4),
  x = c(4, 3, 3, 4, 1, 2, 1, 2),
  y = c(1, 2, 1, 2, 4, 3, 2, 1),
  ID = c(1, 2, 3, 1, 2, 3, 1, 2)
)
demean(dat, select = c("a", "x*y"), group = "ID")
# }