icc: Intraclass-Correlation Coefficient

Description

This function calculates the intraclass-correlation (icc) - sometimes also called variance partition coefficient (vpc) - for random intercepts of mixed effects models. Currently, merMod, glmmTMB, stanreg and brmsfit objects are supported.

Usage

icc(x, ...)
# S3 method for merMod
icc(x, adjusted = FALSE, ...)
# S3 method for glmmTMB
icc(x, adjusted = FALSE, ...)
# S3 method for stanreg
icc(x, re.form = NULL, typical = "mean",
  prob = 0.89, ppd = FALSE, adjusted = FALSE, ...)
# S3 method for brmsfit
icc(x, re.form = NULL, typical = "mean",
  prob = 0.89, ppd = FALSE, ...)

Arguments

Fitted mixed effects model (of class merMod, glmmTMB, stanreg or brmsfit).

...

Currently not used.

adjusted

Logical, if TRUE, the adjusted (and conditional) ICC is calculated, which reflects the uncertainty of all random effects (see 'Details'). For Bayesian models, if ppd = TRUE, adjusted will be ignored.

re.form

Formula containing group-level effects to be considered in the prediction. If NULL (default), include all group-level effects. Else, for instance for nested models, name a specific group-level effect to calculate the ICC for this group-level. Only applies if ppd = TRUE.

typical

Character vector, naming the function that will be used as measure of central tendency for the ICC. The default is "mean". See typical_value for options.

prob

Vector of scalars between 0 and 1, indicating the mass within the credible interval that is to be estimated. See hdi.

ppd

Logical, if TRUE, variance decomposition is based on the posterior predictive distribution, which is the correct way for Bayesian non-Gaussian models. By default, ppd is set to TRUE for non-Gaussian models.If adjusted = TRUE and ppd = FALSE, variance decomposition is approximated following the suggestion by Nakagawa et al. 2017 (see 'Details'), however, this is currently only implemented for Gaussian models.

Value

A numeric vector with all random intercept intraclass-correlation-coefficients. Furthermore, if adjusted = FALSE, between- and within-group variances as well as random-slope variance are returned as attributes. For stanreg or brmsfit objects, the HDI for each statistic is also included as attribute.

Details

The "simple" ICC (with both ppd and adjusted set to FALSE) is calculated by dividing the between-group-variance (random intercept variance) by the total variance (i.e. sum of between-group-variance and within-group (residual) variance). The calculation of the ICC for generalized linear mixed models with binary outcome is based on Wu et al. (2012). For other distributions (negative binomial, poisson, ...), calculation is based on Nakagawa et al. 2017, however, for non-Gaussian models it is recommended to compute the adjusted ICC (with adjusted = TRUE, see below).

ICC for unconditional and conditional models

Usually, the ICC is calculated for the null model ("unconditional model"). However, according to Raudenbush and Bryk (2002) or Rabe-Hesketh and Skrondal (2012) it is also feasible to compute the ICC for full models with covariates ("conditional models") and compare how much a level-2 variable explains the portion of variation in the grouping structure (random intercept).

ICC for random-slope models

Caution: For models with random slopes and random intercepts, the ICC would differ at each unit of the predictors. Hence, the ICC for these kind of models cannot be understood simply as proportion of variance (see Goldstein et al. 2010). For convenience reasons, as the icc() function also extracts the different random effects variances, the ICC for random-slope-intercept-models is reported nonetheless, but it is usually no meaningful summary of the proportion of variances.

To get a meaningful ICC also for models with random slopes, use adjusted = TRUE. The adjusted ICC uses the mean random effect variance, which is based on the random effect variances for each value of the random slope (see Johnson et al. 2014).

ICC for models with multiple or nested random effects

Caution: By default, for three-level-models, depending on the nested structure of the model, or for models with multiple random effects, icc() only reports the proportion of variance explained for each grouping level. Use adjusted = TRUE to calculate the adjusted and conditional ICC, which condition on all random effects.

Adjusted and conditional ICC

If adjusted = TRUE, an adjusted and conditional ICC are calculated, which take all sources of uncertainty (of all random effects) into account to report an "adjusted" ICC, as well as the conditional ICC. The latter also takes the fixed effects variances into account (see Nakagawa et al. 2017). If random effects are not nested and not cross-classified, the adjusted (adjusted = TRUE) and unadjusted (adjusted = FALSE) ICC are identical. adjust = TRUE returns a meaningful ICC for models with random slopes. Furthermore, the adjusted ICC is recommended for models with other distributions than Gaussian.

ICC for specific group-levels

To calculate the proportion of variance for specific levels related to each other (e.g., similarity of level-1-units within level-2-units or level-2-units within level-3-units) must be computed manually. Use get_re_var to get the between-group-variances and residual variance of the model, and calculate the ICC for the various level correlations.

For example, for the ICC between level 1 and 2: sum(get_re_var(fit)) / (sum(get_re_var(fit)) + get_re_var(fit, "sigma_2"))

or for the ICC between level 2 and 3: get_re_var(fit)[2] / sum(get_re_var(fit))

ICC for Bayesian models

If ppd = TRUE, icc() calculates a variance decomposition based on the posterior predictive distribution. In this case, first, the draws from the posterior predictive distribution not conditioned on group-level terms (posterior_predict(..., re.form = NA)) are calculated as well as draws from this distribution conditioned on all random effects (by default, unless specified else in re.form) are taken. Then, second, the variances for each of these draws are calculated. The "ICC" is then the ratio between these two variances. This is the recommended way to analyse random-effect-variances for non-Gaussian models. It is then possible to compare variances accross models, also by specifying different group-level terms via the re.form-argument.

Sometimes, when the variance of the posterior predictive distribution is very large, the variance ratio in the output makes no sense, e.g. because it is negative. In such cases, it might help to use a more robust measure to calculate the central tendency of the variances. For example, use typical = "median".

References

Aguinis H, Gottfredson RK, Culpepper SA. 2013. Best-Practice Recommendations for Estimating Cross-Level Interaction Effects Using Multilevel Modeling. Journal of Management 39(6): 1490-1528 (10.1177/0149206313478188)
Goldstein H, Browne W, Rasbash J. 2010. Partitioning Variation in Multilevel Models. Understanding Statistics, 1:4, 223-231 (10.1207/S15328031US0104_02)
Grace-Martion K. The Intraclass Correlation Coefficient in Mixed Models, web
Hox J. 2002. Multilevel analysis: techniques and applications. Mahwah, NJ: Erlbaum
Johnson PC, O'Hara RB. 2014. Extension of Nakagawa & Schielzeth's R2GLMM to random slopes models. Methods Ecol Evol, 5: 944-946. (10.1111/2041-210X.12225)
Nakagawa S, Johnson P, Schielzeth H (2017) The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisted and expanded. J. R. Soc. Interface 14. 10.1098/rsif.2017.0213
Rabe-Hesketh S, Skrondal A. 2012. Multilevel and longitudinal modeling using Stata. 3rd ed. College Station, Tex: Stata Press Publication
Raudenbush SW, Bryk AS. 2002. Hierarchical linear models: applications and data analysis methods. 2nd ed. Thousand Oaks: Sage Publications
Wu S, Crespi CM, Wong WK. 2012. Comparison of methods for estimating the intraclass correlation coefficient for binary responses in cancer prevention cluster randomized trials. Contempory Clinical Trials 33: 869-880 (10.1016/j.cct.2012.05.004)

Further helpful online-ressources:

Examples

Run this code

# NOT RUN {
library(lme4)
fit0 <- lmer(Reaction ~ 1 + (1 | Subject), sleepstudy)
icc(fit0)

# note: ICC for random-slope-intercept model usually not
# meaningful, unless you use "adjusted = TRUE" - see 'Note'.
fit1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
icc(fit1)
icc(fit1, adjusted = TRUE)

sleepstudy$mygrp <- sample(1:45, size = 180, replace = TRUE)
fit2 <- lmer(Reaction ~ Days + (1 | mygrp) + (1 | Subject), sleepstudy)

icc1 <- icc(fit1)
icc2 <- icc(fit2)

print(icc1, comp = "var")
print(icc2, comp = "var")

# }
# NOT RUN {
# compute ICC for Bayesian mixed model, with an ICC for each
# sample of the posterior. The print()-method then shows
# the median ICC as well as 89% HDI for the ICC.
# Change interval with print-method:
# print(icc(m, posterior = TRUE), prob = .5)

if (requireNamespace("brms", quietly = TRUE)) {
  library(dplyr)
  sleepstudy$mygrp <- sample(1:5, size = 180, replace = TRUE)
  sleepstudy <- sleepstudy %>%
    group_by(mygrp) %>%
    mutate(mysubgrp = sample(1:30, size = n(), replace = TRUE))
  m <- brms::brm(
    Reaction ~ Days + (1 | mygrp / mysubgrp) + (1 | Subject),
    data = sleepstudy
  )

  # by default, 89% interval
  icc(m)

  # show 50% interval
  icc(m, prob = .5)

  # variances based on posterior predictive distribution
  icc(m, ppd = TRUE)
}
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab