
matrixCorr (version 0.8.3)

ccc_lmm_reml: Concordance Correlation via REML (Linear Mixed-Effects Model)

Description

Compute Lin's Concordance Correlation Coefficient (CCC) from a linear mixed-effects model fitted by REML. The fixed-effects part can include method and/or time (optionally their interaction), with a subject-specific random intercept to capture between-subject variation. Large \(n \times n\) inversions are avoided by solving small per-subject systems.

Assumption: time levels are treated as regular, equally spaced visits indexed by their order within subject. The AR(1) residual model is in discrete time on the visit index (not calendar time). NA time codes break the serial run. Gaps in the factor levels are ignored (adjacent observed visits are treated as lag-1).

Usage

ccc_lmm_reml(
  data,
  response,
  rind,
  method = NULL,
  time = NULL,
  interaction = FALSE,
  max_iter = 100,
  tol = 1e-06,
  Dmat = NULL,
  Dmat_type = c("time-avg", "typical-visit", "weighted-avg", "weighted-sq"),
  Dmat_weights = NULL,
  Dmat_rescale = TRUE,
  ci = FALSE,
  conf_level = 0.95,
  ci_mode = c("auto", "raw", "logit"),
  verbose = FALSE,
  digits = 4,
  use_message = TRUE,
  ar = c("none", "ar1"),
  ar_rho = NA_real_,
  slope = c("none", "subject", "method", "custom"),
  slope_var = NULL,
  slope_Z = NULL,
  drop_zero_cols = TRUE,
  vc_select = c("auto", "none"),
  vc_alpha = 0.05,
  vc_test_order = c("subj_time", "subj_method"),
  include_subj_method = NULL,
  include_subj_time = NULL,
  sb_zero_tol = 1e-10
)

Arguments

data

A data frame.

response

Character. Response variable name.

rind

Character. Subject ID variable name (random intercept).

method

Character or NULL. Optional column name of method factor (added to fixed effects).

time

Character or NULL. Optional column name of time factor (added to fixed effects).

interaction

Logical. Include method:time interaction? (default FALSE).

max_iter

Integer. Maximum iterations for variance-component updates (default 100).

tol

Numeric. Convergence tolerance on parameter change (default 1e-6).

Dmat

Optional \(n_t \times n_t\) numeric matrix to weight/aggregate time-specific fixed biases in the \(S_B\) quadratic form. If supplied, it is used (after optional mass rescaling; see Dmat_rescale) whenever at least two present time levels exist; otherwise it is ignored. If Dmat is NULL, a canonical kernel \(D_m\) is constructed from Dmat_type and Dmat_weights (see below). Dmat should be symmetric positive semidefinite; small asymmetries are symmetrized internally.

Dmat_type

Character, one of c("time-avg","typical-visit", "weighted-avg","weighted-sq"). Only used when Dmat = NULL. It selects the aggregation target for time-specific fixed biases in \(S_B\). Options are:

  • "time-avg": square of the time-averaged bias, \(D_m=(1/n_t)\,11^\top\).

  • "typical-visit": average of squared per-time biases, \(D_m=I_{n_t}\).

  • "weighted-avg": square of a weighted average, \(D_m=n_t\,w\,w^\top\) with \(\sum w=1\).

  • "weighted-sq": weighted average of squared biases, \(D_m=n_t\,\mathrm{diag}(w)\) with \(\sum w=1\).

Pick "time-avg" for CCC targeting the time-averaged measurement; pick "typical-visit" for CCC targeting a randomly sampled visit (typical occasion). Default "time-avg".

Dmat_weights

Optional numeric weights \(w\) used when Dmat_type %in% c("weighted-avg","weighted-sq"). Must be nonnegative and finite. If names(w) are provided, they should match the full time levels in data; they are aligned to the present time subset per fit. If unnamed, the length must equal the number of present time levels. In all cases \(w\) is internally normalized to sum to 1.

Dmat_rescale

Logical. When TRUE (default), the supplied/built \(D_m\) is rescaled to satisfy the simple mass rule \(1^\top D_m 1 = n_t\). This keeps the \(S_B\) denominator invariant and harmonizes with the \(\kappa\)-shrinkage used for variance terms.

ci

Logical. If TRUE, return a CI container; limits are computed by a large-sample delta method for CCC (see CIs note below).

conf_level

Numeric in \((0,1)\). Confidence level when ci = TRUE (default 0.95).

ci_mode

Character scalar; one of c("auto","raw","logit"). Controls how confidence intervals are computed when ci = TRUE. If "raw", a Wald CI is formed on the CCC scale and truncated to [0,1]. If "logit", a Wald CI is computed on the \(\mathrm{logit}(\mathrm{CCC})\) scale and back-transformed to the original scale (often more stable near 0 or 1). If "auto" (default), the method is chosen per estimate based on simple diagnostics (e.g., proximity to the [0,1] boundary / numerical stability), typically preferring "logit" near the boundaries and "raw" otherwise.

verbose

Logical. If TRUE, prints a structured summary of the fitted variance components and \(S_B\) for each fit. Default FALSE.

digits

Integer \((\ge 0)\). Number of decimal places to use in the printed summary when verbose = TRUE. Default 4.

use_message

Logical. When verbose = TRUE, selects the printing mechanism: TRUE uses message() (which respects sink() and is easily suppressed via suppressMessages()), whereas FALSE uses cat() to stdout. Default TRUE.

ar

Character. Residual correlation structure: "none" (iid) or "ar1" for subject-level AR(1) correlation within contiguous time runs. Default "none".

ar_rho

Numeric in \((-0.999,\,0.999)\) or NA. If ar = "ar1" and ar_rho is finite, it is treated as fixed. If ar = "ar1" and ar_rho = NA, \(\rho\) is estimated by profiling a 1-D objective (REML when available; an approximation otherwise). Default NA_real_.

slope

Character; one of c("none","subject","method","custom"). Optional extra random-effect design \(Z\). With "subject" a single random slope is added (one column in \(Z\)); with "method" one column per method level is added; with "custom" you provide slope_Z directly. Default "none".

slope_var

For slope %in% c("subject","method"), a character string giving the name of a column in data used as the slope regressor (e.g., centered time). It is looked up inside data; do not pass the vector itself. NAs are treated as zeros in \(Z\).

slope_Z

For slope = "custom", a numeric matrix with \(n\) rows (same order as data) providing the full extra random-effect design \(Z\). Each column of slope_Z has its own variance component \(\sigma_{Z,j}^2\); columns are treated as uncorrelated (diagonal block in \(G\)). Ignored otherwise.

drop_zero_cols

Logical. When slope = "method", drop all-zero columns of \(Z\) after subsetting (useful in pairwise fits). Default TRUE.

vc_select

Character scalar; one of c("auto","none"). Controls how the subject-by-method \(\sigma^2_{A\times M}\) ("subj_method") and subject-by-time \(\sigma^2_{A\times T}\) ("subj_time") variance components are included. If "auto" (default), the function performs boundary-aware REML likelihood-ratio tests (LRTs; null on the boundary at zero with a half-\(\chi^2_1\) reference) to decide whether to retain each component, in the order given by vc_test_order. If "none", no testing is done and inclusion is taken from include_subj_method/include_subj_time (or, if NULL, from the mere presence of the corresponding factor in the design). In pairwise fits, the decision is made independently for each method pair.

vc_alpha

Numeric scalar in \((0,1)\); default 0.05. Per-component significance level for the boundary-aware REML LRTs used when vc_select = "auto". The tests are one-sided for variance components on the boundary and are not multiplicity-adjusted.

vc_test_order

Character vector (length 2) with a permutation of c("subj_time","subj_method"); default c("subj_time","subj_method"). Specifies the order in which the two variance components are tested when vc_select = "auto". The component tested first may be dropped before testing the second. If a factor is absent in the design (e.g., no time factor so "subj_time" is undefined), the corresponding test is skipped.

include_subj_method, include_subj_time

Logical scalars or NULL. When vc_select = "none", these control whether the \(\sigma^2_{A\times M}\) ("subj_method") and \(\sigma^2_{A\times T}\) ("subj_time") random effects are included (TRUE) or excluded (FALSE) in the model. If NULL (default), inclusion defaults to the presence of the corresponding factor in the data (i.e., at least two method/time levels). When vc_select = "auto", these arguments are ignored (automatic selection is used instead).

sb_zero_tol

Non-negative numeric scalar; default 1e-10. Numerical threshold for the fixed-effect dispersion term \(S_B\). After computing \(\widehat{S_B}\) and its delta-method variance, if \(\widehat{S_B} \le\) sb_zero_tol or non-finite, the procedure treats \(S_B\) as fixed at zero in the delta step. It sets \(d_{S_B}=0\) and \(\mathrm{Var}(\widehat{S_B})=0\), preventing numerical blow-ups of SE(CCC) when \(\widehat{S_B}\to 0\) and the fixed-effects variance is ill-conditioned for the contrast. This stabilises inference in rare boundary cases; it has no effect when \(\widehat{S_B}\) is comfortably above the threshold.

Notes on stability and performance

All per-subject solves are \(\,r\times r\) with \(r=1+nm+nt+q_Z\), so cost scales with the number of subjects and the fixed-effects dimension rather than the total number of observations. Solvers use symmetric-PD paths with a small diagonal ridge and a pseudo-inverse fallback, which helps for very small/unbalanced subsets and near-boundary estimates. For AR(1), observations are ordered by time within subject; NA time codes break the run, and gaps between factor levels are treated as regular steps (elapsed time is not used).

Heteroscedastic slopes across \(Z\) columns are supported. Each \(Z\) column has its own variance component \(\sigma_{Z,j}^2\), but cross-covariances among \(Z\) columns are set to zero (diagonal block). Column rescaling changes the implied prior on \(b_{i,\text{extra}}\) but does not introduce correlations.

Threading and BLAS guards

The C++ backend uses OpenMP loops while also forcing vendor BLAS libraries to run single-threaded so that overall CPU usage stays predictable. On OpenBLAS and Apple's Accelerate this is handled automatically. On Intel MKL builds the guard is disabled by default, but you can also opt out manually by setting MATRIXCORR_DISABLE_BLAS_GUARD=1 in the environment before loading matrixCorr.

Model overview

Internally, the call is routed to ccc_lmm_reml_pairwise(), which fits one repeated-measures mixed model per pair of methods. Each model includes:

  • subject random intercepts (always)

  • optional subject-by-method (\(\sigma^2_{A\times M}\)) and subject-by-time (\(\sigma^2_{A\times T}\)) variance components

  • optional random slopes specified via slope/slope_var/slope_Z

  • residual structure ar = "none" (iid) or ar = "ar1"

D-matrix options (Dmat_type, Dmat, Dmat_weights) control how time averaging operates when translating variance components into CCC summaries.

Author

Thiago de Paula Oliveira

Details

For measurement \(y_{ij}\) on subject \(i\) under fixed levels (method, time), we fit $$ y = X\beta + Zu + \varepsilon,\qquad u \sim N(0,\,G),\ \varepsilon \sim N(0,\,R). $$ Notation: \(m\) subjects, \(n=\sum_i n_i\) total rows; \(nm\) method levels; \(nt\) time levels; \(q_Z\) extra random-slope columns (if any); \(r=1+nm+nt\) (or \(1+nm+nt+q_Z\) with slopes). Here \(Z\) is the subject-structured random-effects design and \(G\) is block-diagonal at the subject level with the following per-subject parameterisation. Specifically,

  • one random intercept with variance \(\sigma_A^2\);

  • optionally, method deviations (one column per method level) with a common variance \(\sigma_{A\times M}^2\) and zero covariances across levels (i.e., multiple of an identity);

  • optionally, time deviations (one column per time level) with a common variance \(\sigma_{A\times T}^2\) and zero covariances across levels;

  • optionally, an extra random effect aligned with \(Z\) (random slope), where each column has its own variance \(\sigma_{Z,j}^2\) and columns are uncorrelated.

The fixed-effects design is ~ 1 + method + time and, if interaction=TRUE, + method:time.

Residual correlation \(R\) (regular, equally spaced time). Write \(R_i=\sigma_E^2\,C_i(\rho)\). With ar="none", \(C_i=I\). With ar="ar1", within-subject residuals follow a discrete AR(1) process along the visit index after sorting by increasing time level. Ties retain input order, and any NA time code breaks the series so each contiguous block of non-NA times forms a run. The correlation between adjacent observed visits in a run is \(\rho\); we do not use calendar-time gaps. Internally we work with the precision of the AR(1) correlation: for a run of length \(L\ge 2\), the tridiagonal inverse has $$ (C^{-1})_{11}=(C^{-1})_{LL}=\frac{1}{1-\rho^2},\quad (C^{-1})_{tt}=\frac{1+\rho^2}{1-\rho^2}\ (2\le t\le L-1),\quad (C^{-1})_{t,t+1}=(C^{-1})_{t+1,t}=\frac{-\rho}{1-\rho^2}. $$ The working inverse is \(\,R_i^{-1}=\sigma_E^{-2}\,C_i(\rho)^{-1}\).
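The tridiagonal precision above can be checked directly against a numerical inverse; a minimal base-R sketch (illustrative values, not package internals):

```r
## Closed-form tridiagonal inverse of an AR(1) correlation matrix,
## for a contiguous run of length L = 5 and rho = 0.6
rho <- 0.6; L <- 5
Cm <- rho^abs(outer(seq_len(L), seq_len(L), "-"))   # AR(1) correlation
## Diagonal: 1/(1-rho^2) at the ends, (1+rho^2)/(1-rho^2) in the interior
Cinv <- diag(c(1, rep(1 + rho^2, L - 2), 1)) / (1 - rho^2)
## Off-diagonals: -rho/(1-rho^2) on the first sub/super-diagonal
Cinv[cbind(1:(L - 1), 2:L)] <- Cinv[cbind(2:L, 1:(L - 1))] <- -rho / (1 - rho^2)
max(abs(Cinv - solve(Cm)))   # agrees up to rounding error
```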

Per-subject Woodbury system. For subject \(i\) with \(n_i\) rows, define the per-subject random-effects design \(U_i\) (columns: intercept, method indicators, time indicators; dimension \(\,r=1+nm+nt\,\)). The core never forms \(V_i = R_i + U_i G U_i^\top\) explicitly. Instead, $$ M_i \;=\; G^{-1} \;+\; U_i^\top R_i^{-1} U_i, $$ and accumulates GLS blocks via rank-\(r\) corrections using \(\,V_i^{-1} = R_i^{-1}-R_i^{-1}U_i M_i^{-1}U_i^\top R_i^{-1}\,\): $$ X^\top V^{-1} X \;=\; \sum_i \Big[ X_i^\top R_i^{-1}X_i \;-\; (X_i^\top R_i^{-1}U_i)\, M_i^{-1}\,(U_i^\top R_i^{-1}X_i) \Big], $$ $$ X^\top V^{-1} y \;=\; \sum_i \Big[ X_i^\top R_i^{-1}y_i \;-\; (X_i^\top R_i^{-1}U_i)\,M_i^{-1}\, (U_i^\top R_i^{-1}y_i) \Big]. $$ Because \(G^{-1}\) is diagonal with positive entries, each \(M_i\) is symmetric positive definite; solves/inversions use symmetric-PD routines with a small diagonal ridge and a pseudo-inverse if needed.
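The Woodbury identity used per subject is easy to verify numerically; a small sketch with arbitrary dimensions (illustrative only, not the package's C++ path):

```r
## Check V_i^{-1} = R_i^{-1} - R_i^{-1} U_i M_i^{-1} U_i' R_i^{-1}
## with M_i = G^{-1} + U_i' R_i^{-1} U_i
set.seed(1)
ni <- 6; r <- 3
Ri   <- diag(ni) * 0.4                  # R_i = sigma_E^2 I (iid case)
Ui   <- matrix(rnorm(ni * r), ni, r)    # per-subject random-effects design
G    <- diag(c(0.5, 0.25, 0.25))        # diagonal G block (positive)
Vi   <- Ri + Ui %*% G %*% t(Ui)         # never formed explicitly in the core
Mi   <- solve(G) + t(Ui) %*% solve(Ri) %*% Ui
Vinv <- solve(Ri) - solve(Ri) %*% Ui %*% solve(Mi) %*% t(Ui) %*% solve(Ri)
max(abs(Vinv - solve(Vi)))              # agrees to machine precision
```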

Random-slope \(Z\). Besides \(U_i\), the function can include an extra design \(Z_i\).

  • slope="subject": \(Z\) has one column (the regressor in slope_var); \(Z_{i}\) is the subject-\(i\) block, with its own variance \(\sigma_{Z,1}^2\).

  • slope="method": \(Z\) has one column per method level; row \(t\) uses the slope regressor if its method equals level \(\ell\), otherwise 0; all-zero columns can be dropped via drop_zero_cols=TRUE after subsetting. Each column has its own variance \(\sigma_{Z,\ell}^2\).

  • slope="custom": \(Z\) is provided fully via slope_Z. Each column is an independent random effect with its own variance \(\sigma_{Z,j}^2\); cross-covariances among columns are set to 0.

Computations simply augment \(\tilde U_i=[U_i\ Z_i]\) and the corresponding inverse-variance block. The EM updates then include, for each column \(j=1,\ldots,q_Z\), $$ \sigma_{Z,j}^{2\,(new)} \;=\; \frac{1}{m} \sum_{i=1}^m \Big( b_{i,\text{extra},j}^2 + (M_i^{-1})_{\text{extra},jj} \Big) \quad (\text{if } q_Z>0). $$ Interpretation: the \(\sigma_{Z,j}^2\) represent additional within-subject variability explained by the slope regressor(s) in column \(j\) and are not part of the CCC denominator (agreement across methods/time).

EM-style variance-component updates. With current \(\hat\beta\), form residuals \(r_i = y_i - X_i\hat\beta\). The BLUPs and conditional covariances are $$ b_i \;=\; M_i^{-1}\,(U_i^\top R_i^{-1} r_i), \qquad \mathrm{Var}(b_i\mid y) \;=\; M_i^{-1}. $$ Let \(e_i=r_i-U_i b_i\). Expected squares then yield closed-form updates: $$ \sigma_A^{2\,(new)} \;=\; \frac{1}{m}\sum_i \Big( b_{i,0}^2 + (M_i^{-1})_{00} \Big), $$ $$ \sigma_{A\times M}^{2\,(new)} \;=\; \frac{1}{m\,nm} \sum_i \sum_{\ell=1}^{nm}\!\Big( b_{i,\ell}^2 + (M_i^{-1})_{\ell\ell} \Big) \quad (\text{if } nm>0), $$ $$ \sigma_{A\times T}^{2\,(new)} \;=\; \frac{1}{m\,nt} \sum_i \sum_{t=1}^{nt} \Big( b_{i,t}^2 + (M_i^{-1})_{tt} \Big) \quad (\text{if } nt>0), $$ $$ \sigma_E^{2\,(new)} \;=\; \frac{1}{n} \sum_i \Big( e_i^\top C_i(\rho)^{-1} e_i \;+\; \mathrm{tr}\!\big(M_i^{-1}U_i^\top C_i(\rho)^{-1} U_i\big) \Big), $$ together with the per-column update for \(\sigma_{Z,j}^2\) given above. Iterate until the \(\ell_1\) change across components is \(<\)tol or max_iter is reached.

Fixed-effect dispersion \(S_B\): choosing the time-kernel \(D_m\).

Let \(d = L^\top \hat\beta\) stack the within-time, pairwise method differences, grouped by time as \(d=(d_1^\top,\ldots,d_{n_t}^\top)^\top\) with \(d_t \in \mathbb{R}^{P}\) and \(P = n_m(n_m-1)\). The symmetric positive semidefinite kernel \(D_m \succeq 0\) selects which functional of the bias profile \(t \mapsto d_t\) is targeted by \(S_B\). Internally, the code rescales any supplied/built \(D_m\) to satisfy \(1^\top D_m 1 = n_t\) for stability and comparability.

  • Dmat_type = "time-avg" (square of the time-averaged bias). Let $$ w \;=\; \frac{1}{n_t}\,\mathbf{1}_{n_t}, \qquad D_m \;\propto\; I_P \otimes (w\,w^\top), $$ so that $$ d^\top D_m d \;\propto\; \sum_{p=1}^{P} \left( \frac{1}{n_t}\sum_{t=1}^{n_t} d_{t,p} \right)^{\!2}. $$ Methods have equal \(\textit{time-averaged}\) means within subject, i.e. \(\sum_{t=1}^{n_t} d_{t,p}/n_t = 0\) for all \(p\). Appropriate when decisions depend on an average over time and opposite-signed biases are allowed to cancel.

  • Dmat_type = "typical-visit" (average of squared per-time biases). With equal visit probability, take $$ D_m \;\propto\; I_P \otimes \mathrm{diag}\!\Big(\tfrac{1}{n_t},\ldots,\tfrac{1}{n_t}\Big), $$ yielding $$ d^\top D_m d \;\propto\; \frac{1}{n_t} \sum_{t=1}^{n_t}\sum_{p=1}^{P} d_{t,p}^{\,2}. $$ Methods agree on a \(\textit{typical}\) occasion drawn uniformly from the visit set. Use when each visit matters on its own; alternating signs \(d_{t,p}\) do not cancel.

  • Dmat_type = "weighted-avg" (square of a weighted time average). For user weights \(a=(a_1,\ldots,a_{n_t})^\top\) with \(a_t \ge 0\), set $$ w \;=\; \frac{a}{\sum_{t=1}^{n_t} a_t}, \qquad D_m \;\propto\; I_P \otimes (w\,w^\top), $$ so that $$ d^\top D_m d \;\propto\; \sum_{p=1}^{P} \left( \sum_{t=1}^{n_t} w_t\, d_{t,p} \right)^{\!2}. $$ Methods have equal \(\textit{weighted time-averaged}\) means, i.e. \(\sum_{t=1}^{n_t} w_t\, d_{t,p} = 0\) for all \(p\). Use when some visits (e.g., baseline/harvest) are a priori more influential; opposite-signed biases may cancel according to \(w\).

  • Dmat_type = "weighted-sq" (weighted average of squared per-time biases). With the same weights \(w\), take $$ D_m \;\propto\; I_P \otimes \mathrm{diag}(w_1,\ldots,w_{n_t}), $$ giving $$ d^\top D_m d \;\propto\; \sum_{t=1}^{n_t} w_t \sum_{p=1}^{P} d_{t,p}^{\,2}. $$ Methods agree at visits sampled with probabilities \(\{w_t\}\), counting each visit's discrepancy on its own. Use when per-visit agreement is required but some visits should be emphasised more than others.
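After the mass rescaling \(1^\top D_m 1 = n_t\), the first two kernels reduce (per method pair) to simple \(n_t \times n_t\) matrices; a minimal base-R sketch, not package code:

```r
## Canonical per-pair time kernels for n_t = 4 visits (sketch)
nt <- 4
Dm_timeavg <- matrix(1 / nt, nt, nt)   # "time-avg": (1/n_t) 11'
Dm_typical <- diag(nt)                 # "typical-visit": identity
## Both satisfy the mass rule 1' D_m 1 = n_t used by Dmat_rescale
sum(Dm_timeavg)   # 4
sum(Dm_typical)   # 4
```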

Time-averaging for CCC (regular visits). The reported CCC targets agreement of the time-averaged measurements per method within subject by default (Dmat_type="time-avg"). Averaging over \(T\) non-NA visits shrinks time-varying components by $$ \kappa_T^{(g)} \;=\; 1/T, \qquad \kappa_T^{(e)} \;=\; \{T + 2\sum_{k=1}^{T-1}(T-k)\rho^k\}/T^2, $$ with \(\kappa_T^{(e)}=1/T\) when residuals are i.i.d. With unbalanced \(T\), the implementation averages the per-(subject,method) \(\kappa\) values across the pairs contributing to CCC and then clamps \(\kappa_T^{(e)}\) to \([10^{-12},\,1]\) for numerical stability. Choosing Dmat_type="typical-visit" makes \(S_B\) match the interpretation of a randomly sampled occasion instead.
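The shrinkage factor \(\kappa_T^{(e)}\) above can be written as a one-line helper (a sketch, not an exported function):

```r
## kappa_T^(e) = {T + 2 * sum_{k=1}^{T-1} (T-k) rho^k} / T^2
kappa_e <- function(Tn, rho) {
  if (Tn == 1) return(1)
  k <- seq_len(Tn - 1)
  (Tn + 2 * sum((Tn - k) * rho^k)) / Tn^2
}
kappa_e(4, 0)     # 0.25 = 1/T under iid residuals
kappa_e(4, 0.6)   # 0.592: positive rho means averaging removes less noise
```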

Concordance correlation coefficient. The CCC used is $$ \mathrm{CCC} \;=\; \frac{\sigma_A^2 + \kappa_T^{(g)}\,\sigma_{A\times T}^2} {\sigma_A^2 + \sigma_{A\times M}^2 + \kappa_T^{(g)}\,\sigma_{A\times T}^2 + S_B + \kappa_T^{(e)}\,\sigma_E^2}. $$ Special cases: with no method factor, \(S_B=\sigma_{A\times M}^2=0\); with a single time level, \(\sigma_{A\times T}^2=0\) (no \(\kappa\)-shrinkage). When \(T=1\) or \(\rho=0\), both \(\kappa\)-factors equal 1. The extra random-effect variances \(\{\sigma_{Z,j}^2\}\) (if used) are not included.
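The CCC formula above as a plain R helper (hypothetical; ccc_lmm_reml computes this internally from the fitted components):

```r
## CCC = (sA2 + kg*sAT2) / (sA2 + sAM2 + kg*sAT2 + SB + ke*sE2)
ccc_from_vc <- function(sA2, sAM2, sAT2, sE2, SB, kg = 1, ke = 1) {
  (sA2 + kg * sAT2) /
    (sA2 + sAM2 + kg * sAT2 + SB + ke * sE2)
}
## The simulation values from Example 3, with both kappa factors at 1:
ccc_from_vc(0.8, 0.3, 0.4, 0.5, 0.2^2)   # ~0.588
```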

CIs / SEs (delta method for CCC). Let $$ \theta \;=\; \big(\sigma_A^2,\ \sigma_{A\times M}^2,\ \sigma_{A\times T}^2,\ \sigma_E^2,\ S_B\big)^\top, $$ and write \(\mathrm{CCC}(\theta)=N/D\) with \(N=\sigma_A^2+\kappa_T^{(g)}\sigma_{A\times T}^2\) and \(D=\sigma_A^2+\sigma_{A\times M}^2+\kappa_T^{(g)}\sigma_{A\times T}^2+S_B+\kappa_T^{(e)}\sigma_E^2\). The gradient components are $$ \frac{\partial\,\mathrm{CCC}}{\partial \sigma_A^2} \;=\; \frac{\sigma_{A\times M}^2 + S_B + \kappa_T^{(e)}\sigma_E^2}{D^2}, $$ $$ \frac{\partial\,\mathrm{CCC}}{\partial \sigma_{A\times M}^2} \;=\; -\,\frac{N}{D^2}, \qquad \frac{\partial\,\mathrm{CCC}}{\partial \sigma_{A\times T}^2} \;=\; \frac{\kappa_T^{(g)}\big(\sigma_{A\times M}^2 + S_B + \kappa_T^{(e)}\sigma_E^2\big)}{D^2}, $$ $$ \frac{\partial\,\mathrm{CCC}}{\partial \sigma_E^2} \;=\; -\,\frac{\kappa_T^{(e)}\,N}{D^2}, \qquad \frac{\partial\,\mathrm{CCC}}{\partial S_B} \;=\; -\,\frac{N}{D^2}. $$
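The gradient components can be confirmed by central finite differences; a base-R sketch with \(\kappa_T^{(g)}=\kappa_T^{(e)}=1\) for simplicity:

```r
## theta = (sA2, sAM2, sAT2, sE2, SB); with both kappas = 1,
## CCC(theta) = (sA2 + sAT2) / sum(theta)
ccc_fun <- function(th) (th[1] + th[3]) / sum(th)
th <- c(0.8, 0.3, 0.4, 0.5, 0.04)
D <- sum(th); N <- th[1] + th[3]; B <- th[2] + th[4] + th[5]
grad <- c(B, -N, B, -N, -N) / D^2        # analytic gradient from above
num_grad <- vapply(seq_along(th), function(j) {
  h <- 1e-6; e <- replace(numeric(5), j, h)
  (ccc_fun(th + e) - ccc_fun(th - e)) / (2 * h)
}, numeric(1))
max(abs(grad - num_grad))                # small
```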

Estimating \(\mathrm{Var}(\hat\theta)\). The EM updates write each variance component as an average of per-subject quantities. For subject \(i\), $$ t_{A,i} \;=\; b_{i,0}^2 + (M_i^{-1})_{00},\qquad t_{M,i} \;=\; \frac{1}{nm}\sum_{\ell=1}^{nm} \Big(b_{i,\ell}^2 + (M_i^{-1})_{\ell\ell}\Big), $$ $$ t_{T,i} \;=\; \frac{1}{nt}\sum_{j=1}^{nt} \Big(b_{i,j}^2 + (M_i^{-1})_{jj}\Big),\qquad s_i \;=\; \frac{e_i^\top C_i(\rho)^{-1} e_i + \mathrm{tr}\!\big(M_i^{-1}U_i^\top C_i(\rho)^{-1} U_i\big)}{n_i}, $$ where \(b_i = M_i^{-1}(U_i^\top R_i^{-1} r_i)\) and \(M_i = G^{-1} + U_i^\top R_i^{-1} U_i\). With \(m\) subjects, form the empirical covariance of the stacked subject vectors and scale by \(m\) to approximate the covariance of the means: $$ \widehat{\mathrm{Cov}}\!\left( \begin{bmatrix} t_{A,\cdot} \\ t_{M,\cdot} \\ t_{T,\cdot} \end{bmatrix} \right) \;\approx\; \frac{1}{m}\, \mathrm{Cov}_i\!\left( \begin{bmatrix} t_{A,i} \\ t_{M,i} \\ t_{T,i} \end{bmatrix}\right). $$ (Drop rows/columns as needed when nm==0 or nt==0.)

The residual variance estimator is a weighted mean \(\hat\sigma_E^2=\sum_i w_i s_i\) with \(w_i=n_i/n\). Its variance is approximated by the variance of a weighted mean of independent terms, $$ \widehat{\mathrm{Var}}(\hat\sigma_E^2) \;\approx\; \Big(\sum_i w_i^2\Big)\,\widehat{\mathrm{Var}}(s_i), $$ where \(\widehat{\mathrm{Var}}(s_i)\) is the sample variance across subjects. The method-dispersion term uses the quadratic-form delta already computed for \(S_B\): $$ \widehat{\mathrm{Var}}(S_B) \;=\; \frac{2\,\mathrm{tr}\!\big((A_{\!fix}\,\mathrm{Var}(\hat\beta))^2\big) \;+\; 4\,\hat\beta^\top A_{\!fix}\,\mathrm{Var}(\hat\beta)\, A_{\!fix}\,\hat\beta} {\big[nm\,(nm-1)\,\max(nt,1)\big]^2}, $$ with \(A_{\!fix}=L\,D_m\,L^\top\).

Putting it together. Assemble \(\widehat{\mathrm{Var}}(\hat\theta)\) by combining the \((\sigma_A^2,\sigma_{A\times M}^2,\sigma_{A\times T}^2)\) covariance block from the subject-level empirical covariance, add the \(\widehat{\mathrm{Var}}(\hat\sigma_E^2)\) and \(\widehat{\mathrm{Var}}(S_B)\) terms on the diagonal, and ignore cross-covariances across these blocks (a standard large-sample simplification). Then $$ \widehat{\mathrm{se}}\{\widehat{\mathrm{CCC}}\} \;=\; \sqrt{\,\nabla \mathrm{CCC}(\hat\theta)^\top\, \widehat{\mathrm{Var}}(\hat\theta)\, \nabla \mathrm{CCC}(\hat\theta)\,}. $$

A two-sided \((1-\alpha)\) normal CI is $$ \widehat{\mathrm{CCC}} \;\pm\; z_{1-\alpha/2}\, \widehat{\mathrm{se}}\{\widehat{\mathrm{CCC}}\}, $$ truncated to \([0,1]\) in the output for convenience. When \(S_B\) is truncated at 0 or samples are very small/imbalanced, the normal CI can be mildly anti-conservative near the boundary; a logit transform for CCC or a subject-level (cluster) bootstrap can be used for sensitivity analysis.
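The raw and logit constructions offered by ci_mode can be sketched in a few lines of base R (illustrative numbers only, not package internals):

```r
## Raw vs logit Wald CI for an estimate near the upper boundary
ccc_hat <- 0.95; se <- 0.03; z <- qnorm(0.975)
## "raw": CI on the CCC scale, truncated to [0, 1]
raw_ci <- pmin(pmax(ccc_hat + c(-1, 1) * z * se, 0), 1)
## "logit": delta-method SE on the logit scale, then back-transform
se_logit <- se / (ccc_hat * (1 - ccc_hat))
logit_ci <- plogis(qlogis(ccc_hat) + c(-1, 1) * z * se_logit)
raw_ci     # upper limit hits 1 and is truncated
logit_ci   # stays strictly inside (0, 1) without truncation
```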

Choosing \(\rho\) for AR(1). When ar="ar1" and ar_rho = NA, \(\rho\) is estimated by profiling the REML log-likelihood at \((\hat\beta,\hat G,\hat\sigma_E^2)\). With very few visits per subject, \(\rho\) can be weakly identified; consider sensitivity checks over a plausible range.

References

Lin L (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45: 255-268.

Lin L (2000). A note on the concordance correlation coefficient. Biometrics, 56: 324-325.

Carrasco, J. L. et al. (2013). Estimation of the concordance correlation coefficient for repeated measures using SAS and R. Computer Methods and Programs in Biomedicine, 109(3), 293-304. doi:10.1016/j.cmpb.2012.09.002

King et al. (2007). A Class of Repeated Measures Concordance Correlation Coefficients. Journal of Biopharmaceutical Statistics, 17(4). doi:10.1080/10543400701329455

See Also

build_L_Dm_Z_cpp for constructing \(L\)/\(D_m\)/\(Z\); ccc_pairwise_u_stat for a U-statistic alternative; and cccrm for a reference approach via nlme.

Examples

# ====================================================================
# 1) Subject x METHOD and subject x TIME variance present
#     y_{i,m,t} = mu + b_m + u_i + w_{i,m} + g_{i,t} + e_{i,m,t}
#     with u_i ~ N(0, s_A^2), w_{i,m} ~ N(0, s_{AxM}^2), g_{i,t} ~ N(0, s_{AxT}^2)
# ====================================================================
set.seed(102)
n_subj <- 60
n_time <- 8

id     <- factor(rep(seq_len(n_subj), each = 2 * n_time))
time   <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
method <- factor(rep(rep(c("A","B"),    each  = n_time), times = n_subj))

sigA  <- 0.6   # subject
sigAM <- 0.3   # subject x method
sigAT <- 0.5   # subject x time
sigE  <- 0.4   # residual
biasB <- 0.2   # fixed method bias; expected S_B = biasB^2 = 0.04

# random effects
u_i <- rnorm(n_subj, 0, sqrt(sigA))
u   <- u_i[as.integer(id)]

sm      <- interaction(id, method, drop = TRUE)
w_im_lv <- rnorm(nlevels(sm), 0, sqrt(sigAM))
w_im    <- w_im_lv[as.integer(sm)]

st      <- interaction(id, time, drop = TRUE)
g_it_lv <- rnorm(nlevels(st), 0, sqrt(sigAT))
g_it    <- g_it_lv[as.integer(st)]

# residuals & response
e <- rnorm(length(id), 0, sqrt(sigE))
y <- (method == "B") * biasB + u + w_im + g_it + e

dat_both <- data.frame(y, id, method, time)

# Both sigma2_subject_method and sigma2_subject_time are identifiable here
fit_both <- ccc_lmm_reml(dat_both, "y", "id", method = "method", time = "time",
                         vc_select = "auto", verbose = TRUE)
summary(fit_both)
plot(fit_both)

# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  view_corr_shiny(fit_both)
}

# ====================================================================
# 2) Subject x TIME variance present (s_{AxT} > 0) with two methods
#     y_{i,m,t} = mu + b_m + u_i + g_{i,t} + e_{i,m,t}
#     where g_{i,t} ~ N(0, s_{AxT}^2) shared across methods at time t
# ====================================================================
set.seed(202)
n_subj <- 60; n_time <- 14
id     <- factor(rep(seq_len(n_subj), each = 2 * n_time))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
time   <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))

sigA  <- 0.7
sigAT <- 0.5
sigE  <- 0.5
biasB <- 0.25

u   <- rnorm(n_subj, 0, sqrt(sigA))[as.integer(id)]
gIT <- rnorm(n_subj * n_time, 0, sqrt(sigAT))
g   <- gIT[ (as.integer(id) - 1L) * n_time + as.integer(time) ]
y   <- (method == "B") * biasB + u + g + rnorm(length(id), 0, sqrt(sigE))
dat_sag <- data.frame(y, id, method, time)

# sigma_AT should be retained; sigma_AM may be dropped (since w_{i,m}=0)
fit_sag <- ccc_lmm_reml(dat_sag, "y", "id", method = "method", time = "time",
                        vc_select = "auto", verbose = TRUE)
summary(fit_sag)
plot(fit_sag)

# ====================================================================
# 3) BOTH components present: s_{AxM} > 0 and s_{AxT} > 0 (2 methods x T times)
#     y_{i,m,t} = mu + b_m + u_i + w_{i,m} + g_{i,t} + e_{i,m,t}
# ====================================================================
set.seed(303)
n_subj <- 60; n_time <- 4
id     <- factor(rep(seq_len(n_subj), each = 2 * n_time))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
time   <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))

sigA  <- 0.8
sigAM <- 0.3
sigAT <- 0.4
sigE  <- 0.5
biasB <- 0.2

u   <- rnorm(n_subj, 0, sqrt(sigA))[as.integer(id)]
# (subject, method) random deviations: repeat per (i,m) across its times
wIM <- rnorm(n_subj * 2, 0, sqrt(sigAM))
w   <- wIM[ (as.integer(id) - 1L) * 2 + as.integer(method) ]
# (subject, time) random deviations: shared across methods at time t
gIT <- rnorm(n_subj * n_time, 0, sqrt(sigAT))
g   <- gIT[ (as.integer(id) - 1L) * n_time + as.integer(time) ]
y   <- (method == "B") * biasB + u + w + g + rnorm(length(id), 0, sqrt(sigE))
dat_both <- data.frame(y, id, method, time)

fit_both <- ccc_lmm_reml(dat_both, "y", "id", method = "method", time = "time",
                         vc_select = "auto", verbose = TRUE, ci = TRUE)
summary(fit_both)
plot(fit_both)

# If you want to force-include both VCs (skip testing):
fit_both_forced <-
 ccc_lmm_reml(dat_both, "y", "id", method = "method", time = "time",
              vc_select = "none", include_subj_method  = TRUE,
              include_subj_time  = TRUE, verbose = TRUE)
summary(fit_both_forced)
plot(fit_both_forced)

# ====================================================================
# 4) D_m choices: time-averaged (default) vs typical visit
# ====================================================================
# Time-average
ccc_lmm_reml(dat_both, "y", "id", method = "method", time = "time",
             vc_select = "none", include_subj_method  = TRUE,
             include_subj_time  = TRUE, Dmat_type = "time-avg")

# Typical visit
ccc_lmm_reml(dat_both, "y", "id", method = "method", time = "time",
             vc_select = "none", include_subj_method  = TRUE,
             include_subj_time  = TRUE, Dmat_type = "typical-visit")

# ====================================================================
# 5) AR(1) residual correlation with fixed rho (larger example)
# ====================================================================
# \donttest{
set.seed(10)
n_subj   <- 40
n_time   <- 10
methods  <- c("A", "B", "C", "D")
nm       <- length(methods)
id     <- factor(rep(seq_len(n_subj), each = n_time * nm))
method <- factor(rep(rep(methods, each = n_time), times = n_subj),
                 levels = methods)
time   <- factor(rep(rep(seq_len(n_time), times = nm), times = n_subj))

beta0    <- 0
beta_t   <- 0.2
bias_met <- c(A = 0.00, B = 0.30, C = -0.15, D = 0.05)
sigA     <- 1.0
rho_true <- 0.6
sigE     <- 0.7

t_num <- as.integer(time)
t_c   <- t_num - mean(seq_len(n_time))
mu    <- beta0 + beta_t * t_c + bias_met[as.character(method)]

u_subj <- rnorm(n_subj, 0, sqrt(sigA))
u      <- u_subj[as.integer(id)]

e <- numeric(length(id))
for (s in seq_len(n_subj)) {
  for (m in methods) {
    idx <- which(id == levels(id)[s] & method == m)
    # innovation sd chosen so the marginal AR(1) variance equals sigE
    # (marginal variance = sd^2 / (1 - rho^2))
    e[idx] <- stats::arima.sim(list(ar = rho_true), n = n_time,
                               sd = sqrt(sigE * (1 - rho_true^2)))
  }
}
y <- mu + u + e
dat_ar4 <- data.frame(y = y, id = id, method = method, time = time)

ccc_lmm_reml(dat_ar4,
             response = "y", rind = "id", method = "method", time = "time",
             ar = "ar1", ar_rho = 0.6, verbose = TRUE)
# }

# ====================================================================
# 6) Random slope variants (subject, method, custom Z)
# ====================================================================
# \donttest{
## By SUBJECT
set.seed(2)
n_subj <- 60; n_time <- 4
id  <- factor(rep(seq_len(n_subj), each = 2 * n_time))
tim <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
subj <- as.integer(id)
slope_i <- rnorm(n_subj, 0, 0.15)
slope_vec <- slope_i[subj]
base <- rnorm(n_subj, 0, 1.0)[subj]
tnum <- as.integer(tim)
y <- base + 0.3*(method=="B") + slope_vec*(tnum - mean(seq_len(n_time))) +
     rnorm(length(id), 0, 0.5)
dat_s <- data.frame(y, id, method, time = tim)
dat_s$t_num <- as.integer(dat_s$time)
dat_s$t_c   <- ave(dat_s$t_num, dat_s$id, FUN = function(v) v - mean(v))
ccc_lmm_reml(dat_s, "y", "id", method = "method", time = "time",
             slope = "subject", slope_var = "t_c", verbose = TRUE)

## By METHOD
set.seed(3)
n_subj <- 60; n_time <- 4
id  <- factor(rep(seq_len(n_subj), each = 2 * n_time))
tim <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
slope_m <- ifelse(method=="B", 0.25, 0.00)
base <- rnorm(n_subj, 0, 1.0)[as.integer(id)]
tnum <- as.integer(tim)
y <- base + 0.3*(method=="B") + slope_m*(tnum - mean(seq_len(n_time))) +
     rnorm(length(id), 0, 0.5)
dat_m <- data.frame(y, id, method, time = tim)
dat_m$t_num <- as.integer(dat_m$time)
dat_m$t_c   <- ave(dat_m$t_num, dat_m$id, FUN = function(v) v - mean(v))
ccc_lmm_reml(dat_m, "y", "id", method = "method", time = "time",
             slope = "method", slope_var = "t_c", verbose = TRUE)

## SUBJECT + METHOD random slopes (custom Z)
set.seed(4)
n_subj <- 50; n_time <- 4
id  <- factor(rep(seq_len(n_subj), each = 2 * n_time))
tim <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
subj <- as.integer(id)
slope_subj <- rnorm(n_subj, 0, 0.12)[subj]
slope_B    <- ifelse(method=="B", 0.18, 0.00)
tnum <- as.integer(tim)
base <- rnorm(n_subj, 0, 1.0)[subj]
y <- base + 0.3*(method=="B") +
     (slope_subj + slope_B) * (tnum - mean(seq_len(n_time))) +
     rnorm(length(id), 0, 0.5)
dat_bothRS <- data.frame(y, id, method, time = tim)
dat_bothRS$t_num <- as.integer(dat_bothRS$time)
dat_bothRS$t_c   <- ave(dat_bothRS$t_num, dat_bothRS$id, FUN = function(v) v - mean(v))
MM <- model.matrix(~ 0 + method, data = dat_bothRS)
Z_custom <- cbind(
  subj_slope = dat_bothRS$t_c,
  MM * dat_bothRS$t_c
)
ccc_lmm_reml(dat_bothRS, "y", "id", method = "method", time = "time",
             slope = "custom", slope_Z = Z_custom, verbose = TRUE)
# }
