lmm: Fitting Linear Mixed-effects Models

Description

lmm is used to fit linear mixed-effects models (LMM) based on summary-level data. The LMM parameters are estimated by either restricted maximum likelihood (REML) or maximum likelihood (ML) method with Fisher scoring (FS) gradient descent algorithm.

Usage

lmm(
  XX,
  XY,
  ZX,
  ZY,
  ZZ,
  Ynorm,
  n,
  d = ncol(ZZ),
  theta0 = NULL,
  method = c("REML", "ML"),
  max.iter = 50,
  epsilon = 1e-05,
  output.cov = TRUE,
  output.RE = FALSE
)

Value

A list containing the following components:

dlogL: First partial derivatives of log-likelihoods for each feature.
logLik: Maximum log-likelihoods for ML method or maximum log-restricted-likelihood for REML method.
niter: Numbers of iterations for each feature.
coef: A matrix of estimated coefficients (fixed effects), each column corresponds to a feature and each row one covariate.
se: A matrix of standard errors of the estimated coefficients.
t: A matrix of t-values for the fixed effects, equal to coef/se.
df: Degrees of freedom for the t-statistics (values).
p: A matrix of two-sided p-values for the t-tests of the fixed effects.
cov: A array of covariance matrices of the estimated coefficients (fixed effects).
theta: A matrix of the estimated variance components, each column corresponds to a feature and each row one variance component. The last row is the variance component of the residual error.
se.theta: Standard errors of the estimated theta.
RE: A matrix of the best linear unbiased prediction (BLUP) of random effects.

Arguments

XX: = t(X)%*%X, where X is a design matrix for fixed effects.
XY: = t(Y%*%X), where Y is a features-by-samples matrix of observed responses (genes-by-cells expression matrix for scRNA-seq).
ZX: = t(Z)%*%X, where Z = [Z1, ..., Zk], a design matrix for k random factors (variables or random components).
ZY: = t(Y%*%Z).
ZZ: = t(Z)%*%Z.
Ynorm: = rowSums(Y*Y), norms for features (each row in Y).
n: = nrow(X), number of samples (cells in scRNA-seq).
d: = (d1,...,dk), where di = ncol(Zi), number of columns in Zi. sum(d) = ncol(Z), number of columns in Z. For the model with only one random factor, d = ncol(Z).
theta0: A vector of initial values of the variance components, (s1, ...,sk, s_(k+1)), si = sigma_i^2, the variance component of the i-th type random effects. s_(k+1) = sigma^2, the variance component of model residual error.
method: Either REML or ML with Fisher scoring (FS) iterative algorithm.
max.iter: The maximal number of iterations for the iterative algorithm.
epsilon: Positive convergence tolerance. If the absolute value of the first partial derivative of log likelihood is less than epsilon, the iterations converge.
output.cov: If TRUE, output the covariance matrices for the estimated coefficients, which are needed for testing contrasts.
output.RE: If TRUE, output the best linear unbiased prediction (BLUP) of the random effects.

Examples

Run this code

#Generate data: X, Y, and Z.
set.seed(2024)

n <- 1e3
m <- 10
Y <- matrix(rnorm(m*n), m, n)
rownames(Y) <- paste0("Gene", 1:nrow(Y))

trt <- sample(c("A", "B"), n, replace = TRUE)
X <- model.matrix(~ 0 + trt)

q <- 20
sam <- rep(NA, n)
sam[trt == "A"] <- paste0("A", sample.int(q/2, sum(trt == "A"), replace = TRUE))
sam[trt == "B"] <- paste0("B", sample.int(q/2, sum(trt == "B"), replace = TRUE))
Z <- model.matrix(~ 0 + sam)
d <- ncol(Z)

#Fit LMM by summary-level data
#Compute and store the summary-level data:
n <- nrow(X)
XX <- t(X)%*%X
XY <- t(Y%*%X)
ZX <- t(Z)%*%X
ZY <- t(Y%*%Z)
ZZ <- t(Z)%*%Z
Ynorm <- rowSums(Y*Y)
fit <- lmm(XX, XY, ZX, ZY, ZZ, Ynorm = Ynorm, n = n, d = d)
str(fit)

Run the code above in your browser using DataLab