precision_study: Precision Study Analysis

Description

Performs variance component analysis for precision experiments, following established methodology for clinical laboratory method validation. Estimates repeatability, within-laboratory precision, and reproducibility from nested experimental designs.

Usage

precision_study(
  data,
  value = "value",
  sample = NULL,
  site = NULL,
  day = "day",
  run = NULL,
  replicate = NULL,
  conf_level = 0.95,
  ci_method = c("satterthwaite", "mls", "bootstrap"),
  boot_n = 1999,
  method = c("anova", "reml")
)

Value

An object of class c("precision_study", "valytics_precision", "valytics_result"), which is a list containing:

input

List with original data and metadata:

data: The input data frame (after validation)
n: Total number of observations
n_excluded: Number of observations excluded due to NAs
factors: Named list of factor column names used
value_col: Name of the value column

design

List describing the experimental design:

type: Design type (e.g., "single_site", "multi_site")
structure: Character string describing nesting (e.g., "day/run")
levels: Named list with number of levels for each factor
balanced: Logical; TRUE if design is balanced
n_samples: Number of distinct samples/concentration levels
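As an illustration of what "balanced" means for a nested day/run/replicate layout, the sketch below checks that every day has the same number of runs and every run holds the same number of replicates. The helper `is_balanced_nested()` is hypothetical and not part of the package; `precision_study()` may apply a different or more detailed check.

```r
# Hypothetical helper (not part of the package): treat a nested day/run design
# as balanced when every day has the same number of runs and every day/run
# cell holds the same number of replicates.
is_balanced_nested <- function(data, day = "day", run = "run") {
  # Count observations in each day/run cell (runs are nested within days)
  cell_counts <- table(interaction(data[[day]], data[[run]], drop = TRUE))
  # Count distinct runs within each day
  runs_per_day <- tapply(data[[run]], data[[day]], function(x) length(unique(x)))
  length(unique(as.vector(cell_counts))) == 1 && length(unique(runs_per_day)) == 1
}

d <- expand.grid(replicate = 1:2, run = 1:2, day = 1:5)
is_balanced_nested(d)        # TRUE
is_balanced_nested(d[-1, ])  # FALSE: one day/run cell now has a single replicate
```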

variance_components

Data frame with variance component estimates:

component: Name of variance component
variance: Estimated variance
sd: Standard deviation (sqrt of variance)
pct_total: Percentage of total variance
df: Degrees of freedom

precision

Data frame with precision estimates:

measure: Precision measure name (repeatability, intermediate, etc.)
sd: Standard deviation
cv_pct: Coefficient of variation (percent)
ci_lower: Lower confidence limit
ci_upper: Upper confidence limit

anova_table

ANOVA table with SS, MS, DF for each source of variation

by_sample

When multiple samples (concentration levels) are analyzed: a list of results, one element per sample

settings

List with analysis settings

call

The matched function call

Arguments

data: A data frame containing the precision experiment data.
value: Character string specifying the column name containing measurement values. Default is "value".
sample: Character string specifying the column name for sample/level identifier. Use when multiple concentration levels are tested. Default is NULL (single sample).
site: Character string specifying the column name for site/device identifier. Use for multi-site reproducibility studies. Default is NULL (single site).
day: Character string specifying the column name for day identifier. Default is "day".
run: Character string specifying the column name for run identifier (within day). Default is NULL (assumes single run per day).
replicate: Character string specifying the column name for replicate identifier. If NULL (default), replicates are inferred from the data structure.
conf_level: Confidence level for intervals (default: 0.95).
ci_method: Method for calculating confidence intervals: "satterthwaite" (default) uses the Satterthwaite approximation, "mls" uses the Modified Large Sample method, "bootstrap" uses BCa bootstrap resampling.
boot_n: Number of bootstrap resamples when ci_method = "bootstrap" (default: 1999).
method: Estimation method for variance components: "anova" (default) uses ANOVA-based method of moments, "reml" uses Restricted Maximum Likelihood (requires lme4 package).

Confidence Intervals

Three methods are available for confidence interval estimation:

Satterthwaite (default): Uses Satterthwaite's approximation for the degrees of freedom of linear combinations of variance components.
MLS: Modified Large Sample method, which can provide better coverage when some variance components may be estimated as negative.
Bootstrap: BCa bootstrap resampling. Makes the fewest distributional assumptions but is computationally intensive.
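To show how the Satterthwaite approach works, the sketch below builds a confidence interval for the within-laboratory SD of a balanced day/run/replicate design. The variance is written as a linear combination of mean squares, V = sum(a * MS), the effective degrees of freedom are approximated, and chi-square quantiles give the variance limits. The function `satterthwaite_ci()` and the mean squares used are illustrative assumptions, not the package's internal code or real data.

```r
# Sketch (not the package's implementation): Satterthwaite CI for an SD whose
# variance is a linear combination of mean squares, V = sum(a * ms).
satterthwaite_ci <- function(ms, df, a, conf_level = 0.95) {
  v <- sum(a * ms)                          # combined variance estimate
  df_eff <- v^2 / sum((a * ms)^2 / df)      # Satterthwaite effective df
  alpha <- 1 - conf_level
  lower_var <- df_eff * v / qchisq(1 - alpha / 2, df_eff)
  upper_var <- df_eff * v / qchisq(alpha / 2, df_eff)
  c(sd = sqrt(v), df = df_eff, lower = sqrt(lower_var), upper = sqrt(upper_var))
}

# Within-laboratory variance for a balanced design with c_runs runs per day and
# n_reps replicates per run:
#   V = MS_day / (c * n) + MS_run * (c - 1) / (c * n) + MS_error * (n - 1) / n
c_runs <- 2
n_reps <- 2
ms <- c(day = 30, run = 8, error = 4)      # illustrative mean squares
ms_df <- c(day = 19, run = 20, error = 40) # degrees of freedom for a 20 x 2 x 2 design
a <- c(1 / (c_runs * n_reps), (c_runs - 1) / (c_runs * n_reps), (n_reps - 1) / n_reps)
res <- satterthwaite_ci(ms, ms_df, a)
res
```

All coefficients in `a` are positive here, which is the case where the Satterthwaite approximation behaves best; combinations with negative coefficients are where alternatives such as MLS are usually preferred.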

ANOVA vs REML

ANOVA (default): Method of moments estimation. Works well for balanced designs. May produce negative variance estimates for small variance components (set to zero by default).
REML: Restricted Maximum Likelihood. Preferred for unbalanced designs. Requires the lme4 package. Always produces non-negative estimates.
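For the balanced case, the ANOVA (method of moments) estimates can be sketched in base R by computing the nested sums of squares, converting them to mean squares, and equating these to their expectations, with negative estimates truncated at zero. The function `anova_components()` is a hypothetical illustration that assumes a balanced day/run/replicate design; it is not the function's own estimator.

```r
# Sketch (hypothetical helper): ANOVA variance components for a balanced
# day/run/replicate design; negative estimates are truncated at zero.
anova_components <- function(y, day, run) {
  day <- factor(day)
  cell <- interaction(day, run, drop = TRUE)  # run nested within day
  n_day <- nlevels(day)
  n_run <- nlevels(cell) / n_day              # runs per day
  n_rep <- length(y) / nlevels(cell)          # replicates per run
  grand <- mean(y)
  day_means <- ave(y, day)                    # each observation's day mean
  cell_means <- ave(y, cell)                  # each observation's run mean
  ms_day <- sum((day_means - grand)^2) / (n_day - 1)
  ms_run <- sum((cell_means - day_means)^2) / (n_day * (n_run - 1))
  ms_err <- sum((y - cell_means)^2) / (n_day * n_run * (n_rep - 1))
  c(day = max(0, (ms_day - ms_run) / (n_run * n_rep)),
    run = max(0, (ms_run - ms_err) / n_rep),
    error = ms_err)
}

# Simulated 20 x 2 x 2 data (replicate varies fastest, then run, then day)
set.seed(1)
d <- expand.grid(replicate = 1:2, run = 1:2, day = 1:20)
d$value <- 100 + rnorm(20, 0, 1.5)[d$day] +
  rnorm(40, 0, 1)[(d$day - 1) * 2 + d$run] + rnorm(nrow(d), 0, 2)
vc <- anova_components(d$value, d$day, d$run)
vc
```

For comparison, one common way to fit the same nested model by REML is `lme4::lmer(value ~ 1 + (1 | day/run), data = d)`, which expands to separate random intercepts for day and for run within day.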

Details

This function implements variance component analysis for the nested experimental designs commonly used in clinical laboratory precision studies. The analysis follows methodology consistent with international standards such as ISO 5725-2 (see References).

Supported Experimental Designs:

Single-site, day/run/replicate: Classic 20 x 2 x 2 design (20 days, 2 runs per day, 2 replicates per run)
Single-site, day/replicate: Simplified design without run factor (e.g., 5 days x 5 replicates for verification)
Multi-site: e.g., 3 sites x 5 days x 5 replicates for reproducibility
Custom designs: Any fully-nested combination of factors

Variance Components:

For a design with site/day/run/replicate, the model is: $$y_{ijkl} = \mu + S_i + D_{j(i)} + R_{k(ij)} + \epsilon_{l(ijk)}$$

where S = site, D = day (nested in site), R = run (nested in day), and epsilon = residual error.
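For a balanced design with a sites, b days per site, c runs per day, and n replicates per run, the textbook expected mean squares of this nested random-effects model (see Searle et al., 1992) are given below, and the ANOVA estimates follow by equating observed and expected mean squares. This is a reference sketch for the balanced case only; how the function treats unbalanced data is not described here.

$$\begin{aligned}
E[MS_{\epsilon}] &= \sigma^2_{\epsilon} \\
E[MS_{R(D)}] &= \sigma^2_{\epsilon} + n\,\sigma^2_{R} \\
E[MS_{D(S)}] &= \sigma^2_{\epsilon} + n\,\sigma^2_{R} + cn\,\sigma^2_{D} \\
E[MS_{S}] &= \sigma^2_{\epsilon} + n\,\sigma^2_{R} + cn\,\sigma^2_{D} + bcn\,\sigma^2_{S}
\end{aligned}$$

$$\hat\sigma^2_{\epsilon} = MS_{\epsilon}, \quad
\hat\sigma^2_{R} = \frac{MS_{R(D)} - MS_{\epsilon}}{n}, \quad
\hat\sigma^2_{D} = \frac{MS_{D(S)} - MS_{R(D)}}{cn}, \quad
\hat\sigma^2_{S} = \frac{MS_{S} - MS_{D(S)}}{bcn}$$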

Precision Measures:

Repeatability: Within-run precision (sqrt of error variance)
Between-run precision: Additional variability between runs
Between-day precision: Additional variability between days
Within-laboratory precision: Total precision within a single site (combines day, run, and error variance)
Reproducibility: Total precision including between-site variability (for multi-site designs)
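The precision measures above are square roots of sums of variance components. The sketch below shows that arithmetic and expresses each SD as a CV relative to the grand mean. The function `precision_measures()`, the component names (`day`, `run`, `error`, optional `site`), and the use of the grand mean as the CV denominator are assumptions for illustration.

```r
# Sketch (hypothetical helper): combine variance components into precision
# measures, each reported as an SD and a CV (%) relative to the grand mean.
precision_measures <- function(vc, grand_mean) {
  v <- c(
    repeatability = vc[["error"]],
    between_run = vc[["run"]],
    between_day = vc[["day"]],
    within_lab = vc[["day"]] + vc[["run"]] + vc[["error"]]
  )
  # Reproducibility is only defined when a between-site component is present
  if ("site" %in% names(vc)) {
    v <- c(v, reproducibility = vc[["site"]] + v[["within_lab"]])
  }
  sd <- unname(sqrt(v))
  data.frame(measure = names(v), sd = sd, cv_pct = 100 * sd / grand_mean)
}

vc <- c(day = 2.25, run = 1, error = 4)  # illustrative variances
res <- precision_measures(vc, grand_mean = 100)
res
```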

References

Chesher D (2008). Evaluating assay precision. Clinical Biochemist Reviews, 29(Suppl 1):S23-S26.

ISO 5725-2:2019. Accuracy (trueness and precision) of measurement methods and results - Part 2: Basic method for the determination of repeatability and reproducibility of a standard measurement method.

Searle SR, Casella G, McCulloch CE (1992). Variance Components. Wiley, New York.

Satterthwaite FE (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2:110-114.

Examples


# Example with simulated precision data
set.seed(42)

# Generate study design: 20 days x 2 runs x 2 replicates
n_days <- 20
n_runs <- 2
n_reps <- 2

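# expand.grid() varies its first argument fastest, so list replicate first:
# rows are then ordered replicate within run within day, which matches the
# rep(..., each = ...) calls below
prec_data <- expand.grid(
  replicate = 1:n_reps,
  run = 1:n_runs,
  day = 1:n_days
)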

# Add realistic variance components
day_effect <- rep(rnorm(n_days, 0, 1.5), each = n_runs * n_reps)
run_effect <- rep(rnorm(n_days * n_runs, 0, 1.0), each = n_reps)
error <- rnorm(nrow(prec_data), 0, 2.0)

prec_data$value <- 100 + day_effect + run_effect + error

# Run precision study
prec <- precision_study(
  data = prec_data,
  value = "value",
  day = "day",
  run = "run"
)

print(prec)
summary(prec)
