Compares two or more coded objects to assess inter-rater reliability or
agreement. For predefined-unit data (data frames or qlm_coded objects),
computes standard reliability statistics. For segmented corpora from
qlm_segment(), computes Krippendorff's alpha for unitizing (see Details).
Usage
qlm_compare(
  ...,
  by,
  level = NULL,
  tolerance = 0,
  ci = c("none", "analytic", "bootstrap"),
  bootstrap_n = 1000
)
Value
A qlm_comparison object (a tibble/data frame) with the following columns:
variable: Name of the compared variable
level: Measurement level used
measure: Name of the reliability metric
value: Computed value of the metric
docid: Source document identifier, or an indicator for the overall result (unitizing comparisons only). Absent for predefined-unit comparisons.
rater1, rater2, ...: Names of the compared objects (one column per rater)
ci_lower: Lower bound of the confidence interval (only if ci != "none")
ci_upper: Upper bound of the confidence interval (only if ci != "none")
The object has class c("qlm_comparison", "tbl_df", "tbl", "data.frame") and
attributes containing metadata (raters, n, call).
Metrics by measurement level (predefined-unit comparisons):
Nominal: alpha_nominal, kappa (Cohen's/Fleiss'), percent_agreement
Ordinal: alpha_ordinal, kappa_weighted (2 raters only), w (Kendall's W), rho (Spearman's), percent_agreement
Interval/Ratio: alpha_interval/alpha_ratio, icc, r (Pearson's), percent_agreement
For unitizing measures (segmented corpora), see Details.
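To make the nominal kappa entry concrete: with two raters, the kappa metric is presumably Cohen's kappa, which compares observed agreement with the agreement expected by chance from each rater's marginal distribution. The base R sketch below shows that textbook computation only; cohen_kappa_sketch() is a hypothetical helper, not part of quallmer, and the package's own implementation may differ in details such as missing-value handling.

```r
# Hypothetical helper: textbook Cohen's kappa for two raters (not quallmer's code)
cohen_kappa_sketch <- function(r1, r2) {
  # Use one shared set of categories so the cross-table is square
  lv <- sort(unique(c(r1, r2)))
  tab <- table(factor(r1, levels = lv), factor(r2, levels = lv))
  n <- sum(tab)
  p_o <- sum(diag(tab)) / n                      # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

r1 <- c("pos", "pos", "neg", "neg")
r2 <- c("pos", "neg", "neg", "neg")
cohen_kappa_sketch(r1, r2)  # returns 0.5
```

Here the raters agree on 3 of 4 units (p_o = 0.75) while chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.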
Confidence intervals:
ci = "analytic": Provides analytic CIs for ICC and Pearson's r only
ci = "bootstrap": Provides bootstrap CIs for all metrics via resampling
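The general idea behind bootstrap intervals can be sketched in base R. This sketch assumes that units (rows) are resampled with replacement and that a percentile interval is reported; the package may use a different resampling scheme or interval type. bootstrap_ci_sketch() and agreement() are hypothetical names used only for illustration.

```r
# Hypothetical percentile bootstrap over units (rows); not quallmer's code
bootstrap_ci_sketch <- function(x, stat, n_boot = 1000, conf = 0.95) {
  x <- as.matrix(x)  # rows = units, columns = raters
  # Recompute the statistic on n_boot resamples of the units
  est <- replicate(n_boot, {
    idx <- sample.int(nrow(x), replace = TRUE)
    stat(x[idx, , drop = FALSE])
  })
  tail_p <- (1 - conf) / 2
  quantile(est, probs = c(tail_p, 1 - tail_p), names = FALSE, na.rm = TRUE)
}

# Example statistic: share of units on which two raters give the same code
agreement <- function(m) mean(m[, 1] == m[, 2])

set.seed(1)
codes <- cbind(rater1 = c(1, 2, 2, 3, 1, 2, 3, 3),
               rater2 = c(1, 2, 3, 3, 1, 1, 3, 3))
bootstrap_ci_sketch(codes, agreement, n_boot = 500)
```

Resampling whole units keeps each unit's codes from all raters together, which preserves the pairing that agreement statistics depend on.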
Arguments
...: Two or more data frames or qlm_coded objects (for example, objects created
with as_qlm_coded()) to compare. These represent different "raters" (e.g., different
LLM runs, different models, human coders, or human vs. LLM coding). Each object must
have a .id column and the variable(s) specified in by, and all objects should
cover the same units (matching .id values). Plain data frames are automatically
converted with as_qlm_coded(). Alternatively, all inputs may be
segmented corpora from qlm_segment() or as_qlm_coded() with
qlm_segment = TRUE (see Details).
by: Optional. Name(s) of the variable(s) to compare across raters; both quoted
and unquoted names are accepted. Can be a single variable (by = sentiment) or a
character vector (by = c("sentiment", "rating")). If omitted or NULL, all coded
variables are compared.
level: Optional. Measurement level(s) for the variable(s). Can be:
NULL (default): Auto-detect from the codebook
Character scalar: Use the same level for all variables
Named list: Specify the level for each variable, e.g. list(sentiment = "nominal", rating = "ordinal")
Valid levels are "nominal", "ordinal", "interval", or "ratio".
tolerance: Numeric. Tolerance used for percent agreement on numeric data: codes that differ by no more than tolerance are counted as agreeing. Default is 0 (exact agreement required).
ci: Confidence interval method:
"none": No confidence intervals (default)
"analytic": Analytic CIs where available (ICC, Pearson's r)
"bootstrap": Bootstrap CIs for all metrics via resampling
bootstrap_n: Number of bootstrap resamples when ci = "bootstrap".
Default is 1000. Ignored when ci is "none" or "analytic".
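As an illustration of how tolerance relaxes exact matching, the sketch below assumes (this is not stated in the documentation) that a unit counts as agreeing when all raters' numeric codes lie within tolerance of one another, i.e. when max minus min is at most tolerance, and it returns the proportion of agreeing units. percent_agreement_sketch() is a hypothetical helper, not quallmer's implementation.

```r
# Hypothetical helper; ASSUMES a unit "agrees" when the spread of its codes
# across raters (max - min) is at most `tolerance`. Not quallmer's code.
percent_agreement_sketch <- function(x, tolerance = 0) {
  x <- as.matrix(x)  # rows = units, columns = raters
  if (is.numeric(x)) {
    agree <- apply(x, 1, function(v) max(v) - min(v) <= tolerance)
  } else {
    # Non-numeric codes: all raters must give the identical value
    agree <- apply(x, 1, function(v) length(unique(v)) == 1L)
  }
  mean(agree)  # proportion of agreeing units; multiply by 100 for a percentage
}

ratings <- cbind(r1 = c(1, 2, 3, 4), r2 = c(1, 3, 3, 6))
percent_agreement_sketch(ratings)                 # 0.5: exact matches only
percent_agreement_sketch(ratings, tolerance = 1)  # 0.75: off-by-one also agrees
```

With non-integer codes, floating-point rounding can matter at the boundary: in R, 3.1 - 3.0 is slightly larger than 0.1, so a tolerance of exactly 0.1 would not count that pair as agreeing under this sketch.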
Details
The function merges the coded objects by their .id column and keeps only
units present in all objects. A unit with a missing value for any rater is
excluded from the analysis.
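The alignment step can be sketched in base R, assuming each rater is a plain data frame with a .id column: an inner join on .id keeps only units shared by every rater, and complete.cases() then drops units where any rater's code is missing. align_raters() is a hypothetical helper for illustration, not part of quallmer.

```r
# Hypothetical helper illustrating the alignment described above (not quallmer's code)
align_raters <- function(raters, var) {
  # Keep .id plus the compared variable, renamed after each rater
  cols <- Map(function(df, nm) {
    out <- df[, c(".id", var)]
    names(out)[2] <- nm
    out
  }, raters, names(raters))
  # merge() performs an inner join by default, so only shared .id values survive
  merged <- Reduce(function(a, b) merge(a, b, by = ".id"), cols)
  # Drop units where any rater has a missing value
  merged[stats::complete.cases(merged), , drop = FALSE]
}

run_a <- data.frame(.id = 1:4, sentiment = c("pos", "neg", NA, "pos"))
run_b <- data.frame(.id = 2:5, sentiment = c("neg", "neg", "neu", "pos"))
align_raters(list(run_a = run_a, run_b = run_b), "sentiment")
# keeps .id 2 and 4: unit 3 is missing in run_a, units 1 and 5 are not shared
```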
Measurement levels and statistics:
Nominal: For unordered categories. Computes Krippendorff's alpha, Cohen's/Fleiss' kappa, and percent agreement.
Ordinal: For ordered categories. Computes Krippendorff's alpha (ordinal), weighted kappa (2 raters only), Kendall's W, Spearman's rho, and percent agreement.
Interval: For continuous data with meaningful intervals. Computes Krippendorff's alpha (interval), ICC, Pearson's r, and percent agreement.
Ratio: For continuous data with a true zero point. Computes the same measures as interval level, but Krippendorff's alpha uses the ratio-level formula which accounts for proportional differences.
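The four levels differ only in the squared difference function that Krippendorff's alpha plugs into its observed and expected disagreement. The base R sketch below implements the textbook estimator for complete data (every unit coded by every rater, no missing values) via the coincidence matrix, with nominal, ordinal, interval, and ratio difference functions. kripp_alpha_sketch() is a hypothetical helper; it is not the package's implementation and does not handle missing codes.

```r
# Hypothetical helper: textbook Krippendorff's alpha for complete data
# (rows = units, columns = raters, at least two raters, no NAs).
# Ordinal, interval, and ratio levels expect numeric codes. Not quallmer's code.
kripp_alpha_sketch <- function(x, level = c("nominal", "ordinal", "interval", "ratio")) {
  level <- match.arg(level)
  x <- as.matrix(x)
  vals <- sort(unique(as.vector(x)))
  k <- length(vals)
  m <- ncol(x)  # every unit is coded by all m raters
  # Coincidence matrix: each ordered pair of raters within a unit adds 1 / (m - 1)
  o <- matrix(0, k, k)
  for (u in seq_len(nrow(x))) {
    pos <- match(x[u, ], vals)
    for (i in seq_len(m)) for (j in seq_len(m)) {
      if (i != j) o[pos[i], pos[j]] <- o[pos[i], pos[j]] + 1 / (m - 1)
    }
  }
  n_c <- rowSums(o)
  n <- sum(n_c)
  idx <- seq_len(k)
  # Squared difference function for each measurement level
  delta <- switch(level,
    nominal = outer(idx, idx, function(a, b) as.numeric(a != b)),
    ordinal = {
      cs <- c(0, cumsum(n_c))  # cs[j + 1] = n_1 + ... + n_j
      outer(idx, idx, function(a, b) {
        lo <- pmin(a, b); hi <- pmax(a, b)
        (cs[hi + 1] - cs[lo] - (n_c[a] + n_c[b]) / 2)^2
      })
    },
    interval = outer(as.numeric(vals), as.numeric(vals), function(a, b) (a - b)^2),
    ratio = outer(as.numeric(vals), as.numeric(vals),
                  function(a, b) ifelse(a + b == 0, 0, ((a - b) / (a + b))^2))
  )
  # alpha = 1 - (n - 1) * sum(o * delta) / sum(n_c * n_k * delta)
  d_obs <- sum(o * delta)
  d_exp <- sum(outer(n_c, n_c) * delta) / (n - 1)
  1 - d_obs / d_exp
}

x <- cbind(r1 = c(1, 1, 2, 2), r2 = c(1, 2, 2, 2))
kripp_alpha_sketch(x, "nominal")  # 8/15, about 0.533
kripp_alpha_sketch(x, "ratio")    # also 8/15
```

With only two distinct values, every difference function is proportional on the off-diagonal, so all four levels give the same alpha here; the levels diverge once there are three or more distinct values.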
Kendall's W, ICC, and percent agreement are computed using all raters simultaneously. For 3 or more raters, Spearman's rho and Pearson's r are computed as the mean of all pairwise correlations between raters.
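The averaging of pairwise correlations for three or more raters can be sketched with base R's cor() and combn(). mean_pairwise_cor() is a hypothetical helper that takes the plain arithmetic mean of the pairwise coefficients, as described above; it is an illustration, not quallmer's code.

```r
# Hypothetical helper (not quallmer's code): arithmetic mean of all
# pairwise correlations between rater columns
mean_pairwise_cor <- function(x, method = c("pearson", "spearman")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  pairs <- utils::combn(ncol(x), 2)  # each column holds one pair of rater indices
  r <- apply(pairs, 2, function(p) stats::cor(x[, p[1]], x[, p[2]], method = method))
  mean(r)
}

scores <- cbind(r1 = 1:5, r2 = 1:5, r3 = 5:1)
mean_pairwise_cor(scores)              # (1 - 1 - 1) / 3 = -1/3
mean_pairwise_cor(scores, "spearman")  # also -1/3 for these data
```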
Unitizing (segmentation) reliability
When all inputs are segmented corpora (created by qlm_segment() or by
as_qlm_coded() with qlm_segment = TRUE), agreement is measured at
the character level using Krippendorff's alpha for unitizing continua
(Krippendorff, 2019, section 12.6). This accounts for segments of
unequal length and partial overlaps between coders' unitizations. The
observed and expected coincidence matrices are constructed from the
lengths of pairwise segment intersections across all observer pairs.
The output includes a docid column with per-document and overall
results. Segmented corpora must reference the same source text.
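The raw ingredient of these coincidence matrices, the length of each intersection between two coders' segments and gaps, can be illustrated by expanding each segmentation to one code per character and cross-tabulating. This is a simplified sketch for a single pair of coders, assuming segments are given as inclusive 1-based character positions; to_char_codes() and the "<gap>" label are hypothetical, and the package's actual construction may differ.

```r
# Hypothetical illustration (not quallmer's code): tabulate the lengths of
# intersections between two coders' segmentations of the same text.
# Segments use inclusive 1-based character positions; uncoded matter is a gap.
to_char_codes <- function(segments, text_length, gap = "<gap>") {
  codes <- rep(gap, text_length)
  for (i in seq_len(nrow(segments))) {
    codes[segments$start[i]:segments$end[i]] <- segments$value[i]
  }
  codes
}

coder_a <- data.frame(start = c(1, 11), end = c(5, 15), value = c("pos", "neg"))
coder_b <- data.frame(start = c(3, 11), end = c(7, 14), value = c("pos", "neg"))

a <- to_char_codes(coder_a, text_length = 20)
b <- to_char_codes(coder_b, text_length = 20)
lv <- sort(unique(c(a, b)))
# Cell [c, k]: number of characters coded c by coder A and k by coder B,
# i.e. the total length of their c/k intersections
overlap <- table(factor(a, levels = lv), factor(b, levels = lv))
# Symmetric coincidence matrix for this coder pair
coincidence <- overlap + t(overlap)
coincidence
```

For example, the two "pos" segments overlap on characters 3 to 5, so overlap["pos", "pos"] is 3, while characters 1 and 2 fall in coder A's "pos" segment but in coder B's gap.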
Four members of the unitizing alpha family are supported:
alpha_u_binary (${}_{|u}\alpha$): Computed when by is omitted.
Measures agreement on which character spans are identified as segments
versus gaps (irrelevant matter). All segment values are collapsed to a
binary distinction. Use this for pure boundary agreement when segments
carry no codes (section 12.6.4, eq. 35).
alpha_u_nominal (${}_{u}\alpha_{\mathrm{nominal}}$): Computed when by
names a docvar. Measures agreement on both boundary placement and the
value (code) assigned to each segment. This is the most comprehensive
measure: low values can reflect boundary disagreement, coding
disagreement, or both (section 12.6.3, eq. 34).
alpha_cu_nominal (${}_{cu}\alpha_{\mathrm{nominal}}$): Computed alongside
alpha_u_nominal when by is specified. Measures coding agreement
conditional on unitization by restricting the coincidence matrix to
intersections of non-gap segments only. This separates "do the coders
agree on the codes?" from "do they agree on the boundaries?"
(section 12.6.5, eqs. 36–37).
alpha_u_per_value[k] (${}_{(k)u}\alpha_{\mathrm{nominal}}$): Computed
alongside alpha_u_nominal when by is specified. Reports the
reliability of each individual value k, showing which codes are
applied reliably and which are not. Coverage (the percentage of all
k-valued matter found in valued intersections) is reported in the
docid column (section 12.6.6, eq. 38).
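The conditioning idea behind alpha_cu_nominal can be illustrated by applying the ordinary nominal-alpha formula to a coincidence matrix, first with gaps included and then with gap rows and columns removed. The matrix entries below are illustrative character counts, and alpha_nominal_from_coincidence() is a hypothetical helper; Krippendorff's (2019) unitizing estimators may differ in details, so this only shows why restricting to valued intersections isolates agreement on codes.

```r
# Hypothetical helper (not quallmer's code): ordinary nominal alpha from a
# symmetric coincidence matrix o:
#   alpha = 1 - (n - 1) * (off-diagonal mass) / (n^2 - sum(n_c^2))
alpha_nominal_from_coincidence <- function(o) {
  o <- as.matrix(o)
  n_c <- rowSums(o)
  n <- sum(n_c)
  observed <- n - sum(diag(o))  # total off-diagonal coincidences
  expected <- n^2 - sum(n_c^2)  # sum over c != k of n_c * n_k
  1 - (n - 1) * observed / expected
}

lv <- c("<gap>", "neg", "pos")
# Illustrative character-level coincidences for two coders (gaps included)
coinc <- matrix(c(16, 1, 4,
                   1, 8, 0,
                   4, 0, 6), nrow = 3, byrow = TRUE, dimnames = list(lv, lv))

alpha_nominal_from_coincidence(coinc)  # about 0.601: boundaries and codes together
# Mirroring the conditioning idea: keep only non-gap rows and columns
valued <- lv != "<gap>"
alpha_nominal_from_coincidence(coinc[valued, valued])  # 1: codes agree wherever both coders coded
```

Here all disagreement sits in the gap row and column, so the unconditional value is pulled down by boundary disagreement alone, while the conditional value shows perfect agreement on the codes themselves.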
References
Krippendorff, K. (2019). Content Analysis: An Introduction to Its Methodology (4th ed.). Sage. https://doi.org/10.4135/9781071878781
See Also
qlm_validate() for validation of coding against gold standards,
qlm_code() for LLM coding, as_qlm_coded() for human coding, and
qlm_segment() for LLM-powered text segmentation.
Examples
# Load example coded objects
examples <- readRDS(system.file("extdata", "example_objects.rds", package = "quallmer"))

# Compare two coding runs
comparison <- qlm_compare(
  examples$example_coded_sentiment,
  examples$example_coded_mini,
  by = "sentiment",
  level = "nominal"
)
print(comparison)

# Same comparison, letting the measurement level be auto-detected from the codebook
qlm_compare(
  examples$example_coded_sentiment,
  examples$example_coded_mini,
  by = "sentiment"
)