trail_compare(
  data,
  text_col,
  task,
  settings,
  id_col = NULL,
  label_col = "label",
  cache_dir = NULL,
  overwrite = FALSE,
  annotate_fun = annotate,
  min_coders = 2L
)

A trail_compare object with components:
Named list of trail_record objects (one per setting)
Wide coder-style annotation matrix (one row per text unit, one column per setting)
Named list of inter-rater reliability statistics
Metadata on settings, identifiers, task, timestamp, etc.
data: A data frame containing the text to be annotated.
text_col: Character scalar. Name of the column in data that contains the text units to annotate.
task: A quallmer task object describing what to extract or label.
settings: A named list of trail_setting objects. The list names serve as identifiers for each setting (similar to coder IDs).
id_col: Optional character scalar naming the unit ID column. If NULL, a consistent temporary ID column (".trail_unit_id") is created and added to the input data so that annotations from all settings can be aligned.
label_col: Character scalar. Name of the column in each record's annotation data to use as the code for comparison (e.g. "label", "score", "category").
cache_dir: Optional character scalar specifying a directory in which to cache LLM outputs; passed to trail_record(). If NULL, caching is disabled. For examples and tests, use tempdir() to comply with CRAN policies.
overwrite: Logical. If TRUE, ignore all cached results and recompute annotations for every setting.
annotate_fun: Annotation backend function used by trail_record().
min_coders: Minimum number of non-missing coders per unit required for the unit to be included in the inter-rater reliability calculation.
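The id_col fallback can be pictured with a small base-R sketch. The helper ensure_unit_id() is hypothetical and not part of quallmer; it assumes the temporary ID is simply each unit's row position, which may differ from how the package actually builds ".trail_unit_id".

```r
# Hypothetical helper (not part of quallmer): mimics the documented id_col
# fallback by adding a ".trail_unit_id" column when no ID column is supplied.
# Assumption: the temporary ID is just the row position of each text unit.
ensure_unit_id <- function(data, id_col = NULL) {
  if (is.null(id_col)) {
    id_col <- ".trail_unit_id"
    data[[id_col]] <- seq_len(nrow(data))
  } else if (!id_col %in% names(data)) {
    stop("id_col '", id_col, "' is not a column of data")
  }
  list(data = data, id_col = id_col)
}

docs <- data.frame(text = c("Great product", "Terrible service", "It was fine"))
prepared <- ensure_unit_id(docs)
prepared$id_col
#> [1] ".trail_unit_id"
prepared$data[[prepared$id_col]]
#> [1] 1 2 3
```

Because the same ID column is attached once to the input data before any setting runs, every setting's output carries identical unit keys, which is what makes the later alignment possible.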
trail_compare() is deprecated. Use qlm_replicate() to re-run coding with
different models or settings, then use qlm_compare() to assess inter-rater
reliability.
All settings are applied to the same text units. Because the ID
column is shared across settings, their annotation outputs can be
compared directly through the matrix component and summarized with
the inter-rater reliability statistics in the icr component.
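The alignment and filtering steps can be sketched in base R. Everything here is illustrative: align_records(), filter_min_coders() and unanimous_agreement() are hypothetical helpers, the toy data are invented, and each setting's annotations are assumed to be a data frame with an "id" column and a "label" column. Simple unanimous agreement (the share of units on which all non-missing settings give the same code) stands in for a reliability statistic; it is not necessarily what trail_icr() reports.

```r
# Toy per-setting annotations (invented data), keyed by a shared unit ID.
annotations <- list(
  gpt_low_temp  = data.frame(id = 1:4, label = c("pos", "neg", "neu", "pos")),
  gpt_high_temp = data.frame(id = 1:4, label = c("pos", "neg", "pos", NA)),
  other_model   = data.frame(id = c(1L, 2L, 4L), label = c("pos", "pos", "pos"))
)

# Hypothetical: build a wide coder-style matrix, one column per setting.
align_records <- function(records, id_col = "id", label_col = "label") {
  renamed <- Map(function(df, name) {
    out <- df[, c(id_col, label_col)]
    names(out)[2] <- name  # label column takes the setting's name
    out
  }, records, names(records))
  # Full outer join on the shared ID: units missing in a setting become NA
  wide <- Reduce(function(x, y) merge(x, y, by = id_col, all = TRUE), renamed)
  wide[order(wide[[id_col]]), , drop = FALSE]
}

# Hypothetical: keep units coded by at least min_coders settings.
filter_min_coders <- function(wide, setting_cols, min_coders = 2L) {
  n_coded <- rowSums(!is.na(wide[, setting_cols, drop = FALSE]))
  wide[n_coded >= min_coders, , drop = FALSE]
}

# Stand-in statistic: share of units where all non-missing codes agree.
unanimous_agreement <- function(wide, setting_cols) {
  codes <- as.matrix(wide[, setting_cols, drop = FALSE])
  agree <- apply(codes, 1, function(row) length(unique(row[!is.na(row)])) == 1L)
  mean(agree)
}

wide <- align_records(annotations)
setting_names <- names(annotations)
kept <- filter_min_coders(wide, setting_names, min_coders = 2L)
nrow(kept)
#> [1] 4
unanimous_agreement(kept, setting_names)
#> [1] 0.5
nrow(filter_min_coders(wide, setting_names, min_coders = 3L))
#> [1] 2
```

The outer join keeps every unit even when one setting failed to code it, so missingness shows up as NA in the matrix and is handled explicitly by the min_coders threshold rather than silently dropping rows.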
trail_record() – run a task for a single setting
trail_matrix() – align records into coder-style wide format
trail_icr() – compute inter-rater reliability across settings