qlm_code: Code qualitative data with an LLM

Description

Applies a codebook to input data using a large language model and returns a qlm_coded object that bundles the coded results with the codebook, execution settings, and metadata needed for reproducibility.

Usage

qlm_code(x, codebook, model, ..., batch = FALSE, name = NULL, notes = NULL)

Value

A qlm_coded object (a tibble with additional attributes):

Data columns: The coded results with a .id column for identifiers.
Attributes: data, input_type, and run (list containing name, batch, call, codebook, chat_args, execution_args, metadata, parent).

The object prints as a tibble and can be used directly in data manipulation workflows. The batch flag in the run attribute indicates whether batch processing was used. The execution_args element of run contains all non-chat execution arguments (for either parallel or batch processing).
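
As a rough illustration of how these attributes might be inspected, the sketch below builds a stand-in object by hand and reads its metadata with attr(). The stand-in is a plain data frame, not real qlm_code() output; the sentiment column and all attribute values are assumptions made for the example, and only a subset of the run elements is shown.

```r
# Stand-in for a qlm_coded object, built by hand for illustration only.
# The `sentiment` column and the attribute values are assumptions.
coded <- structure(
  data.frame(
    .id = c("review1", "review2"),
    sentiment = c("positive", "negative")
  ),
  input_type = "text",
  run = list(name = "demo_run", batch = FALSE, execution_args = list())
)

# Run-level metadata lives in the `run` attribute.
run <- attr(coded, "run")
run$name                   # identifier supplied via the `name` argument
run$batch                  # FALSE means parallel rather than batch processing
attr(coded, "input_type")  # kind of input that was coded
```

With a real result, the same attr() calls should apply, since the attribute names above are the ones listed in this section.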

Arguments

x: Input data: a character vector of texts (for text codebooks) or file paths to images (for image codebooks). Named vectors will use names as identifiers in the output; unnamed vectors will use sequential integers.
codebook: A codebook object created with qlm_codebook(). Also accepts deprecated task() objects for backward compatibility.
model: Provider (and optionally model) name in the form "provider/model" or "provider" (which will use the default model for that provider). Passed to the name argument of ellmer::chat(). Examples: "openai/gpt-4o-mini", "anthropic/claude-3-5-sonnet-20241022", "ollama/llama3.2", "openai" (uses default OpenAI model).
...: Additional arguments passed to ellmer::chat(), ellmer::parallel_chat_structured(), or ellmer::batch_chat_structured(). Arguments recognized by ellmer::parallel_chat_structured() or ellmer::batch_chat_structured() are routed there; all other arguments (including provider-specific arguments like base_url, credentials, or api_args for OpenAI-compatible endpoints) are passed to ellmer::chat().
batch: Logical. If TRUE, uses ellmer::batch_chat_structured() instead of ellmer::parallel_chat_structured(). Batch processing is more cost-effective for large jobs but may have longer turnaround times. Default is FALSE. See ellmer::batch_chat_structured() for details.
name: Character string identifying this coding run. Default is NULL.
notes: Optional character string with descriptive notes about this coding run. Useful for documenting the purpose or rationale when viewing results in qlm_trail(). Default is NULL.

Details

Arguments in ... are routed by name to one of ellmer::chat(), ellmer::parallel_chat_structured(), or ellmer::batch_chat_structured().
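
The idea of name-based routing can be sketched in a few lines of base R. This is not the package's implementation: route_dots() and the exec_names vector are hypothetical, and the sketch only assumes that max_active and rpm are execution arguments of ellmer::parallel_chat_structured().

```r
# Hypothetical helper, not part of the package: split `...` by argument name.
route_dots <- function(..., exec_names) {
  dots <- list(...)
  is_exec <- names(dots) %in% exec_names
  list(
    execution_args = dots[is_exec],   # would go to the structured-chat call
    chat_args      = dots[!is_exec]   # everything else would go to ellmer::chat()
  )
}

# Names assumed here to be execution arguments of parallel_chat_structured().
exec_names <- c("max_active", "rpm")

routed <- route_dots(
  max_active = 5,
  base_url = "http://localhost:11434/v1",
  exec_names = exec_names
)

names(routed$execution_args)
names(routed$chat_args)
```

In this sketch, max_active ends up among the execution arguments and base_url among the chat arguments, mirroring the split between execution_args and chat_args recorded in the run attribute.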

Progress indicators and error handling are provided by the underlying ellmer::parallel_chat_structured() or ellmer::batch_chat_structured() function. Set verbose = TRUE to see progress messages during coding. Retry logic for API failures should be configured through ellmer's options.

When batch = TRUE, the function uses ellmer::batch_chat_structured(), which submits jobs to the provider's batch API. This is typically more cost-effective but has longer turnaround times. Three arguments passed through ... are relevant here: path specifies where batch results are cached, wait controls whether to wait for completion, and ignore_hash can force reprocessing of cached results.
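
A batch run might look like the sketch below. It assumes that the provider supports batch requests, that API credentials are configured, and that path is forwarded through ... as described above. The file name and the name and notes values are hypothetical. The call is guarded with if (FALSE) because it needs network access.

```r
texts <- c("I love this product!", "Terrible experience.", "It's okay.")
batch_path <- "sentiment_batch.json"  # hypothetical cache file for batch state

# Requires API credentials and internet access, so it is not run here.
if (FALSE) {
  coded_batch <- qlm_code(
    texts,
    data_codebook_sentiment,
    model = "openai/gpt-4o-mini",
    batch = TRUE,
    path = batch_path,               # forwarded to ellmer::batch_chat_structured()
    name = "sentiment_batch",        # hypothetical run name
    notes = "Batch run for cost comparison"
  )

  # The run attribute should record that batch processing was used.
  attr(coded_batch, "run")$batch
}
```

Keeping the same path across sessions is what allows cached batch results to be picked up again; ignore_hash = TRUE would be the documented way to force reprocessing.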

Examples

# Requires API credentials and internet access; not run in package checks.
if (FALSE) {
# Basic sentiment analysis
texts <- c("I love this product!", "Terrible experience.", "It's okay.")
coded <- qlm_code(texts, data_codebook_sentiment, model = "openai/gpt-4o-mini")
coded

# With named inputs (names become IDs in output)
texts_named <- c(review1 = "Great service!", review2 = "Very disappointing.")
coded2 <- qlm_code(texts_named, data_codebook_sentiment, model = "openai/gpt-4o-mini")
coded2
}