Generate Text in Parallel for Multiple Prompts
Usage:

generate_parallel(
  context,
  prompts,
  max_tokens = 100L,
  top_k = 40L,
  top_p = 1,
  temperature = 0,
  repeat_last_n = 0L,
  penalty_repeat = 1,
  seed = 1234L,
  progress = interactive(),
  verbosity = 0L,
  clean = FALSE,
  hash = TRUE
)

Value:

A character vector of generated texts, one element per input prompt, in the same order as `prompts`.
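A minimal sketch of a typical call, assuming a local GGUF model file at a hypothetical path; the one-argument signatures shown for `model_load()` and `context_create()` are assumptions based on the companion functions named in the arguments below:

# Hypothetical model path and assumed signatures for model_load() and
# context_create(); adjust to your setup.
model <- model_load("models/example-model.gguf")
ctx <- context_create(model)

prompts <- c("What is the capital of France?",
             "Name a prime number greater than 10.")

# Greedy decoding (temperature = 0) is the default.
out <- generate_parallel(ctx, prompts, max_tokens = 50L)
out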
Arguments:

`context`: A context object created with `context_create()`.

`prompts`: Character vector of input text prompts.

`max_tokens`: Maximum number of tokens to generate (default: 100).

`top_k`: Top-k sampling parameter (default: 40). Limits the vocabulary to the k most likely tokens.

`top_p`: Top-p (nucleus) sampling (default: 1.0). Probability threshold for token selection.

`temperature`: Sampling temperature (default: 0.0). Set to 0 for greedy decoding; higher values increase creativity.

`repeat_last_n`: Number of recent tokens considered for the repetition penalty (default: 0). Set to 0 to disable.

`penalty_repeat`: Repetition penalty strength (default: 1.0). Values > 1 discourage repetition; set to 1.0 to disable.

`seed`: Random seed for reproducible generation (default: 1234). Use positive integers for deterministic output.

`progress`: Show a console progress bar while batches run. Defaults to `interactive()`: visible in interactive sessions, suppressed in scripts and under R CMD check.

`verbosity`: Controls backend logging during generation (default: 0L). Larger values print more detail: 0 shows only errors, 1 adds warnings, 2 prints informational messages, and 3 enables the most verbose debug output. Negative values fully suppress backend output. The quiet default (0L) means only the progress bar is visible during typical batch runs, matching `generate()`. This differs from `model_load()` and `context_create()` (default 1L), which run once per session and benefit from visible warnings. Raise to 2L or 3L when debugging llama.cpp internals.

`clean`: If TRUE, remove common chat-template control tokens from each generated text (default: FALSE).

`hash`: When `TRUE` (default), computes SHA-256 hashes of the supplied prompts and generated outputs and attaches them via the `"hashes"` attribute for later inspection (see the example after this list).
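As an illustration of the sampling and hashing arguments, a hedged sketch reusing the `ctx` and `prompts` objects from the usage example above; the parameter values are illustrative, not recommendations:

# Creative sampling with a mild repetition penalty.
out <- generate_parallel(
  ctx, prompts,
  max_tokens     = 120L,
  temperature    = 0.8,
  top_k          = 40L,
  top_p          = 0.95,
  repeat_last_n  = 64L,
  penalty_repeat = 1.1,
  seed           = 42L,
  clean          = TRUE
)

# With hash = TRUE (the default), SHA-256 hashes of the prompts and
# outputs are attached to the result.
attr(out, "hashes")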
Details:

When more prompts are supplied than the context can hold in parallel (`n_seq_max - 1`), the function automatically processes them in sequential batches while preserving the original ordering of results.
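The batch count follows directly from the context's parallel capacity. A sketch of the arithmetic, assuming a hypothetical context created with `n_seq_max = 8` (how that value is set or queried is outside this function's interface):

# Hypothetical capacity: n_seq_max = 8 allows up to 7 prompts per batch.
n_seq_max <- 8L
n_prompts <- 20L
ceiling(n_prompts / (n_seq_max - 1L))  # 3 sequential batches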