vitals

vitals is a framework for large language model evaluation in R. It's specifically aimed at ellmer users who want to measure the effectiveness of their LLM products, such as custom chat apps and querychat apps. You can use it to:

  • Measure whether changes in your prompts or additions of new tools improve performance in your LLM product
  • Compare how different models affect performance, cost, and/or latency of your LLM product
  • Surface problematic behaviors in your LLM product

The package is an R port of the widely adopted Python framework Inspect. While vitals doesn't integrate with Inspect directly, it writes evaluation logs in the same file format, so you can explore results with the Inspect log viewer and have an on-ramp to Inspect itself if the need arises.

Installation

Install the vitals package from CRAN with:

install.packages("vitals")

You can install the development version of vitals with:

pak::pak("tidyverse/vitals")

Example

LLM evaluation with vitals is composed of two main steps.

library(vitals)
library(ellmer)
library(tibble)

  1. First, create an evaluation task with the Task$new() method.

simple_addition <- tibble(
  input = c("What's 2+2?", "What's 2+3?", "What's 2+4?"),
  target = c("4", "5", "6")
)

tsk <- Task$new(
  dataset = simple_addition, 
  solver = generate(chat_anthropic(model = "claude-sonnet-4-20250514")), 
  scorer = model_graded_qa()
)

Tasks are composed of three main components:

  • Datasets are data frames with, minimally, columns input and target. input represents some question or problem, and target gives the expected response (see the sketch after this list).
  • Solvers are functions that take input and return some value approximating target, likely wrapping ellmer chats. generate() is the simplest solver in vitals: it just passes the input to the chat's $chat() method and returns the result as-is.
  • Scorers juxtapose the solver's output with target, evaluating how well the solver addressed the input.
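
As a concrete reference point for the dataset contract, vitals ships with a built-in evaluation dataset, are ("An R Eval", listed in the function index below). A minimal sketch of checking that a data frame is task-ready:

# `are` ships with vitals; like any task dataset, it should
# contain (at least) the `input` and `target` columns
all(c("input", "target") %in% names(vitals::are))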

  2. Evaluate the task.

tsk$eval()

$eval() will run the solver, run the scorer, and then situate the results in a persistent log file that can be explored interactively with the Inspect log viewer.
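
For instance, after an evaluation completes, you can open that viewer on your local logs. A minimal sketch, assuming logs were written to the default log directory:

# logs are written to the directory returned by vitals_log_dir();
# vitals_view() serves the Inspect log viewer for those logs
vitals_log_dir()
vitals_view()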

Any arguments to the solver or scorer can be passed to $eval(), allowing for straightforward parameterization of tasks. For example, if I wanted to evaluate chat_openai() on this task rather than chat_anthropic(), I could write:

tsk_openai <- tsk$clone()
tsk_openai$eval(solver_chat = chat_openai(model = "gpt-4.1"))
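
The two evaluations can then be combined for side-by-side analysis. As a sketch, vitals_bind() (listed in the function index below) concatenates samples from multiple tasks, with argument names labeling which task each sample came from:

# stack samples from both evaluations into a single tibble,
# with a column identifying the originating task
results <- vitals_bind(claude = tsk, openai = tsk_openai)
results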

For an applied example, see the “Getting started with vitals” vignette at vignette("vitals", package = "vitals").

Version

0.1.0

License

MIT + file LICENSE


Maintainer

Simon Couch

Last Published

June 24th, 2025

Functions in vitals (0.1.0)

  • vitals_bundle: Prepare logs for deployment
  • generate: Convert a chat to a solver function
  • vitals_log_dir: The log directory
  • Task: Creating and evaluating tasks
  • vitals_view: Interactively view local evaluation logs
  • scorer_model: Model-based scoring
  • scorer_detect: Scoring with string detection
  • are: An R Eval
  • vitals_bind: Concatenate task samples for analysis
  • vitals-package: vitals: Large Language Model Evaluation