agent_evals

Testing infrastructure for LLM-powered code. Provides testthat integration
with custom expectations for evaluating AI agent performance, tool accuracy,
and hallucination rates.
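
As an illustration of what a custom testthat expectation for tool accuracy might look like, here is a minimal sketch. It does not use this package's own functions, which are not documented on this page. The name `expect_tool_called` and the idea of representing an agent run as a character vector of tool names are assumptions made for the example; only `testthat::expect()`, `testthat::quasi_label()`, and `rlang::enquo()` are existing APIs.

```r
library(testthat)

# Hypothetical custom expectation (not part of this package's documented API):
# passes when the recorded tool calls include the expected tool name.
expect_tool_called <- function(calls, tool) {
  # Capture the value and a readable label for failure messages.
  act <- quasi_label(rlang::enquo(calls), arg = "calls")

  expect(
    tool %in% act$val,
    sprintf("%s did not include a call to tool '%s'.", act$lab, tool)
  )

  # Return the tested value invisibly so expectations can be piped.
  invisible(act$val)
}

test_that("agent calls the search tool", {
  # Assumed stand-in for the tool names recorded during an agent run.
  calls <- c("search", "summarise")
  expect_tool_called(calls, "search")
})
```

The same pattern could be extended to other checks mentioned above, such as comparing an agent's answer against a set of known facts to flag hallucinations, but the scoring logic for that would depend on details this page does not give.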

A production-grade AI toolkit for R featuring a layered
architecture (Specification, Utilities, Providers, Core), request
interception support, robust error handling with exponential retry
delays, support for multiple AI model providers ('OpenAI',
'Anthropic', etc.), local small language model inference,
distributed 'MCP' ecosystem, multi-agent orchestration, progressive
knowledge loading through skills, and a global skill store for
sharing AI capabilities.

Yonghe Xia

agent_evals: Performance & Benchmarking: Agent Evals

Description

Arguments