EnTraineR

An intelligent teaching assistant based on LLMs to help interpret statistical model outputs in R.
EnTraineR builds audience-aware prompts (beginner, applied, advanced) that never invent numbers: each prompt embeds the verbatim R output and instructs the LLM how to explain it.

Out of the box, it returns ready-to-use prompts; no LLM connection is required.
Optionally, you can connect your own LLM backend through functions built on top of trainer_core_generate_or_return().

Installation

From GitHub:

# install.packages("remotes")
remotes::install_github("Sebastien-Le/EnTraineR")

Optional but recommended packages for examples:

  • FactoMineR, SensoMineR (model objects used in examples)
  • stringr (to squish multi-line intros)
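
If stringr is not installed, a base-R helper can do a similar whitespace cleanup for multi-line intros. The sketch below assumes that collapsing runs of whitespace and trimming both ends is all that is needed; squish_base is a hypothetical name, not an EnTraineR function.

```r
# Hypothetical helper (not part of EnTraineR): base-R stand-in for
# stringr::str_squish().
squish_base <- function(x) {
  # Collapse every run of whitespace (spaces, tabs, newlines) to one space,
  # then trim leading and trailing whitespace.
  trimws(gsub("[[:space:]]+", " ", x))
}

intro <- "Six chocolates have been evaluated by a sensory panel,
  according to a sensory attribute: granular."
squish_base(intro)
```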

What it does

  • Generates clean prompts to interpret:
    • ANOVA summaries (AovSum) with F-tests and T-tests
    • Linear models (FactoMineR::LinearModel) including model selection notes
    • Classical tests: t-test, variance F-test, proportion test, correlation test, chi-squared test
  • Audience-aware guidance:
    • beginner: plain-language teaching focus
    • applied: decisions and practical implications
    • advanced: technical but concise, with appropriate cautions
  • No invented numbers: only uses the verbatim output you provide.
  • Gemini integration (optional):
    • gemini_generate() sends your prompt to Google Gemini (Generative Language API) and returns the text reply.
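
The "no invented numbers" rule comes down to capturing what R prints and embedding it unchanged in the prompt. The following is a minimal sketch of that idea in base R, not the package's implementation; build_prompt_sketch and its wording are assumptions made for illustration.

```r
# Fit a test and capture exactly what R prints, line by line.
set.seed(42)
tt <- t.test(rnorm(20, mean = 0.1), mu = 0)
verbatim <- capture.output(print(tt))

# Hypothetical prompt builder (not the EnTraineR implementation):
# the printed output is pasted in as-is, never re-typed or rounded.
build_prompt_sketch <- function(verbatim, audience = "beginner") {
  paste(
    sprintf("Audience: %s.", audience),
    "Explain the output below. Use only the numbers that appear in it.",
    "",
    paste(verbatim, collapse = "\n"),
    sep = "\n"
  )
}

p <- build_prompt_sketch(verbatim, audience = "applied")
cat(p)
```

Because the output is captured rather than summarised, any number the LLM later quotes can be traced back to a line of the prompt.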

Included datasets

The package ships three small datasets for teaching:

  • deforestation
    Air and water temperatures before/after riparian deforestation.
    Variables: Temp_water, Temp_air, Deforestation (BEFORE/AFTER).

  • ham
    Sensory descriptors for 21 hams and an Overall liking score.
    Useful for multiple regression demonstrations.

  • poussin
    Chick weights by brooding Temperature (T1/T2/T3) and Gender (Female/Male).
    Useful for two-factor ANOVA examples.

These datasets are the intellectual property of L'Institut Agro Rennes Angers and are used for the "Statistical Approach" course module.

data(deforestation); str(deforestation)
data(ham); summary(ham)
data(poussin); with(poussin, table(Temperature, Gender))

Quick start

1) ANOVA (AovSum)

# install.packages("SensoMineR")
library(SensoMineR)
library(FactoMineR)   # provides AovSum()
library(EnTraineR)
data(chocolates)      # loads sensochoc, among other objects

# Fit the ANOVA summary for Granular ~ Product * Panelist on sensochoc
res <- AovSum(Granular ~ Product*Panelist, data = sensochoc)

intro <- "Six chocolates have been evaluated by a sensory panel, 
  according to a sensory attribute: granular.
  The panel has been trained according to this attribute
  and panellists should be reproducible when rating this attribute."
intro <- gsub("\n", " ", intro)
intro <- stringr::str_squish(intro)

p <- trainer_AovSum(
  aovsum_obj   = res,
  audience     = "applied",
  t_test       = c("Product", "Panelist"),  # filter T-test section
  introduction = intro
)

cat(p)   # a ready-to-use prompt for an LLM or for teaching

2) Linear model (FactoMineR::LinearModel)

# install.packages("FactoMineR"); install.packages("stringr")
library(FactoMineR)

intro_ham <- "Can we predict ham overall liking from its sensory profile?"
intro_ham <- stringr::str_squish(gsub("\n", " ", intro_ham))

fit <- LinearModel(`Overall liking` ~ ., data = ham, selection = "bic")

pr <- trainer_LinearModel(
  lm_obj       = fit,
  introduction = intro_ham,
  audience     = "advanced"
)

cat(pr)

Another linear model with interaction and a categorical factor:

fit2 <- LinearModel(Temp_water ~ Temp_air * Deforestation,
                    data = deforestation, selection = "none")

pr2 <- trainer_LinearModel(
  lm_obj       = fit2,
  introduction = "Effect of deforestation on the air-water temperature link.",
  audience     = "beginner"
)

cat(pr2)
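
For readers new to R formulas, it may help to see what the `*` operator in the model above expands to. This sketch uses only base R and the variable names from the deforestation example; it needs no data, since it inspects the formula itself.

```r
# Which terms does `a * b` generate in an R formula?
f <- Temp_water ~ Temp_air * Deforestation
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# The same model written out term by term: two main effects plus
# their interaction.
f_long <- Temp_water ~ Temp_air + Deforestation + Temp_air:Deforestation
identical(term_labels, attr(terms(f_long), "term.labels"))
```

The interaction term is what lets the slope of water temperature on air temperature differ between the BEFORE and AFTER periods.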

3) Classical tests

t-test:

set.seed(1)  # make the simulated sample reproducible
tt <- t.test(rnorm(20, 0.1), mu = 0)
cat(trainer_t_test(tt, audience = "beginner"))

Variance F-test:

vt <- var.test(rnorm(25, sd = 1.0), rnorm(30, sd = 1.3))
cat(trainer_var_test(vt, audience = "applied"))

Proportion test:

pt <- prop.test(x = c(42, 35), n = c(100, 90))
cat(trainer_prop_test(pt, audience = "advanced", summary_only = TRUE))

Correlation test:

set.seed(1)
x <- rnorm(30); y <- 0.5 * x + rnorm(30, sd = 0.8)
ct <- cor.test(x, y, method = "pearson")
cat(trainer_cor_test(ct, audience = "applied"))

Chi-squared test:

m <- matrix(c(10, 20, 30, 40), nrow = 2)
cx <- chisq.test(m, correct = TRUE)
cat(trainer_chisq_test(cx, audience = "beginner"))
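
The five base-R functions used above (t.test, var.test, prop.test, cor.test, chisq.test) all return objects of class "htest", so the same fields can be read from each. The sketch below shows one way to pull out the key numbers, for example to compare them with an LLM's explanation; the key_numbers list is an illustrative structure, not something EnTraineR returns.

```r
m <- matrix(c(10, 20, 30, 40), nrow = 2)
cx <- chisq.test(m, correct = TRUE)

class(cx)  # "htest"

# Collect the fields shared by htest objects into a plain list.
key_numbers <- list(
  method    = cx$method,
  statistic = unname(cx$statistic),
  df        = unname(cx$parameter),
  p_value   = cx$p.value
)
str(key_numbers)
```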

Using Gemini from R (optional)

gemini_generate() lets you send a prompt to Google Gemini and get the response back as text.

# 1) Set your API key once per session (or in .Renviron)
Sys.setenv(GEMINI_API_KEY = "your_key_here")

# 2) Send a prompt
txt <- gemini_generate(
  prompt      = "Say hello in one short sentence.",
  model       = "gemini-2.5-flash",   # accepts "gemini-2.5-flash" or "models/gemini-2.5-flash"
  temperature = 0.2,
  user_agent  = "EnTraineR/0.9.0 (https://github.com/Sebastien-Le/EnTraineR)"
)
cat(txt)
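
A missing API key is a common cause of a failed first call. The guard below is a small sketch that stops early with a readable message; it assumes, as the example above suggests, that the key is read from the GEMINI_API_KEY environment variable. require_env_key is a hypothetical helper, not part of EnTraineR.

```r
# Hypothetical guard (not part of EnTraineR): fail fast when the key is unset.
require_env_key <- function(var = "GEMINI_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop(
      sprintf("Environment variable %s is not set; add it to .Renviron or call Sys.setenv().", var),
      call. = FALSE
    )
  }
  invisible(key)
}

# Usage: call once before gemini_generate()
# require_env_key()
```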

Audience profiles (summary)

  • beginner: plain English, define what is tested, minimal jargon, short sentences.
  • applied: decisions and implications, keep it practical, concise takeaways.
  • advanced: technical but clear, mention df/F where relevant, caution about multiplicity/assumptions without inventing diagnostics.
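
One way to picture how a profile becomes part of a prompt is a lookup table keyed by audience name. The sketch below paraphrases the three profiles listed above; audience_guidance and audience_line are hypothetical names, and the package's own trainer_core_audience_profile() may work differently.

```r
# Hypothetical lookup table paraphrasing the three profiles described above.
audience_guidance <- c(
  beginner = "Plain English, define what is tested, minimal jargon, short sentences.",
  applied  = "Decisions and implications, practical, concise takeaways.",
  advanced = "Technical but clear, mention df/F where relevant, caution about multiplicity and assumptions."
)

audience_line <- function(audience = c("beginner", "applied", "advanced")) {
  audience <- match.arg(audience)  # errors on anything outside the three profiles
  sprintf("Audience (%s): %s", audience, audience_guidance[[audience]])
}

audience_line("applied")
```

Using match.arg() means a misspelled audience fails immediately instead of silently producing a prompt with no guidance.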

All prompts emphasize: do not invent numbers; use only what appears in the printed output.
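
If you do send a prompt to an LLM, the same rule can be checked on the reply. The following is a rough sketch, not an EnTraineR feature: it lists numeric tokens that appear in a reply but not in the verbatim output. It compares strings, so a rounded value (0.03 versus 0.032) is flagged as well; extract_numbers and invented_numbers are hypothetical names.

```r
# Hypothetical checker (not part of EnTraineR).
extract_numbers <- function(text) {
  # Integers, decimals, and scientific notation, with an optional minus sign.
  pattern <- "-?[0-9]+(\\.[0-9]+)?([eE][-+]?[0-9]+)?"
  unique(unlist(regmatches(text, gregexpr(pattern, text))))
}

invented_numbers <- function(reply, verbatim) {
  # Numeric tokens present in the reply but absent from the R output.
  setdiff(extract_numbers(reply), extract_numbers(verbatim))
}

verbatim <- "t = 2.31, df = 19, p-value = 0.032"
reply <- "With 19 degrees of freedom the p-value is 0.032; t is about 2.5."
invented_numbers(reply, verbatim)  # "2.5"
```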

Reproducibility and LLMs

By default, trainers return a prompt string (i.e., generate = FALSE).
If you have a generator backend, pass generate = TRUE and an llm_model name, and implement your own trainer_core_generate_or_return() to call your LLM API.
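
The return-or-generate pattern can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the argument names generate and llm_model come from the text above, while generate_or_return_sketch, the backend argument, and echo_backend are hypothetical, and the real trainer_core_generate_or_return() signature may differ.

```r
# Hypothetical wrapper illustrating the pattern.
generate_or_return_sketch <- function(prompt, generate = FALSE,
                                      llm_model = NULL, backend = NULL) {
  if (!isTRUE(generate)) {
    return(prompt)  # default: return the prompt string, no network call
  }
  if (!is.function(backend)) {
    stop("generate = TRUE requires a backend function(prompt, model).",
         call. = FALSE)
  }
  backend(prompt, llm_model)
}

# Stand-in backend so the sketch runs offline; replace with a real API call.
echo_backend <- function(prompt, model) {
  sprintf("[%s] received %d characters", model, nchar(prompt))
}

generate_or_return_sketch("Explain this output.")
generate_or_return_sketch("Explain this output.", generate = TRUE,
                          llm_model = "my-model", backend = echo_backend)
```

Keeping generate = FALSE as the default means the same call gives the same string every time, which is the reproducible part; only the optional LLM step varies between runs.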

Contributing

Issues and pull requests are welcome. Please:

  • Keep code ASCII and roxygen2-ready.
  • Add tests and examples where relevant.
  • Follow the audience style guidelines.

License and citation

See the DESCRIPTION file for license terms.
If EnTraineR helps your teaching or analyses, starring the repo is appreciated.

Acknowledgments

Thanks to the R community and the authors of FactoMineR and SensoMineR for inspiring teaching tools and example datasets used in demonstrations.

Package information

  • Install: install.packages('EnTraineR')
  • Version: 1.0.0
  • License: MIT + file LICENSE
  • Maintainer: Sébastien Lê
  • Last published: January 17th, 2026

Functions in EnTraineR (1.0.0)

  • trainer_core_filter_ttest_by_factors: Filter T-test lines by requested factors (main and/or interactions)
  • trainer_core_generate_or_return: Generate or return a prompt, depending on `generate`
  • trainer_core_detect_main_factors: Detect main-effect factor names present in T-test lines (ignore interactions). Space-safe: captures everything before " - " on non-interaction rows.
  • trainer_var_test: Interpret an F test comparing two variances (var.test) with an audience-aware LLM prompt
  • trainer_core_extract_block_after: Extract lines following a header (up to first blank line)
  • trainer_t_test: Interpret a Student's t-test (stats::t.test) with an LLM-ready prompt
  • trainer_prop_test: Interpret a proportion test (prop.test) with an audience-aware LLM prompt
  • trainer_core_summary_only_block: Utility: render a standard 3-bullet summary-only instruction
  • trainer_core_ttest_scope_msg: Scope message for T-test section based on requested and found factors
  • trainer_MCA: Trainer: Name an MCA dimension (FactoMineR::MCA) with an LLM-ready prompt
  • trainer_PCA: Trainer: Name a PCA dimension (FactoMineR::PCA) with an LLM-ready prompt
  • poussin: Poussin: weight by brooding temperature and sex
  • trainer_LinearModel: Trainer: Interpret FactoMineR::LinearModel with an LLM-ready prompt
  • trainer_AovSum: Trainer: Interpret ANOVA (AovSum) with an LLM-ready prompt
  • trainer_cor_test: Interpret a correlation test (cor.test) with an audience-aware LLM prompt
  • gemini_generate: Generate text with Google Gemini (Generative Language API), robust with retries
  • trainer_chisq_test: Interpret a chi-squared test (chisq.test) with an audience-aware LLM prompt
  • trainer_core_conf_label: Confidence level label helper
  • trainer_core_build_prompt: Assemble a standard prompt with common sections
  • trainer_core_audience_profile: Build an audience profile (beginner / applied / advanced) with optional summary-only mode
  • trainer_core_actually_shown: Determine which requested items were actually shown after filtering
  • ham: Ham: sensory descriptors and overall liking
  • trainer_core_prompt_header: Build the standard header for prompts
  • trainer_core_llm_generate: LLM generation helper for TraineR
  • deforestation: River deforestation: air and water temperatures before/after