
bertopicr

Topic modeling in R via reticulate and the Python BERTopic ecosystem (version 0.17.x). The package provides helpers for training, persistence, topic inspection, and visualization; see the Quarto notebook and the vignettes for an end-to-end workflow.

Installation (R package)

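Install the released version from CRAN:

install.packages("bertopicr")

or the development version from GitHub:

install.packages("devtools")
devtools::install_github("tpetric7/bertopicr")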

Python environment setup (pick one)

A. Install inside R via reticulate

This option requires a Python installation that reticulate can discover. If you don't have one, install Python from python.org; on Windows, restart R afterwards.

Create the environment with setup_python_environment():

library(bertopicr)
library(reticulate)

setup_python_environment(
  envname = "r-bertopic",
  method = "virtualenv" # or "conda"
)

# Point reticulate at the environment you just created
use_virtualenv("r-bertopic", required = TRUE)
# or use_condaenv("r-bertopic", required = TRUE)
py_config()  # confirm reticulate sees the chosen env

Alternatively, install the bundled requirements yourself into an environment you have already created:

library(reticulate)
# Choose ONE of these depending on what you created
target_env <- "r-bertopic"
use_virtualenv(target_env, required = TRUE)      # for virtualenv
# use_condaenv(target_env, required = TRUE)      # for conda

req <- system.file("requirements.txt", package = "bertopicr")
# If req is "", reinstall/upgrade the package so the file is available.
py_install(packages = c("-r", req), envname = target_env, method = "auto", pip = TRUE)
py_config()  # confirm reticulate sees the chosen env
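Whichever route you use, you can also confirm that bertopic itself is importable and matches the 0.17.x series the package targets. The helper below is a hypothetical sketch, not a bertopicr function. It relies only on reticulate and returns NA when Python or the module is missing.

```r
# Hypothetical helper (not part of bertopicr): report the bertopic version
# visible to reticulate, or NA if Python or the module cannot be found.
bertopic_version <- function() {
  tryCatch({
    if (!requireNamespace("reticulate", quietly = TRUE) ||
        !reticulate::py_module_available("bertopic")) {
      NA_character_
    } else {
      # `__version__` is the standard Python package version attribute
      as.character(reticulate::import("bertopic")$`__version__`)
    }
  }, error = function(e) NA_character_)
}

v <- bertopic_version()
if (is.na(v)) {
  message("bertopic is not importable from the active Python environment.")
} else if (!startsWith(v, "0.17.")) {
  message("Found bertopic ", v, "; bertopicr targets 0.17.x.")
}
```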

B. Virtualenv (base Python)

python -m venv r-bertopic

# Windows
r-bertopic\Scripts\activate

# macOS/Linux
source r-bertopic/bin/activate

pip install --upgrade pip
pip install -r inst/requirements.txt
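python -m venv r-bertopic creates the environment inside the current working directory. According to reticulate's documentation, a name without slashes is treated as an environment inside virtualenv_root(), which is usually ~/.virtualenvs. For that reason, use_virtualenv("r-bertopic") may not find this environment. The minimal sketch below passes the full path instead. It assumes you start R in the same directory where you ran python -m venv. A conda environment created with -n r-bertopic, by contrast, is found by name with use_condaenv("r-bertopic").

```r
# Build the full path to the venv created in the current working directory.
# Adjust the directory if you created the environment elsewhere.
venv_dir <- file.path(getwd(), "r-bertopic")

if (dir.exists(venv_dir)) {
  # A path containing slashes is used as-is rather than looked up by name
  reticulate::use_virtualenv(venv_dir, required = TRUE)
} else {
  message("No virtualenv found at ", venv_dir)
}
```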

C. Conda

conda create -n r-bertopic python=3.10
conda activate r-bertopic
pip install -r inst/requirements.txt

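Requirements are bundled at inst/requirements.txt in the source repository, so run the pip commands above from a clone of it. In an installed copy of bertopicr, the same file is at system.file("requirements.txt", package = "bertopicr"). If you have a GPU, install a PyTorch build that matches your CUDA version in the same environment, e.g. pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118.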
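To check from R whether the PyTorch build in the active environment can see a GPU, you can use a small hypothetical helper like the one below. It assumes reticulate is already pointed at your environment, and it calls PyTorch's torch.cuda.is_available().

```r
# Hypothetical helper: does the PyTorch build in the active env see a CUDA GPU?
torch_cuda_available <- function() {
  ok <- tryCatch(
    requireNamespace("reticulate", quietly = TRUE) &&
      reticulate::py_module_available("torch") &&
      reticulate::import("torch")$cuda$is_available(),  # Python bool -> R logical
    error = function(e) FALSE
  )
  isTRUE(ok)  # collapse NULL/NA/failures to FALSE
}

gpu <- torch_cuda_available()
message(if (gpu) "CUDA is available to PyTorch." else
          "PyTorch will run on the CPU (or is not installed).")
```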

macOS notes

If reticulate fails to load Python libraries on macOS, install zlib with Homebrew:

brew install zlib

Then set the fallback library path once per R session:

bertopicr::configure_macos_homebrew_zlib()
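To confirm where Homebrew installed zlib, you can query brew from R. This is a hypothetical diagnostic, not part of bertopicr. It does not assume that configure_macos_homebrew_zlib() uses this exact directory. It returns NA when brew or zlib is missing, including on systems other than macOS.

```r
# Hypothetical diagnostic: locate Homebrew's zlib library directory, or NA.
homebrew_zlib_lib <- function() {
  brew <- Sys.which("brew")
  if (!nzchar(brew)) return(NA_character_)  # Homebrew not on PATH
  prefix <- tryCatch(
    system2(brew, c("--prefix", "zlib"), stdout = TRUE, stderr = FALSE),
    error = function(e) character(0),
    warning = function(w) character(0)      # non-zero exit status
  )
  if (length(prefix) == 0L) return(NA_character_)
  lib <- file.path(prefix[1], "lib")
  if (dir.exists(lib)) lib else NA_character_
}

zlib_lib <- homebrew_zlib_lib()
if (is.na(zlib_lib)) message("Homebrew zlib not found; run: brew install zlib")
```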

Quick Start (fit + visualize)

The package includes helpers for setup, training, and persistence. You can also train with your own BERTopic code and pass the resulting Python model and its outputs to the R helpers.
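The hypothetical helper below sketches the bring-your-own route. It is not part of bertopicr. It calls the Python BERTopic class directly through reticulate and assumes bertopic 0.17.x is installed in the active environment. Setting calculate_probabilities = TRUE asks BERTopic for a full document-by-topic probability matrix, which is presumably what visualize_distribution() expects together with text_id. The min_docs guard is an arbitrary threshold that reflects the tiny-N caveat in the Quick Start; it is not a BERTopic rule.

```r
# Hypothetical helper (not part of bertopicr): fit BERTopic directly via reticulate.
fit_own_bertopic <- function(texts,
                             embedding_model = "Qwen/Qwen3-Embedding-0.6B",
                             top_n_words = 3L,
                             min_docs = 100L) {
  texts <- as.character(texts)
  texts <- texts[!is.na(texts) & nzchar(trimws(texts))]  # drop missing/empty docs
  # Arbitrary guard against tiny corpora (UMAP can fail on very small N)
  if (length(texts) < min_docs) {
    stop(sprintf("Only %d non-empty documents; use at least %d.",
                 length(texts), min_docs), call. = FALSE)
  }
  bertopic <- reticulate::import("bertopic")
  model <- bertopic$BERTopic(
    embedding_model = embedding_model,      # sentence-transformers model name
    top_n_words = as.integer(top_n_words),
    calculate_probabilities = TRUE          # full document-by-topic matrix
  )
  out <- model$fit_transform(as.list(texts))  # Python tuple -> R list
  list(model = model,
       topics = unlist(out[[1]]),
       probabilities = out[[2]])
}

# fit <- fit_own_bertopic(texts)
# visualize_topics(fit$model, filename = "intertopic_distance_map", auto_open = FALSE)
# visualize_distribution(fit$model, text_id = 1,
#                        probabilities = fit$probabilities, auto_open = FALSE)
```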

library(reticulate)
library(bertopicr)

# Point reticulate to the env you prepared
use_virtualenv("r-bertopic", required = TRUE)
# use_condaenv("r-bertopic", required = TRUE)

# Example: train in R (use a real sample to avoid tiny-N failures)
sample_path <- system.file("extdata", "spiegel_sample.rds", package = "bertopicr")
df <- readRDS(sample_path)          # base R; readr::read_rds() also works
texts <- head(df$text_clean, 500)   # up to 500 documents, no NA padding
topic_model <- train_bertopic_model(
  texts,
  embedding_model = "Qwen/Qwen3-Embedding-0.6B",
  top_n_words = 3L
)
# Note: tiny datasets can trigger UMAP spectral warnings/errors; using a
# realistic sample size and a smaller top_n_words avoids that.
save_bertopic_model(topic_model, "topic_model")

loaded <- load_bertopic_model("topic_model")
model <- loaded$model
probs <- loaded$extras$probabilities

# Use the R helpers
visualize_topics(model, filename = "intertopic_distance_map", auto_open = FALSE)
visualize_distribution(model, text_id = 1, probabilities = probs, auto_open = FALSE)
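You can also inspect the probability matrix in plain R. The hypothetical helper below picks each document's most likely topic. It assumes probs is a numeric document-by-topic matrix whose columns correspond to topics 0, 1, 2, and so on. This matches BERTopic's calculate_probabilities output, which leaves the outlier topic -1 out of the matrix.

```r
# Hypothetical helper: most likely topic per document.
# Assumes rows = documents, columns = topics 0, 1, 2, ... (outliers excluded).
top_topic_per_doc <- function(probs) {
  if (!is.matrix(probs)) {
    stop("Expected a document-by-topic matrix (calculate_probabilities = TRUE).",
         call. = FALSE)
  }
  best <- max.col(probs, ties.method = "first")       # column index per row
  data.frame(
    doc   = seq_len(nrow(probs)),
    topic = best - 1L,                                # 0-based topic IDs
    prob  = probs[cbind(seq_len(nrow(probs)), best)]  # probability of that topic
  )
}

# top_topic_per_doc(probs)
```

The helper rejects a plain vector on purpose. A model trained without full probabilities returns only one value per document, and silently treating that as a single topic would be misleading.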

Advanced example

See the vignettes (including train_and_save_model.Rmd and load_and_reuse_model.Rmd) or the Quarto tutorial for a complete workflow: training, representation models (KeyBERT, Ollama models, and others), dimensionality reduction, clustering, and visualization.

Scripts

The demo script is available at inst/scripts/train_model_function_demo.R and shows end-to-end training, saving, loading, and reuse.
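R installs the contents of inst/ at the top level of the package, so the installed copy of the script is under scripts/. Here is a minimal way to locate and preview it:

```r
# inst/scripts/ in the source tree becomes scripts/ in the installed package
demo <- system.file("scripts", "train_model_function_demo.R", package = "bertopicr")

if (nzchar(demo)) {
  cat(head(readLines(demo), 20), sep = "\n")  # preview the first 20 lines
  # source(demo, echo = TRUE)                 # run it (trains a model; slow)
} else {
  message("Demo script not found; install or update bertopicr.")
}
```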


Citation

BERTopic is described in:

@article{grootendorst2022bertopic,
  title={BERTopic: Neural topic modeling with a class-based TF-IDF procedure},
  author={Grootendorst, Maarten},
  journal={arXiv preprint arXiv:2203.05794},
  year={2022}
}

License

This package is licensed under the MIT License. You may use, modify, and distribute it, provided the copyright notice and the license text are included in all copies or substantial portions of the software.


Install: install.packages('bertopicr')
Version: 0.3.6
License: MIT + file LICENSE
Maintainer: Teodor Petrič
Last Published: January 22nd, 2026

Functions in bertopicr (0.3.6)

visualize_distribution: Visualize Topic Distribution for a Specific Document using BERTopic
visualize_barchart: Visualize BERTopic Bar Chart
visualize_topics_over_time: Visualize Topics Over Time using BERTopic
visualize_topics: Visualize Topics using BERTopic
visualize_topics_per_class: Visualize Topics per Class
get_representative_docs_custom: Get Representative Documents for a Specific Topic
get_topic_info_df: Get Topic Information DataFrame
get_document_info_df: Get Document Information DataFrame
save_bertopic_model: Save a BERTopic Model Bundle
find_topics_df: Find Topics DataFrame Function
load_bertopic_model: Load a BERTopic Model Bundle
get_most_representative_docs: Get Most Representative Documents for a Specific Topic
configure_macos_homebrew_zlib: Configure Homebrew zlib on macOS
setup_python_environment: Set Up Python Environment for BERTopic
visualize_documents: Visualize Documents in Reduced Embedding Space
get_topics_df: Get Topics DataFrame Function
visualize_heatmap: Visualize Topic Similarity Heatmap using BERTopic
visualize_documents_3d: Visualize Documents in 3D Space using BERTopic
visualize_hierarchy: Visualize Topic Hierarchy Nodes using BERTopic
train_bertopic_model: Train a BERTopic Model
get_topic_df: Get Topic DataFrame Function
visualize_documents_2d: Visualize Documents in 2D Space using BERTopic