text

R Language Analysis Suite

An R package for analyzing natural language with transformer-based large language models. The text package is part of the R Language Analysis Suite, which includes:

  • talk - a package that transforms voice recordings into text, audio features, or embeddings.
  • text - a package that provides tools for many language tasks such as converting digital text into word embeddings. talk and text offer access to Large Language Models from Hugging Face.
  • topics - a package with tools for visualizing language patterns as topics.
  • the L-BAM Library - a library that provides pre-trained models for different psychological assessments, such as mental health issues, personality, and related behaviours.

The R Language Analysis Suite is developed through a collaboration between psychology and computer science to address research needs and ensure state-of-the-art techniques. The suite is continuously tested on Ubuntu, macOS, and Windows using the latest stable R version.

The text package has two main objectives:

  • First, to serve R users as a point solution for transforming text to state-of-the-art word embeddings that are ready to be used for downstream tasks. The package provides a user-friendly link to transformer-based language models from Hugging Face.
  • Second, to serve as an end-to-end solution that provides state-of-the-art AI techniques tailored for social and behavioral scientists.

Please reference our tutorial article when using the text package: The text-package: An R-package for Analyzing and Visualizing Human Language Using Natural Language Processing and Deep Learning.
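To make "word embeddings that are ready to be used for downstream tasks" concrete, here is a minimal base-R sketch of one such task: comparing two texts by the cosine similarity of their embedding vectors. The three vectors are invented toy values, not output from any model, and `cosine_similarity` is a hypothetical helper written for this illustration. In the package itself, the function index lists textEmbed() for producing embeddings and textSimilarity() for comparing them; their arguments and defaults are not shown on this page, so they are not called here.

```r
# Toy 4-dimensional "embeddings"; the values are invented for illustration.
# Real transformer embeddings typically have hundreds of dimensions.
emb_happy  <- c(0.8, 0.1, 0.3, 0.5)
emb_joyful <- c(0.7, 0.2, 0.4, 0.5)
emb_tired  <- c(-0.2, 0.9, -0.4, 0.1)

# Hypothetical helper: cosine similarity is the dot product of two vectors
# divided by the product of their Euclidean lengths.
cosine_similarity <- function(a, b) {
  stopifnot(length(a) == length(b))
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

sim_close <- cosine_similarity(emb_happy, emb_joyful)
sim_far   <- cosine_similarity(emb_happy, emb_tired)

# Print both scores; a higher score means the vectors point in a more
# similar direction.
print(c(close = sim_close, far = sim_far))
```

With real embeddings, the same comparison would be applied row by row to the vectors produced for each participant's text.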

Install

install.packages('text')
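
Beyond the CRAN install, the function index for this package lists textrpp_install() and textrpp_initialize() for setting up the Python packages that text relies on. The following is a first-time setup sketch, assuming the default arguments of both functions are acceptable and that the machine has internet access. The wrapper name `setup_text_once` is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the text package) wrapping a first-time setup.
# Defining the function runs nothing; the steps execute only when it is called.
setup_text_once <- function() {
  # Install the R package from CRAN if it is not already available
  if (!requireNamespace("text", quietly = TRUE)) {
    install.packages("text")
  }
  # Install the Python packages that text depends on (conda or virtualenv)
  text::textrpp_install()
  # Initialize those Python packages for the current R session
  text::textrpp_initialize()
}

# Usage (in an interactive session, since the installer may ask for confirmation):
# setup_text_once()
```

Wrapping the steps in a function keeps the potentially slow, network-dependent installation from running by accident when a script is sourced.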

Monthly Downloads

1,051

Version

1.9

License

GPL-3

Maintainer

Oscar Kjell

Last Published

June 13th, 2026

Functions in text (1.9)

  • textEmbedReduce - Pre-trained dimension reduction (experimental)
  • textFineTuneTask - Task Adapted Pre-Training (EXPERIMENTAL - under development)
  • textLBAM - The LBAM library
  • textGeneration - Text generation
  • textEmbedRawLayers - Extract layers of hidden states
  • textEmbedStatic - Apply static word embeddings
  • textFindNonASCII - Detect non-ASCII characters
  • textExamples - Identify language examples.
  • textModelLayers - Number of layers
  • textFineTuneDomain - Domain Adapted Pre-Training (EXPERIMENTAL - under development)
  • textPredict - textPredict, textAssess and textClassify
  • textModels - Check downloaded, available models.
  • textPCAPlot - textPCAPlot
  • textNER - Named Entity Recognition. (experimental)
  • textModelsRemove - Delete a specified model
  • textProjection - Supervised Dimension Projection
  • textPredictAll - Predict from several models, selecting the correct input
  • textPredictTest - Significance testing for model prediction performance
  • textPlot - Plot words
  • textPCA - textPCA()
  • textTokenizeAndCount - Tokenize and count
  • textSimilarity - Semantic Similarity
  • textSimilarityNorm - Semantic similarity between a text variable and a word norm
  • textTopicsReduce - textTopicsReduce (EXPERIMENTAL)
  • textTopics - BERTopic
  • textQA - Question Answering. (experimental)
  • textProjectionPlot - Plot Supervised Dimension Projection
  • textTokenize - Tokenize text-variables
  • textSum - Summarize texts. (experimental)
  • textSimilarityMatrix - Semantic similarity across multiple word embeddings
  • textTrainLists - Train lists of word embeddings
  • textTrainRandomForest - Trains word embeddings using random forest
  • textTrainN - Cross-validated accuracies across sample-sizes
  • textTrainRegression - Train word embeddings to a numeric variable.
  • textTrainNPlot - Plot cross-validated accuracies across sample sizes
  • textTrain - Trains word embeddings
  • textTranslate - Translation. (experimental)
  • textTopicsWordcloud - Plot word clouds
  • textTopicsTest - Wrapper for topicsTest function from the topics package
  • textTopicsTree - textTopicsTest (EXPERIMENTAL) to get the hierarchical topic tree
  • textWordPrediction - Compute word-level prediction scores for plotting with textProjectionPlot().
  • textrpp_uninstall - Uninstall textrpp conda environment
  • textZeroShot - Zero Shot Classification (Experimental)
  • word_embeddings_4 - Word embeddings for 4 text variables for 40 participants
  • textrpp_initialize - Initialize the Python packages required by text
  • textrpp_install - Install the Python packages required by text in a conda or virtualenv environment
  • textCentralityPlot - Plots words from textCentrality()
  • textCentrality - Semantic similarity score between single words' embeddings and an aggregated word embedding
  • centrality_data_harmony - Example data for plotting a Semantic Centrality Plot.
  • textClean - Removes standard personal information from text
  • Language_based_assessment_data_8 - Text and numeric data for 10 participants.
  • Language_based_assessment_data_3_100 - Example text and numeric data.
  • DP_projections_HILS_SWLS_100 - Data for plotting a Dot Product Projection Plot.
  • raw_embeddings_1 - Word embeddings from textEmbedRawLayers function
  • find_textrpp_env - Find the environment with the Python packages required by text
  • PC_projections_satisfactionwords_40 - Example data for plotting a Principal Component Projection Plot.
  • textDistance - Semantic distance
  • textCleanNonASCII - Clean non-ASCII characters
  • textDimName - Change dimension names
  • textEmbedLayerAggregation - Aggregate layers
  • textDomainCompare - Compare two language domains
  • textEmbed - textEmbed() extracts layers and aggregates them to word embeddings, for all character variables in a given dataframe.
  • textDescriptives - Compute descriptive statistics of character variables.
  • textDistanceNorm - Semantic distance between a text variable and a word norm
  • textDistanceMatrix - Semantic distance across multiple word embeddings
  • textDiagnostics - Run diagnostics for the text package