
Overview

The Ollama API makes it possible to run a Large Language Model (LLM) locally. Building on it, we developed a small package for interpreting continuous or categorical latent variables. You provide a data set containing a latent variable you want to understand, along with some explanatory variables. The package returns a description of the latent variable based on those explanatory variables, and also proposes a name for it. 'NaileR' is an R package that combines convenience functions from the 'FactoMineR' package (condes(), catdes(), descfreq()) with the 'ollamar' package.

Its two main goals are to:

  • generate latent variable descriptions with the help of AI
  • offer similarity measures for textual data

Installation (from GitHub)

  1. If needed, install the devtools package.
install.packages('devtools')
  2. Install and load the 'NaileR' package from GitHub.
devtools::install_github('Nelhe/NaileR')
library(NaileR)
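
Because the package relies on a local Ollama server, it can help to confirm that the server is reachable before calling any nail_*() function. The following is a minimal sketch, assuming the 'ollamar' package is installed and Ollama runs at its default local address. The variable name ollama_ready is only illustrative, and which model has to be available is not stated here, so none is pulled.

```r
# Sketch only: check that a local Ollama server answers.
# Assumptions: 'ollamar' is installed; Ollama uses its default local address.
ollama_ready <- FALSE

if (requireNamespace("ollamar", quietly = TRUE)) {
  # list_models() queries the server; any failure is treated as "not reachable"
  models <- tryCatch(ollamar::list_models(), error = function(e) NULL)
  ollama_ready <- !is.null(models)
  if (ollama_ready) print(models)
}

if (!ollama_ready) {
  message("Ollama is not reachable; start it before calling the nail_*() functions.")
}
```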

Usage

'NaileR' currently features 9 datasets and 7 functions.

Datasets

  • agri_studies: contains the results of a Q method-like survey on agribusiness studies
  • beard, beard_cont and beard_wide: contain the results of a sensometrics experiment on beards
  • boss: contains the results of a Q method-like survey on the ideal boss
  • glossophobia: contains the results of a Q method-like survey on feelings about speaking in public
  • local_food: contains the results of a Q method-like survey on sustainable food systems
  • quality: contains the results of a survey on French food certification logos
  • waste: contains the results of a survey on food waste

Functions

  • nail_catdes(): performs a catdes analysis on a dataset and describes each category
  • nail_condes(): performs a condes analysis on a dataset and describes the chosen continuous variable
  • nail_descfreq(): performs a descfreq analysis on a contingency table and describes the rows
  • sim_llm(): computes the similarity between texts
  • dist_mat_llm(): computes a distance matrix based on sim_llm
  • dist_ref_llm(): computes a distance vector based on sim_llm
  • nail_sort(): performs clustering on textual data from sensometrics experiments
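
To illustrate how a pairwise similarity can be turned into a distance matrix and then used for clustering, here is a small base R sketch. It does not call an LLM and is not the implementation of sim_llm() or dist_mat_llm(): toy_sim() is a hypothetical stand-in (Jaccard overlap of word sets), and converting similarity to distance as 1 - similarity is an assumption made for the example.

```r
# Stand-in similarity: Jaccard overlap of lower-cased word sets.
# NOT the LLM-based score computed by sim_llm(); for illustration only.
toy_sim <- function(a, b) {
  wa <- unique(strsplit(tolower(a), "[^a-z]+")[[1]])
  wb <- unique(strsplit(tolower(b), "[^a-z]+")[[1]])
  wa <- wa[nzchar(wa)]
  wb <- wb[nzchar(wb)]
  length(intersect(wa, wb)) / length(union(wa, wb))
}

texts <- c("long dark beard", "short dark beard", "clean shaven face")

# Fill a symmetric matrix with distance = 1 - similarity (assumed mapping)
n <- length(texts)
d <- matrix(0, n, n, dimnames = list(texts, texts))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    d[i, j] <- d[j, i] <- 1 - toy_sim(texts[i], texts[j])
  }
}

# A distance matrix of this shape can feed standard clustering tools
toy_dist <- as.dist(d)
hc <- hclust(toy_dist, method = "average")
print(round(d, 2))
```

With an LLM-based similarity in place of toy_sim(), the same structure applies: one score per pair of texts, assembled into a symmetric matrix.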

Example

For complete case studies and a showcase of the main functions of the 'NaileR' package, see the documentation.

Let's have a look at how we can interpret HCPC clusters:

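library(FactoMineR)
data(local_food)   # shipped with 'NaileR', which must already be loaded

set.seed(1)      # for consistency

# MCA with columns 46 to 63 as supplementary qualitative variables
res_mca <- MCA(local_food, quali.sup = 46:63, ncp = 100, level.ventil = 0.05, graph = FALSE)
plot.MCA(res_mca, choix = "ind", invisible = c("var", "quali.sup"), label = "none")
# Hierarchical clustering on the MCA results, cut into 3 clusters
res_hcpc <- HCPC(res_mca, nb.clust = 3, graph = FALSE)
plot.HCPC(res_hcpc, choice = "map", draw.tree = FALSE, ind.names = FALSE)
# Original data plus a final column holding the cluster membership
don_clust <- res_hcpc$data.clust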
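
For a sense of what a raw category description looks like before any LLM is involved, here is a minimal sketch that calls catdes() from 'FactoMineR' directly. It uses the built-in iris data as a small stand-in (an assumption made to keep the example self-contained); on the clustered survey data, the same call would take the cluster column as num.var.

```r
library(FactoMineR)

# Describe the categories of column 5 (Species) using the other variables
res_cat <- catdes(iris, num.var = 5)

# One table per category: variables whose mean differs from the overall mean
names(res_cat$quanti)
res_cat$quanti$setosa
```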

Due to the very long and explicit variable names, the category description result is practically illegible. Let's provide clear context and see how an LLM can make sense of it:

res <- nail_catdes(don_clust, ncol(don_clust),

                   introduction = 'A study on sustainable food systems was led on several French participants. This study had 2 parts. 
                   In the first part, participants had to rate how acceptable "a food system that..." (e.g., "a food system that only uses renewable energy") was to them.
                   In the second part, they had to say if they agreed or disagreed with some statements.',

                   request = 'I will give you the answers from one group.
                   Please explain who the individuals of this group are, what their beliefs are. Then, give this group a new name, and explain why you chose this name.',

                   isolate.groups = TRUE, drop.negative = TRUE)

This returns a list of results, one for each group.

In the same fashion, nail_condes can be used to interpret the axes of a PCA, although a bit more work is needed to bind the original data frame with the coordinates on the PCA axes.
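
The binding step could look like the sketch below. It uses the decathlon data set shipped with 'FactoMineR' as a stand-in, and the names events and don_pca are illustrative. The final nail_condes() call is left commented out because it needs a running Ollama server; its arguments are assumed to mirror the nail_catdes() call shown earlier, and the introduction and request texts are invented for the example.

```r
library(FactoMineR)
data(decathlon)

# Keep the ten event results and run a PCA on them
events <- decathlon[, 1:10]
res_pca <- PCA(events, graph = FALSE)

# Bind the coordinates on the first axis to the original variables
don_pca <- cbind(events, Dim.1 = res_pca$ind$coord[, "Dim.1"])

# Plain condes() already shows which variables correlate with the axis
res_condes <- condes(don_pca, num.var = ncol(don_pca))
head(res_condes$quanti)

# Assumed call, mirroring nail_catdes(); requires a running Ollama server:
# res <- nail_condes(don_pca, num.var = ncol(don_pca),
#                    introduction = "Athletes competed in the ten events of a decathlon.",
#                    request = "Please describe what this axis opposes, then give it a name.")
```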

Roadmap

  • Implement a validation function to test the consistency of a response
  • Implement a function to generate multiple responses and pick the most "central"
  • Add a nail_textual nail_sort for textual data
  • Consider adding a nail_decat
  • Implement a way to generate reports (pptx)

License

This package is under the GPL (>= 2) License. Details can be found here.

Contact

Sébastien Lê - sebastien.le@institut-agro.fr

Project link: https://github.com/Nelhe/NaileR

Install: install.packages('NaileR')

Version: 1.2.0

License: GPL (>= 2)

Maintainer: Sébastien Lê

Last published: September 26th, 2024

Monthly downloads: 196

Functions in NaileR (1.2.0)

  • nail_condes: Interpret a continuous latent variable
  • rorschach: Rorschach inkblots
  • nail_sort: Sort textual data
  • %>%: Pipe operator
  • quality: Perception of food quality
  • waste: Food waste survey
  • sim_llm: LLM text similarity
  • atomic_habit_clust: Atomic habits survey
  • dist_ref_llm: LLM response consistency
  • dist_mat_llm: LLM distance matrix
  • beard_wide: Beard descriptions
  • agri_studies: Agribusiness studies survey
  • boss: Ideal boss survey
  • atomic_habit: Atomic habits survey
  • beard_cont: Beard descriptions
  • nail_textual: Interpret a group based on answers to open-ended questions
  • car_alone: Atomic habits survey
  • nail_catdes: Interpret a categorical latent variable
  • local_food: Local food systems survey
  • glossophobia: Glossophobia survey
  • fabric: Car seat fabrics
  • nail_descfreq: Interpret the rows of a contingency table
  • nutriscore: Nutri-score survey
  • beard: Beard descriptions
  • nail_qda: Interpret QDA data