
psychtm: A package for text mining in psychological research

The goal of psychtm is to make text mining models and methods accessible to social science researchers, particularly in psychology. This package allows users to:

  • Estimate the SLDAX topic model and popular models subsumed by SLDAX, including SLDA, LDA, and regression models;

  • Obtain posterior inferences;

  • Assess model fit using coherence and exclusivity metrics.
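The coherence and exclusivity metrics mentioned above can be illustrated with a small base-R sketch. It makes two assumptions:

- It uses one common coherence definition, UMass coherence (Mimno et al., 2011).
- It uses a simple exclusivity score, not a weighted variant such as FREX.

psychtm's own functions may define or scale these metrics differently. The names `umass_coherence()`, `exclusivity()`, `beta_toy`, and `dtm` are hypothetical and are not part of the package.

```r
# Toy inputs (hypothetical): topic-word probabilities (K = 2 topics, V = 4 words)
# and document-term counts (D = 4 documents).
beta_toy <- rbind(c(0.5, 0.3, 0.1, 0.1),
                  c(0.1, 0.1, 0.4, 0.4))
dtm <- rbind(c(2, 1, 0, 0),
             c(1, 1, 0, 1),
             c(0, 0, 3, 1),
             c(0, 1, 1, 2))

# UMass coherence of one topic: sum over pairs of top words (w_i ranked below w_j)
# of log((D(w_i, w_j) + 1) / D(w_j)), where D() counts documents containing the word(s).
# Assumes every top word appears in at least one document.
umass_coherence <- function(top_words, dtm) {
  present <- dtm[, top_words, drop = FALSE] > 0  # D x M logical matrix
  score <- 0
  for (i in seq_along(top_words)[-1]) {
    for (j in seq_len(i - 1)) {
      co_docs <- sum(present[, i] & present[, j])
      score <- score + log((co_docs + 1) / sum(present[, j]))
    }
  }
  score
}

# Exclusivity of topic k: average share of each top word's probability mass
# (summed over topics) that belongs to topic k.
exclusivity <- function(k, top_words, beta) {
  mean(beta[k, top_words] / colSums(beta[, top_words, drop = FALSE]))
}

top1 <- order(beta_toy[1, ], decreasing = TRUE)[1:2]  # two most probable words in topic 1
umass_coherence(top1, dtm)      # log(3 / 2), about 0.405
exclusivity(1, top1, beta_toy)  # (0.5 / 0.6 + 0.3 / 0.4) / 2, about 0.792
```

Higher (less negative) coherence means a topic's top words tend to co-occur in the same documents. Higher exclusivity means those words are concentrated in that topic rather than shared across topics.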

Installation

Install the released version from CRAN as usual:

install.packages("psychtm")

Alternatively, you can install the package from GitHub:

  • If necessary, first install the devtools R package:
install.packages("devtools")

Option 1: Install the latest stable version from GitHub

devtools::install_github("ktw5691/psychtm")

Option 2: Install the latest development snapshot

devtools::install_github("ktw5691/psychtm@devel")

Example

This basic example shows how to (1) prepare text documents stored in a data frame, (2) fit a supervised topic model with covariates (SLDAX), and (3) summarize the regression relationships from the estimated SLDAX model.

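library(psychtm)
library(lda) # Required if using `prep_docs()`

data(teacher_rate)  # Synthetic student ratings of instructors
# Convert the "doc" text column into the documents and vocabulary used for modeling
docs_vocab <- prep_docs(teacher_rate, "doc")
vocab_len <- length(docs_vocab$vocab)  # Vocabulary size V
# Regress rating on grade (shifted so grade 1 becomes 0) and K = 2 topics
fit_sldax <- gibbs_sldax(rating ~ I(grade - 1),
                         data = teacher_rate,
                         docs = docs_vocab$documents,
                         V = vocab_len,
                         K = 2,
                         model = "sldax")
# Posterior draws of the regression coefficients and residual variance
eta_post <- post_regression(fit_sldax)
summary(eta_post)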
#> 
#> Iterations = 1:100
#> Thinning interval = 1 
#> Number of chains = 1 
#> Sample size per chain = 100 
#> 
#> 1. Empirical mean and standard deviation for each variable,
#>    plus standard error of the mean:
#> 
#>                 Mean       SD  Naive SE Time-series SE
#> I(grade - 1) -0.2656 0.007307 0.0007307      0.0007307
#> topic1        4.6165 0.122216 0.0122216      0.0804883
#> topic2        4.8189 0.034301 0.0034301      0.0034301
#> effect_t1    -0.2024 0.134106 0.0134106      0.0884898
#> effect_t2     0.2024 0.134106 0.0134106      0.0884898
#> sigma2        1.1422 0.028296 0.0028296      0.0028296
#> 
#> 2. Quantiles for each variable:
#> 
#>                  2.5%     25%     50%     75%    97.5%
#> I(grade - 1) -0.27849 -0.2711 -0.2659 -0.2601 -0.25175
#> topic1        4.34365  4.5709  4.6584  4.6945  4.76228
#> topic2        4.75032  4.7994  4.8181  4.8420  4.87593
#> effect_t1    -0.51412 -0.2639 -0.1828 -0.1086 -0.01216
#> effect_t2     0.01216  0.1086  0.1828  0.2639  0.51412
#> sigma2        1.08793  1.1245  1.1445  1.1599  1.20649
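To make the printed summary easier to read, the sketch below recomputes its main columns by hand. It uses simulated draws with hypothetical values, not the `teacher_rate` fit.

It assumes that `effect_tk` is topic k's coefficient minus the mean of the other topics' coefficients. For K = 2 this reduces to `topic1 - topic2`, which agrees with the printed means (4.6165 - 4.8189 = -0.2024). The sketch also shows that the Naive SE column is SD / sqrt(number of draws), here SD / 10. The Time-series SE, which adjusts for autocorrelation between draws, is not reproduced.

If `eta_post` is a coda `mcmc` object, as the printed summary suggests, `as.matrix(eta_post)` would give the real draws.

```r
# Simulated (hypothetical) posterior draws of two topic coefficients.
set.seed(1)
n_draws <- 100
draws <- cbind(topic1 = rnorm(n_draws, mean = 4.6, sd = 0.12),
               topic2 = rnorm(n_draws, mean = 4.8, sd = 0.03))

# Assumed definition: effect of topic k = coefficient of topic k minus the mean of
# the other topics' coefficients (for K = 2, simply topic1 - topic2).
topic_cols <- c("topic1", "topic2")
effects <- sapply(seq_along(topic_cols), function(k) {
  draws[, topic_cols[k]] - rowMeans(draws[, topic_cols[-k], drop = FALSE])
})
colnames(effects) <- paste0("effect_t", seq_along(topic_cols))

# Columns of the coda-style summary
post_mean <- colMeans(effects)
post_sd   <- apply(effects, 2, sd)
naive_se  <- post_sd / sqrt(n_draws)  # SD / 10 for 100 draws
quants    <- apply(effects, 2, quantile, probs = c(0.025, 0.25, 0.5, 0.75, 0.975))
print(rbind(mean = post_mean, sd = post_sd, naive_se = naive_se))
print(quants)
```

With two topics, `effect_t2` is exactly the negative of `effect_t1`. This is why the two rows in the printed output mirror each other.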

For a more detailed walkthrough of the package's key functionality, the vignettes are a good starting point:

browseVignettes("psychtm")

How to Cite the Package

Wilcox, K. T., Jacobucci, R., Zhang, Z., & Ammerman, B. A. (2021). Supervised latent Dirichlet allocation with covariates: A Bayesian structural and measurement model of text and covariates. PsyArXiv. https://doi.org/10.31234/osf.io/62tc3

Common Troubleshooting

Ensure that appropriate C++ compilers are installed on your computer:

  • Mac users will need to install Xcode and its Command Line Tools (found within Xcode’s Preference Pane under Downloads/Components).

  • Windows users may need to install Rtools. For easier command-line use, be sure to select the option to add Rtools to the system PATH during installation.

  • Most Linux distributions should already have up-to-date compilers.
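Before installing from source, you can quickly check whether build tools are visible to R, as sketched below. It uses base R's `Sys.which()`. The executable names it checks are common ones and are only an assumption about your toolchain. If the pkgbuild package is installed, `pkgbuild::has_build_tools()` gives a more complete check.

```r
# Look up common build executables on the PATH; an empty string means "not found".
found_tools <- Sys.which(c("make", "g++", "clang++"))
print(found_tools)

# Optional, more thorough check if pkgbuild happens to be installed.
if (requireNamespace("pkgbuild", quietly = TRUE)) {
  print(pkgbuild::has_build_tools())
}
```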

Limitations

  • This package uses a Gibbs sampling algorithm that can be memory-intensive for a large corpus.
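The back-of-the-envelope arithmetic below shows how stored draws can grow with corpus size. It assumes, hypothetically, that every saved Gibbs draw of two arrays is kept as 8-byte doubles:

- the D x K document-topic proportions;
- the K x V topic-word probabilities.

psychtm's actual storage layout may differ. The function name is illustrative only.

```r
# Approximate memory (GiB) for saved draws of theta (D x K) and beta (K x V),
# ASSUMING both are stored as 8-byte doubles for every saved iteration.
draw_storage_gib <- function(n_docs, n_topics, n_vocab, n_saved) {
  n_doubles <- (n_docs * n_topics + n_topics * n_vocab) * n_saved
  n_doubles * 8 / 1024^3
}

draw_storage_gib(n_docs = 10000, n_topics = 20, n_vocab = 5000, n_saved = 1000)
```

Under these assumptions, 10,000 documents, 20 topics, a 5,000-word vocabulary, and 1,000 saved draws already need about 2.2 GiB. Saving fewer draws or trimming the vocabulary shrinks the footprint proportionally.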

Getting Help

If you think you have found a bug, please open an issue on GitHub and provide a minimal, complete, and verifiable example.


Version

2021.1.0

License

LGPL (>= 3)


Maintainer

Kenneth Wilcox

Last Published

November 2nd, 2021

Functions in psychtm (2021.1.0)

  • b0<-: Create generic b0<- function for class
  • Model-class: An S4 super class to represent a regression-like model
  • alpha: Create generic alpha function for class
  • b0: Create generic b0 function for class
  • alpha<-: Create generic alpha<- function for class
  • a0<-: Create generic a0<- function for class
  • Mlr-class
  • Sldax-class
  • Logistic-class
  • eta<-: Create generic eta<- function for class
  • gamma_<-: Create generic gamma_<- function for class
  • logpost: Create generic logpost function for class
  • lpd<-: Create generic lpd<- function for class
  • proposal_sd: Create generic proposal_sd function for class
  • proposal_sd<-: Create generic proposal_sd<- function for class
  • nvocab<-: Create generic nvocab<- function for class
  • eta_start: Create generic eta_start function for class
  • gibbs_mlr: Fit linear regression model
  • eta_start<-: Create generic eta_start<- function for class
  • gibbs_logistic: Fit logistic regression model
  • ntopics: Create generic ntopics function for class
  • a0: Create generic a0 function for class
  • eta: Create generic eta function for class
  • mu0: Create generic mu0 function for class
  • p_eff: Create generic p_eff function for class
  • loglike: Create generic loglike function for class
  • gamma_: Create generic gamma_ function for class
  • nchain: Create generic nchain function for class
  • se_waic: Create generic se_waic function for class
  • sldax-summary
  • extra<-: Create generic extra<- function for class
  • extra: Create generic extra function for class
  • topics<-: Create generic topics<- function for class
  • p_eff<-: Create generic p_eff<- function for class
  • beta_<-: Create generic beta_<- function for class
  • prep_docs: Prepare documents in a data frame for modeling
  • logpost<-: Create generic logpost<- function for class
  • theta<-: Create generic theta<- function for class
  • teacher_rate: Synthetic (fake) student ratings of instructor quality
  • beta_: Create generic beta_ function for class
  • ndocs<-: Create generic ndocs<- function for class
  • lpd: Create generic lpd function for class
  • nvocab: Create generic nvocab function for class
  • mu0<-: Create generic mu0<- function for class
  • sigma0: Create generic sigma0 function for class
  • topics: Create generic topics function for class
  • sigma0<-: Create generic sigma0<- function for class
  • waic<-: Create generic waic<- function for class
  • term_score: Compute term-scores for each word-topic pair
  • nchain<-: Create generic nchain<- function for class
  • gibbs_sldax: Fit supervised or unsupervised topic models (SLDAX or LDA)
  • ndocs: Create generic ndocs function for class
  • psychtm: psychtm: A package for text mining methods for psychological research
  • loglike<-: Create generic loglike<- function for class
  • ntopics<-: Create generic ntopics<- function for class
  • se_waic<-: Create generic se_waic<- function for class
  • waic: Create generic waic function for class
  • theta: Create generic theta function for class
  • waic_d: WAIC for observation y_d
  • waic_all: Compute WAIC for all outcomes
  • sigma2: Create generic sigma2 function for class
  • sigma2<-: Create generic sigma2<- function for class
  • waic_diff: Compute difference (WAIC1 - WAIC2) in WAIC and its SE for two models
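Three of the listed functions work with WAIC: `waic_d()` (per observation), `waic_all()` (all outcomes), and `waic_diff()` (the difference between two models). The base-R sketch below shows the standard computation from an S x N matrix of pointwise log-likelihoods, with S posterior draws and N outcomes. It follows Vehtari, Gelman, and Gabry (2017) and reports WAIC on the deviance scale, that is, -2 times the expected log pointwise predictive density. The helper names are hypothetical, and psychtm's functions may use a different scale or interface.

```r
# Pointwise WAIC on the deviance scale for an S x N log-likelihood matrix.
waic_pointwise <- function(loglik) {
  lppd <- apply(loglik, 2, function(ll) {
    m <- max(ll)
    m + log(mean(exp(ll - m)))  # log of the mean likelihood, computed stably
  })
  p_waic <- apply(loglik, 2, var)  # per-outcome effective number of parameters
  -2 * (lppd - p_waic)
}

# Total WAIC and its standard error across the N outcomes.
waic_summary <- function(loglik) {
  w <- waic_pointwise(loglik)
  c(waic = sum(w), se = sqrt(length(w) * var(w)))
}

# Difference WAIC1 - WAIC2 and its SE, using paired pointwise differences.
waic_compare <- function(loglik1, loglik2) {
  d <- waic_pointwise(loglik1) - waic_pointwise(loglik2)
  c(diff = sum(d), se_diff = sqrt(length(d) * var(d)))
}

# Usage with a toy normal model: 200 draws of a mean, 50 outcomes.
set.seed(42)
y <- rnorm(50)
mu_draws <- rnorm(200, mean = mean(y), sd = 1 / sqrt(50))
loglik <- sapply(y, function(yi) dnorm(yi, mean = mu_draws, sd = 1, log = TRUE))
waic_summary(loglik)
```

On this scale, lower WAIC indicates better expected out-of-sample fit. The paired SE of the difference is only meaningful when both models are evaluated on the same outcomes.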
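`term_score()` computes term scores for each word-topic pair. A common definition (Blei & Lafferty, 2009) up-weights words that are probable in one topic but not across all topics:

score[k, v] = beta[k, v] * (log beta[k, v] - mean over j of log beta[j, v])

The sketch below implements that definition on a hypothetical 2 x 3 topic-word matrix. psychtm's implementation may differ in detail, and `term_scores()` is an illustrative name.

```r
# Term scores for a K x V topic-word probability matrix (all entries > 0).
term_scores <- function(beta) {
  log_beta <- log(beta)
  # Subtract each word's mean log-probability across topics, then weight by beta
  beta * sweep(log_beta, 2, colMeans(log_beta))
}

beta_toy <- rbind(c(0.7, 0.2, 0.1),
                  c(0.1, 0.2, 0.7))
scores <- term_scores(beta_toy)
print(scores)
```

A word that is equally probable in every topic (the middle column) scores zero. A word concentrated in one topic scores positively there and negatively elsewhere, which makes term scores useful for labeling topics.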