psychtm: A package for text mining in psychological research
The goal of psychtm
is to make text mining models and methods
accessible for social science researchers, particularly within
psychology. This package allows users to
Estimate the SLDAX topic model and popular models subsumed by SLDAX, including SLDA, LDA, and regression models;
Obtain posterior inferences;
Assess model fit using coherence and exclusivity metrics.
Installation
Once on CRAN, install the package as usual:
install.packages("psychtm")
Alternatively, you can install the most current development version:
- If necessary, first install the
devtools
R package,
install.packages("devtools")
Option 1: Install the latest stable version from Github
devtools::install_github("ktw5691/psychtm")
Option 2: Install the latest development snapshot
devtools::install_github("ktw5691/psychtm@devel")
Example
This is a basic example which shows you how to (1) prepare text documents stored in a data frame; (2) fit a supervised topic model with covariates (SLDAX); and (3) summarize the regression relationships from the estimated SLDAX model.
library(psychtm)
library(lda) # Required if using `prep_docs()`
data(teacher_rate) # Synthetic student ratings of instructors
docs_vocab <- prep_docs(teacher_rate, "doc")
vocab_len <- length(docs_vocab$vocab)
fit_sldax <- gibbs_sldax(rating ~ I(grade - 1),
data = teacher_rate,
docs = docs_vocab$documents,
V = vocab_len,
K = 2,
model = "sldax")
eta_post <- post_regression(fit_sldax)
summary(eta_post)
#>
#> Iterations = 1:100
#> Thinning interval = 1
#> Number of chains = 1
#> Sample size per chain = 100
#>
#> 1. Empirical mean and standard deviation for each variable,
#> plus standard error of the mean:
#>
#> Mean SD Naive SE Time-series SE
#> I(grade - 1) -0.2656 0.007307 0.0007307 0.0007307
#> topic1 4.6165 0.122216 0.0122216 0.0804883
#> topic2 4.8189 0.034301 0.0034301 0.0034301
#> effect_t1 -0.2024 0.134106 0.0134106 0.0884898
#> effect_t2 0.2024 0.134106 0.0134106 0.0884898
#> sigma2 1.1422 0.028296 0.0028296 0.0028296
#>
#> 2. Quantiles for each variable:
#>
#> 2.5% 25% 50% 75% 97.5%
#> I(grade - 1) -0.27849 -0.2711 -0.2659 -0.2601 -0.25175
#> topic1 4.34365 4.5709 4.6584 4.6945 4.76228
#> topic2 4.75032 4.7994 4.8181 4.8420 4.87593
#> effect_t1 -0.51412 -0.2639 -0.1828 -0.1086 -0.01216
#> effect_t2 0.01216 0.1086 0.1828 0.2639 0.51412
#> sigma2 1.08793 1.1245 1.1445 1.1599 1.20649
For a more detailed example of the key functionality of this package, explore the vignette(s) for a good starting point:
browseVignettes("psychtm")
How to Cite the Package
Wilcox, K. T., Jacobucci, R., Zhang, Z., Ammerman, B. A. (2021). Supervised latent Dirichlet allocation with covariates: A Bayesian structural and measurement model of text and covariates. PsyArXiv. https://doi.org/10.31234/osf.io/62tc3
Common Troubleshooting
Ensure that appropriate C++
compilers are installed on your computer:
Mac users will have to download Xcode and its related Command Line Tools (found within Xcode’s Preference Pane under Downloads/Components).
Windows users may need to install Rtools. For easier command line use, be sure to select the option to install Rtools to their path.
Most Linux distributions should already have up-to-date compilers.
Limitations
- This package uses a Gibbs sampling algorithm that can be memory-intensive for a large corpus.
Getting Help
If you think you have found a bug, please open an issue and provide a minimal complete verifiable example.