quanteda (version 0.9.2-0)

textmodel_wordfish: wordfish text model

Description

Estimate Slapin and Proksch's (2008) "wordfish" Poisson scaling model of one-dimensional document positions using conditional maximum likelihood.
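
For orientation, the model being estimated can be written out as below. This is the usual statement of the wordfish model from Slapin and Proksch (2008), using the parameter names that appear under Arguments; the symbols $y_{ij}$ (count of feature $j$ in document $i$) and $\lambda_{ij}$ (its expected value) are introduced here only for illustration.

$$
y_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad \log \lambda_{ij} = \alpha_i + \psi_j + \beta_j \theta_i
$$

Here $\alpha_i$ is a document fixed effect, $\psi_j$ a feature fixed effect, $\beta_j$ the feature's marginal effect on the latent dimension, and $\theta_i$ the document position.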

Usage

textmodel_wordfish(data, dir = c(1, 2), priors = c(Inf, Inf, 3, 1),
  tol = c(1e-06, 1e-08), dispersion = c("poisson", "quasipoisson"),
  dispersionLevel = c("feature", "overall"), dispersionFloor = 0)

## S3 method for class 'textmodel_wordfish_fitted'
print(x, n = 30L, ...)

## S3 method for class 'textmodel_wordfish_fitted'
show(object)

## S3 method for class 'textmodel_wordfish_predicted'
show(object)

Arguments

data
the dfm on which the model will be fit
dir
set global identification by specifying the indexes for a pair of documents such that $\hat{\theta}_{dir[1]} < \hat{\theta}_{dir[2]}$.
priors
prior precisions for the estimated parameters $\alpha_i$, $\psi_j$, $\beta_j$, and $\theta_i$, where $i$ indexes documents and $j$ indexes features
tol
tolerances for convergence. The first value is a convergence threshold for the log-posterior of the model, the second value is the tolerance in the difference in parameter values from the iterative conditional maximum likelihood (from conditionally es
dispersion
sets whether the model is fit with a standard Poisson likelihood ("poisson") or with a quasi-Poisson quasi-likelihood that adds a dispersion parameter ("quasipoisson")
dispersionLevel
sets the unit level for the dispersion parameter, options are "feature" for term-level variances, or "overall" for a single dispersion parameter
dispersionFloor
constraint for the minimal underdispersion multiplier in the quasi-Poisson model. Used to minimize the distorting effect of terms with rare term or document frequencies that appear to be severely underdispersed. Default is 0, but this only applies if
x
for print method, the object to be printed
n
max rows of dfm to print
...
additional arguments passed to print
object
wordfish fitted or predicted object to be shown
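
The role of dir can be seen in a small base R sketch. It does not call quanteda; it simulates counts from a Poisson scaling model of the form log(lambda) = alpha + psi + beta * theta (an assumed textbook form of wordfish) and shows that flipping the signs of both theta and beta leaves the log-likelihood unchanged, which is why a pair of documents is needed to fix the direction. The helper names log_lambda, loglik, and orient are hypothetical and exist only for this illustration.

```r
# Base R sketch (not quanteda internals): why a direction constraint is needed.
set.seed(42)
n_docs  <- 6
n_feats <- 50

theta <- as.numeric(scale(rnorm(n_docs)))  # document positions (mean 0, sd 1)
alpha <- rnorm(n_docs, 0, 0.3)             # document fixed effects
psi   <- rnorm(n_feats, 1, 0.5)            # feature fixed effects
beta  <- rnorm(n_feats, 0, 0.5)            # feature marginal effects

# log lambda_ij = alpha_i + psi_j + beta_j * theta_i (documents in rows)
log_lambda <- function(alpha, psi, beta, theta) {
  outer(alpha, psi, "+") + outer(theta, beta)
}

# Simulated document-feature counts
y <- matrix(rpois(n_docs * n_feats, exp(log_lambda(alpha, psi, beta, theta))),
            nrow = n_docs)

# Poisson log-likelihood of the simulated counts
loglik <- function(alpha, psi, beta, theta) {
  sum(dpois(y, exp(log_lambda(alpha, psi, beta, theta)), log = TRUE))
}

ll_original <- loglik(alpha, psi, beta, theta)
ll_flipped  <- loglik(alpha, psi, -beta, -theta)  # mirror image of the scale
all.equal(ll_original, ll_flipped)

# Hypothetical helper: impose theta[dir[1]] < theta[dir[2]] by flipping signs
orient <- function(theta, beta, dir = c(1, 2)) {
  if (theta[dir[1]] > theta[dir[2]]) {
    theta <- -theta
    beta  <- -beta
  }
  list(theta = theta, beta = beta)
}

oriented <- orient(-theta, -beta, dir = c(1, 2))
```

Because both orientations fit equally well, the choice of dir changes only which end of the scale is labelled as low; it does not change the relative spacing of the documents.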

Value

An object of class textmodel_wordfish_fitted. This is a list containing:

  • dir: global identification of the dimension
  • theta: estimated document positions
  • alpha: estimated document fixed effects
  • beta: estimated feature marginal effects
  • psi: estimated word fixed effects
  • docs: document labels
  • features: feature labels
  • sigma: regularization parameter for betas in Poisson form
  • ll: log likelihood at convergence
  • se.theta: standard errors for theta-hats
  • data: dfm to which the model was fit

Details

The returned elements match those of Will Lowe's R implementation of wordfish (see the austin package), except that words are renamed to features. (This return list may change.) We also follow the practice, begun with Slapin and Proksch's early implementation of the model, of using a regularization parameter of se$(\sigma) = 3$, set through the third element of priors.
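
One way to read this regularization, assuming a zero-mean Gaussian prior on each $\beta_j$ with standard deviation $\sigma$ (an assumption; this page does not spell out the form of the prior), is as a penalty added to the Poisson log-likelihood. Only the $\beta$ term is shown, and $y_{ij}$ and $\lambda_{ij}$ are illustrative symbols for the count of feature $j$ in document $i$ and its expected value:

$$
\log p(\alpha, \psi, \beta, \theta \mid y) = \sum_i \sum_j \left( y_{ij} \log \lambda_{ij} - \lambda_{ij} \right) - \frac{1}{2\sigma^2} \sum_j \beta_j^2 + \text{const}, \qquad \log \lambda_{ij} = \alpha_i + \psi_j + \beta_j \theta_i
$$

Under this reading, $\sigma = 3$ shrinks very large $\beta_j$ toward zero, which matters most for rare features whose counts carry little information.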

References

Slapin, Jonathan and Sven-Oliver Proksch. 2008. "A Scaling Model for Estimating Time-Series Party Positions from Texts." American Journal of Political Science 52(3): 705-722.

Lowe, Will and Kenneth Benoit. 2013. "Validating Estimates of Latent Traits from Textual Data Using Human Judgment as a Benchmark." Political Analysis 21(3): 298-313. http://doi.org/10.1093/pan/mpt002

Examples

textmodel_wordfish(LBGexample, dir = c(1,5))

ie2010dfm <- dfm(ie2010Corpus, verbose = FALSE)
(wfm1 <- textmodel_wordfish(ie2010dfm, dir = c(6,5)))
(wfm2a <- textmodel_wordfish(ie2010dfm, dir = c(6,5), 
                             dispersion = "quasipoisson", dispersionFloor = 0))
(wfm2b <- textmodel_wordfish(ie2010dfm, dir = c(6,5), 
                             dispersion = "quasipoisson", dispersionFloor = .5))
plot(wfm2a@phi, wfm2b@phi, xlab = "Min underdispersion = 0", ylab = "Min underdispersion = .5",
     xlim = c(0, 1.0), ylim = c(0, 1.0))
plot(wfm2a@phi, wfm2b@phi, xlab = "Min underdispersion = 0", ylab = "Min underdispersion = .5",
     xlim = c(0, 1.0), ylim = c(0, 1.0), type = "n")
underdispersedTerms <- sample(which(wfm2a@phi < 1.0), 5)
which(features(ie2010dfm) %in% names(topfeatures(ie2010dfm, 20)))
text(wfm2a@phi, wfm2b@phi, wfm2a@features, 
     cex = .8, xlim = c(0, 1.0), ylim = c(0, 1.0), col = "grey90")
text(wfm2a@phi[underdispersedTerms], wfm2b@phi[underdispersedTerms], 
     wfm2a@features[underdispersedTerms], 
     cex = .8, xlim = c(0, 1.0), ylim = c(0, 1.0), col = "black")
if (require(austin)) {
    wfm1Austin <- austin::wordfish(quanteda::as.wfm(ie2010dfm), dir = c(6,5))
    cor(wfm1@theta, wfm1Austin$theta)
}
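
The quasi-Poisson options used in the examples above can be illustrated with a generic base R sketch. It is not quanteda's internal code: it uses the standard Pearson estimate of a dispersion multiplier, computed once overall and once per feature, and then applies a lower bound in the spirit of dispersionFloor. The use of pmax() for the floor, the helper name apply_floor, and the simulated means are all assumptions made for illustration.

```r
# Generic quasi-Poisson dispersion sketch (base R only, not quanteda internals).
set.seed(1)
n_docs  <- 20
n_feats <- 8

# Assumed expected counts: one mean per feature, identical across documents
feature_means <- c(2, 5, 10, 20, 1, 3, 8, 15)
mu <- matrix(rep(feature_means, each = n_docs), nrow = n_docs)

# Simulated counts; true Poisson data should give dispersion near 1
y <- matrix(rpois(length(mu), mu), nrow = n_docs)

# Pearson contributions (y - mu)^2 / mu for every document-feature cell
pearson <- (y - mu)^2 / mu

# "overall": a single dispersion multiplier for the whole matrix
phi_overall <- sum(pearson) / length(y)

# "feature": one dispersion multiplier per feature (column)
phi_feature <- colSums(pearson) / n_docs

# Hypothetical floor: never let a multiplier fall below the chosen minimum
apply_floor <- function(phi, floor = 0) pmax(phi, floor)
phi_floored <- apply_floor(phi_feature, floor = 0.5)
```

With a floor of 0 the per-feature estimates are left as they are, while a floor of 0.5 raises any feature that looks severely underdispersed, which is the contrast the two quasi-Poisson fits in the examples are plotting.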