stm (version 1.3.3)

plot.MultimodDiagnostic: Plotting Method for Multimodality Diagnostic Objects

Description

The plotting method for objects of the S3 class 'MultimodDiagnostic'. Such objects are returned by the function multiSTM(), which performs a battery of tests aimed at assessing the stability of the local modes of an STM model.

Usage

# S3 method for MultimodDiagnostic
plot(x, ind = NULL, topics = NULL, ...)

Arguments

x

An object of S3 class 'MultimodDiagnostic'. See multiSTM.

ind

An integer or list of integers specifying which plots to generate (see Details). If NULL (default), all plots are generated.

topics

An integer or vector of integers specifying the topics for which to plot the posterior distribution of covariate effect estimates. If NULL (default), plots are generated for every topic in the S3 object.

...

Other arguments to be passed to the plotting functions.

Details

This method generates a series of plots, indexed as follows. To produce only a subset of the plots, pass their indexes in the ind argument. Note that not all plot types are available for every object of class 'MultimodDiagnostic':

  1. Histogram of Expected Common Words: Generates a 10-bin histogram of the column means of obj$wmat, a K-by-N matrix reporting the number of "top words" shared by the reference model and the candidate model. The "top words" for a given topic are defined as the 10 highest-frequency words.

  2. Histogram of Expected Common Documents: Generates a 10-bin histogram of the column means of obj$tmat, a K-by-N matrix reporting the number of "top documents" shared by the reference model and the candidate model. The "top documents" for a given topic are defined as the 10 documents in the reference corpus with highest topical frequency.

  3. Distribution of .95 Confidence-Interval Coverage for Regression Estimates: Generates a histogram of obj$confidence.ratings, a vector whose entries specify the proportion of regression coefficient estimates in a candidate model that fall within the .95 confidence interval for the corresponding estimate in the reference model. This can only be generated if obj$confidence.ratings is non-NULL.

  4. Posterior Distributions of Covariate Effect Estimates By Topic: Generates a square matrix of plots, each depicting the posterior distribution of the regression coefficients for the covariate specified in obj$reg.parameter.index for one topic. The topics for which the plots are to be generated are specified by the topics argument. If the length of topics is not a perfect square, the plot matrix will include white space. The plots have a dashed black vertical line at zero and a solid red vertical line indicating the coefficient estimate in the reference model. This can only be generated if obj$cov.effects is non-NULL.

  5. Histogram of Expected L1-Distance From Reference Model: Generates a 10-bin histogram of the column means of obj$lmat, a K-by-N matrix reporting the L1-distance of each topic from the corresponding one in the reference model.

  6. L1-distance vs. Top-10 Word Metric: Produces a smoothed color density representation of the scatterplot of obj$lmat and obj$wmat, the metrics for L1-distance and shared top-words, obtained through a kernel density estimate. This can be used to validate the metrics under consideration.

  7. L1-distance vs. Top-10 Docs Metric: Produces a smoothed color density representation of the scatterplot of obj$lmat and obj$tmat, the metrics for L1-distance and shared top-documents, obtained through a kernel density estimate. This can be used to validate the metrics under consideration.

  8. Top-10 Words vs. Top-10 Docs Metric: Produces a smoothed color density representation of the scatterplot of obj$wmat and obj$tmat, the metrics for shared top-words and shared top-documents, obtained through a kernel density estimate. This can be used to validate the metrics under consideration.

  9. Maximized Bound vs. Aggregate Top-10 Words Metric: Generates a scatter plot with linear trendline for the maximized bound vector (obj$lb) and a linear transformation of the top-words metric aggregated by model (obj$wmod/1000).

  10. Maximized Bound vs. Aggregate Top-10 Docs Metric: Generates a scatter plot with linear trendline for the maximized bound vector (obj$lb) and a linear transformation of the top-docs metric aggregated by model (obj$tmod/1000).

  11. Maximized Bound vs. Aggregate L1-Distance Metric: Generates a scatter plot with linear trendline for the maximized bound vector (obj$lb) and a linear transformation of the L1-distance metric aggregated by model (obj$lmod/1000).

  12. Top-10 Docs Metric vs. Semantic Coherence: Generates a scatter plot with linear trendline for the reference-model semantic coherence scores and the column means of obj$tmat.

  13. L1-Distance Metric vs. Semantic Coherence: Generates a scatter plot with linear trendline for the reference-model semantic coherence scores and the column means of obj$lmat.

  14. Top-10 Words Metric vs. Semantic Coherence: Generates a scatter plot with linear trendline for the reference-model semantic coherence scores and the column means of obj$wmat.

  15. Same as 5, but using the limited-mass L1-distance metric. Can only be generated if obj$mass.threshold != 1.

  16. Same as 11, but using the limited-mass L1-distance metric. Can only be generated if obj$mass.threshold != 1.

  17. Same as 7, but using the limited-mass L1-distance metric. Can only be generated if obj$mass.threshold != 1.

  18. Same as 13, but using the limited-mass L1-distance metric. Can only be generated if obj$mass.threshold != 1.
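The availability conditions stated in the list above can be summarized in a small helper. This is only a sketch: available_plots() is a hypothetical function, not part of stm, and it assumes that every plot listed without an explicit precondition is always available, which the documentation does not guarantee. The toy lists stand in for real 'MultimodDiagnostic' objects and carry only the three fields the conditions mention.

```r
# Hypothetical helper (not part of stm): returns the plot indexes that the
# Details section describes as available for a given object.
available_plots <- function(obj) {
  # Assumption: plots with no stated precondition are always available.
  idx <- c(1, 2, 5:14)
  # Plot 3 requires non-NULL confidence ratings.
  if (!is.null(obj$confidence.ratings)) idx <- c(idx, 3)
  # Plot 4 requires non-NULL covariate effects.
  if (!is.null(obj$cov.effects)) idx <- c(idx, 4)
  # Plots 15-18 require a mass threshold different from 1.
  if (!is.null(obj$mass.threshold) && obj$mass.threshold != 1) {
    idx <- c(idx, 15:18)
  }
  sort(idx)
}

# Toy stand-ins; real objects are produced by multiSTM().
full    <- list(confidence.ratings = c(0.9, 0.8),
                cov.effects = list(),
                mass.threshold = 0.75)
minimal <- list(confidence.ratings = NULL,
                cov.effects = NULL,
                mass.threshold = 1)

available_plots(full)     # all 18 indexes
available_plots(minimal)  # 1, 2 and 5 through 14
```

The result of such a check could be passed as the ind argument so that only plots the object supports are requested.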

References

Roberts, M., Stewart, B., & Tingley, D. (Forthcoming). "Navigating the Local Modes of Big Data: The Case of Topic Models." In Data Analytics in Social Science, Government, and Industry. New York: Cambridge University Press.

See Also

multiSTM

Examples

# NOT RUN {
# Example using Gadarian data

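# Process the raw text and carry the metadata along
temp <- textProcessor(documents = gadarian$open.ended.response,
                      metadata = gadarian)
meta <- temp$meta
vocab <- temp$vocab
docs <- temp$documents

# Prepare the documents, vocabulary, and metadata for modeling
out <- prepDocuments(docs, vocab, meta)
docs <- out$documents
vocab <- out$vocab
meta <- out$meta

# Run selectModel with a fixed seed to obtain the candidate models
set.seed(02138)
mod.out <- selectModel(docs, vocab, K = 3,
                       prevalence = ~treatment + s(pid_rep),
                       data = meta, runs = 20)

# Compute the multimodality diagnostics
out <- multiSTM(mod.out, mass.threshold = .75,
                reg.formula = ~ treatment,
                metadata = gadarian)

# Generate all available plots, then only plot 1
plot(out)
plot(out, 1)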

# One more example using Poliblog data

load(url("http://goo.gl/91KbfS"))
meta <- poliblogPrevFit$settings$covariates$X
out <- multiSTM(poliblogSelect, mass.threshold=.75, 
                reg.formula= ~ ratingLiberal,
                metadata=meta)

plot(out, ind=(1:4), topics=1)
plot(out, 16)
# }
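
To make the quantity behind plots 1, 2, and 5 in Details more concrete, the following base-R sketch builds a toy K-by-N matrix and computes the histogram of its column means. The matrix wmat here is simulated and only a hypothetical stand-in for obj$wmat; the orientation (K rows, N columns) follows the description in Details. Note that breaks = 10 is a suggestion to hist(), so the number of bins actually drawn may differ slightly from 10.

```r
set.seed(1)
K <- 3   # number of topics (toy value)
N <- 20  # number of candidate models (toy value)

# Simulated stand-in for obj$wmat: counts of shared top-10 words, each in 0..10.
wmat <- matrix(sample(0:10, K * N, replace = TRUE), nrow = K, ncol = N)

# Column means: one expected number of shared top words per column.
expected_common <- colMeans(wmat)

# Histogram object with roughly 10 bins; drop plot = FALSE to draw it.
h <- hist(expected_common, breaks = 10, plot = FALSE)

length(expected_common)  # one value per column
sum(h$counts)            # every value falls in some bin
```

The same steps apply to a toy stand-in for obj$tmat or obj$lmat, since the Details section describes those plots in the same terms.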
