estimateEffect: Estimates regressions using an STM object

Description

Estimates a regression where documents are the units, the outcome is the proportion of each document about a topic in an STM model, and the covariates are document metadata. This procedure incorporates measurement uncertainty from the STM model using the method of composition.

Usage

estimateEffect(formula, stmobj, metadata = NULL, 
               uncertainty = c("Global", "Local", "None"), 
               documents = NULL, nsims = 25)

Arguments

formula

A formula for the regression. It should have an integer or a vector of integers (the topic numbers) on the left-hand side and an expression with covariates on the right-hand side. See Details for more information.

stmobj

Model output from STM

metadata

A data frame in which all predictor variables in the formula can be found. If NULL, R will look for the variables in the global namespace. It will not look for them in the STM object, which for memory efficiency only stores the transformed design matrix and thus will not in general have the original covariates.

uncertainty

The procedure used to approximate the measurement uncertainty in the topic proportions. See Details for more information. Defaults to the Global approximation.

documents

If uncertainty is set to Local, the user needs to provide the documents object (see stm for format).

nsims

The number of simulated draws from the variational posterior. Defaults to 25. This can often go even lower without affecting the results too dramatically.

Value

parameters

A list of K elements, each corresponding to a topic. Each element is itself a list of n elements, one per simulation. Each simulation contains the MLE of the parameter vector and the variance-covariance matrix.

topics

The topic vector.

call

The original call.

uncertainty

The user's choice of uncertainty measure.

formula

The formula object.

data

The original user-provided metadata.

modelframe

The model frame created from the formula and data.

varlist

A variable list useful for mapping terms to columns in the design matrix.
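
As a rough illustration of the structure described above, the sketch below fits the regression on the gadarianFit and gadarian objects used in the Examples section and inspects the result. The component names come from the list above. What str() prints for a single simulation is not asserted here, so treat this as exploratory rather than a specification of the object.

```r
library(stm)

# gadarianFit and gadarian are the objects used in the Examples section.
prep <- estimateEffect(1:3 ~ treatment, gadarianFit, gadarian, nsims = 25)

names(prep)                     # components of the returned object
length(prep$parameters)         # one element per topic on the left-hand side
length(prep$parameters[[1]])    # one element per simulation (nsims)
str(prep$parameters[[1]][[1]])  # parameter vector and covariance for one draw
prep$topics                     # the topic vector
```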

Details

This function performs a regression where topic proportions are the outcome variable. This allows us to estimate the conditional expectation of topic prevalence given document characteristics. Use of the method of composition allows us to incorporate our estimation uncertainty in the dependent variable.

The formula specifies the nature of the linear model. On the left-hand side we use a vector of integers to indicate the topics to be included as outcome variables. If left blank, the default of all topics is used. On the right-hand side we can specify a linear model of covariates, including standard transformations. Thus the model 2:4 ~ var1 + s(var2) would indicate that we want to run three regressions on Topics 2, 3 and 4 with predictor variables var1 and a b-spline transformed var2. We encourage the use of spline functions for non-linear transformations of variables.

The function allows the user to specify any variables in the model. However, we caution that for the assumptions of the method of composition to be the most plausible, the topic model should contain at least all the covariates contained in the estimateEffect regression. The converse need not be true.

The function will automatically check whether the covariate matrix is singular, which generally results from linearly dependent columns. Some common causes include a factor variable with an unobserved level, a spline with degrees of freedom that are too high, or a spline with a continuous variable where a gap in the support of the variable results in several empty basis functions.

We offer several different methods of incorporating uncertainty. Ideally we would want to use the covariance matrix that governs the variational posterior for each document ($\nu$). The updates for the global parameters rely only on the sum of these matrices, so we do not store copies for each individual document. The default uncertainty method, Global, uses an approximation to the average covariance matrix formed using the global parameters. The uncertainty method Local steps through each document and updates the parameters, calculating and then saving the local covariance matrix. The option None simply uses the MAP estimates for $\theta$ and does not incorporate any uncertainty. We strongly recommend the Global approximation as it provides the best tradeoff of accuracy and computational tractability.
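
The idea behind the method of composition can be sketched in base R. The code below is an illustration under simplifying assumptions, not the package's implementation. The data are simulated, all names (theta_hat, theta_sd, draws and so on) are hypothetical, and the posterior for the topic proportions is replaced by independent normal noise. Each iteration draws the outcome, fits the regression, and then draws coefficients from their approximate sampling distribution, so the pooled draws reflect both sources of uncertainty.

```r
# Method-of-composition sketch in base R (illustration only, simulated data).
set.seed(1)

n_docs <- 200
n_sims <- 25

# Hypothetical design matrix: intercept plus a binary treatment indicator.
treatment <- rbinom(n_docs, 1, 0.5)
X <- cbind(intercept = 1, treatment = treatment)

# Hypothetical point estimates of one topic's proportion per document,
# generated with a true treatment effect of 0.10.
theta_hat <- 0.30 + 0.10 * treatment + rnorm(n_docs, sd = 0.05)

# Assumed stand-in for the posterior uncertainty in the topic proportions.
theta_sd <- 0.03

draws <- matrix(NA_real_, nrow = n_sims, ncol = ncol(X))
for (i in seq_len(n_sims)) {
  # Step 1: draw the outcome from its (approximate) posterior.
  theta_i <- theta_hat + rnorm(n_docs, sd = theta_sd)

  # Step 2: fit the regression on this draw by least squares.
  fit <- lm.fit(X, theta_i)
  beta_hat <- fit$coefficients
  sigma2 <- sum(fit$residuals^2) / (n_docs - ncol(X))
  V <- sigma2 * chol2inv(chol(crossprod(X)))

  # Step 3: draw coefficients from N(beta_hat, V) via the Cholesky factor.
  draws[i, ] <- beta_hat + drop(t(chol(V)) %*% rnorm(ncol(X)))
}

# Summaries over the pooled draws for the treatment coefficient.
effect_mean <- mean(draws[, 2])
effect_sd <- sd(draws[, 2])
```

Setting theta_sd to zero in this sketch corresponds loosely to the None option, where only the regression's own sampling variability remains.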

Examples

library(stm)

# Just one topic (note we need c() to indicate it is a vector)
prep <- estimateEffect(c(1) ~ treatment, gadarianFit, gadarian)
plot.estimateEffect(prep, "treatment", model=gadarianFit, method="pointestimate")

# Three topics at once
prep <- estimateEffect(1:3 ~ treatment, gadarianFit, gadarian)
plot.estimateEffect(prep, "treatment", model=gadarianFit, method="pointestimate")
# See the vignette for examples of plotting models with an interaction.
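
A further sketch, in case it helps to see the uncertainty argument in use: the code below prints a regression table with summary() and then refits the same model with uncertainty = "None" for comparison. It assumes the same gadarianFit and gadarian objects as above. No particular output is claimed, and results under the default Global option will vary slightly between runs because they rely on simulated draws.

```r
library(stm)

# Default: Global approximation to the measurement uncertainty.
prep <- estimateEffect(1:3 ~ treatment, gadarianFit, gadarian)
summary(prep, topics = 1)

# Same regression using only the MAP estimates of the topic proportions.
prep_none <- estimateEffect(1:3 ~ treatment, gadarianFit, gadarian,
                            uncertainty = "None")
summary(prep_none, topics = 1)
```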