estimateEffect(formula, stmobj, metadata = NULL, uncertainty = c("Global", "Local", "None"), documents = NULL, nsims = 25, prior=NULL)
If metadata is NULL, R will look for the variables in the global namespace. It will not look for them in the STM object, which for memory efficiency stores only the transformed design matrix and thus will not, in general, contain the original covariates.
If uncertainty is set to Local, the user needs to provide the documents object (see stm for format).
The prior argument sets the ridge penalty (… ncol(X)). When the design matrix is collinear and prior is not specified, a warning is issued and the function estimates with a small default penalty.
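The note that metadata = NULL makes R search the global namespace reflects ordinary R formula scoping: without a data argument, variables are resolved in the formula's environment, which is the global environment for a formula created at top level. The base-R sketch below uses a hypothetical covariate var1 and plain model.frame and model.matrix, not stm internals, to show that lookup.

```r
# Hypothetical covariate that lives in the workspace, not in any data frame
var1 <- c(0.2, 1.5, 3.1)
f <- ~ var1

# With no data argument, model.frame looks var1 up in environment(f)
mf <- model.frame(f)
X <- model.matrix(f, mf)   # intercept column plus var1
X
```

This is also why the STM object itself cannot supply the covariates: only what is visible from the formula's environment (or passed as metadata) is found.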
The formula specifies the nature of the linear model. On the left-hand side we use a vector of integers to indicate the topics to be included as outcome variables; if it is left blank, all topics are used. On the right-hand side we can specify a linear model of covariates, including standard transformations. Thus the model 2:4 ~ var1 + s(var2) would indicate that we want to run three regressions, on Topics 2, 3 and 4, with predictor variables var1 and a B-spline transformed var2. We encourage the use of spline functions for non-linear transformations of variables.
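To make the formula concrete, the base-R sketch below evaluates the left-hand side of 2:4 ~ var1 + s(var2) to recover the topic indices and builds a comparable design matrix. It substitutes splines::bs for stm's s(), assuming s() produces a B-spline basis of this kind; the data and the df = 3 choice are hypothetical.

```r
library(splines)

f <- 2:4 ~ var1 + s(var2)        # the formula is never evaluated as a whole
topics <- eval(f[[2]])           # left-hand side: the integer vector 2:4
n_regressions <- length(topics)  # one regression per listed topic

# Hypothetical covariates; bs() stands in for stm's s()
set.seed(1)
d <- data.frame(var1 = rnorm(20), var2 = runif(20))
X <- model.matrix(~ var1 + bs(var2, df = 3), data = d)
colnames(X)  # intercept, var1, and three spline basis columns
```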
The function allows the user to specify any variables in the model. However, we caution that for the assumptions of the method of composition to be most plausible, the topic model should contain at least all the covariates contained in the estimateEffect regression. The converse need not hold: the topic model may include covariates that the estimateEffect regression omits. The function automatically checks whether the covariate matrix is singular, which generally results from linearly dependent columns. Common causes include a factor variable with an unobserved level, a spline with too many degrees of freedom, or a spline on a continuous variable where a gap in the support of the variable leaves several basis functions empty. In these cases the function still estimates the model by adding a small ridge penalty to the likelihood. However, we emphasize that while this produces an estimate, that estimate is identified only by the penalty. In many cases this is an indication that the user should specify a different model.
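The sketch below reproduces the first failure mode with hypothetical data: a factor with an unobserved level yields an all-zero column, so the design matrix is rank-deficient and X'X cannot be inverted. Adding a small ridge penalty lambda to the diagonal makes the system solvable, but the coefficient of the empty column is determined by the penalty alone (here it is shrunk to zero). This is a generic ridge least-squares solve with a lambda of our own choosing, not stm's internal code.

```r
# Hypothetical topic proportions and a factor whose level "c" is never observed
d <- data.frame(
  y = c(0.10, 0.30, 0.20, 0.50),
  g = factor(c("a", "a", "b", "b"), levels = c("a", "b", "c"))
)
X <- model.matrix(~ g, data = d)   # column "gc" is all zeros
qr(X)$rank                         # 2, fewer than ncol(X) = 3

# Ridge solve: (X'X + lambda * I) beta = X'y
lambda <- 1e-5                     # assumed small penalty
beta <- solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, d$y))
beta                               # the "gc" coefficient is set by the penalty
```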
The function can handle factors and numeric variables. Dates should be converted to numeric variables before analysis.
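One simple conversion, assuming the dates are stored as R Date objects, is as.numeric, which returns the number of days since 1970-01-01. The data frame below is hypothetical.

```r
meta <- data.frame(date = as.Date(c("2020-01-01", "2020-01-31", "2020-03-01")))

# Days since 1970-01-01, usable as an ordinary numeric covariate
meta$day <- as.numeric(meta$date)
meta$day - min(meta$day)   # 0, 30, 60
```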
We offer several different methods of incorporating uncertainty. Ideally we would use the covariance matrix that governs the variational posterior for each document ($\nu$). However, the updates for the global parameters rely only on the sum of these matrices, so we do not store a copy for each individual document. The default uncertainty method, Global, uses an approximation to the average covariance matrix formed using the global parameters. The Local method steps through each document, updating its parameters and then computing and saving the local covariance matrix. The None option simply uses the MAP estimates for $\theta$ and does not incorporate any uncertainty. We strongly recommend the Global approximation, as it provides the best tradeoff between accuracy and computational tractability.
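The toy sketch below illustrates the general idea of the method of composition: draw topic proportions from an approximate posterior, rerun the regression on each draw, and summarize the spread of the resulting coefficients. It assumes a logistic-normal posterior with one covariance matrix shared by every document, loosely in the spirit of the Global option; the means, the covariance, and the simple mean and standard-deviation summary are our own assumptions, not stm's implementation.

```r
set.seed(42)
n <- 50; K <- 3; nsims <- 25
x <- rbinom(n, 1, 0.5)                       # hypothetical binary covariate

# Hypothetical variational means for the first K - 1 logits (topic K is the reference)
mu <- cbind(0.5 * x, rep(-0.2, n))
Sigma <- diag(0.1, K - 1)                    # one covariance shared by all documents
R <- chol(Sigma)                             # t(R) %*% R equals Sigma

coefs <- matrix(NA_real_, nsims, 2, dimnames = list(NULL, c("(Intercept)", "x")))
for (s in seq_len(nsims)) {
  # Draw eta = mu + z %*% R with z ~ N(0, I), so each row has covariance Sigma
  eta <- mu + matrix(rnorm(n * (K - 1)), n, K - 1) %*% R
  eta <- cbind(eta, 0)                       # logit of the reference topic is 0
  theta <- exp(eta) / rowSums(exp(eta))      # map logits to topic proportions
  coefs[s, ] <- coef(lm(theta[, 1] ~ x))     # regress topic 1 on the covariate
}

colMeans(coefs)       # composition estimate of the effect on topic 1
apply(coefs, 2, sd)   # spread across draws, reflecting uncertainty in theta only
```

Setting Sigma to zero in this sketch would mimic the None option: every draw equals the point estimate and the spread across draws vanishes.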
Effects are plotted based on the results of estimateEffect, which contain information on how the estimates are constructed. Note that in some circumstances the expected value of a topic proportion given a covariate level can be above 1 or below 0. This is because we use a Normal distribution rather than one constrained to the range between 0 and 1. If the expected proportion for a continuous variable goes above 1 or below 0 within the range of the data, it may indicate that a more flexible non-linear specification is needed (such as a spline, or a spline with more degrees of freedom).
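The behavior is easy to reproduce with ordinary least squares as a stand-in for the Normal model. In the hypothetical data below the proportion stays near 0 and then rises sharply; a straight-line fit predicts negative values at the low end, while a more flexible B-spline (splines::bs, assumed comparable to stm's s()) tracks the curve much more closely.

```r
library(splines)

# Hypothetical topic proportion: near 0 for most of the range, then rising
x <- seq(0, 10, length.out = 50)
p <- plogis(x - 8)

fit_lin <- lm(p ~ x)               # linear specification
fit_spl <- lm(p ~ bs(x, df = 5))   # more flexible spline specification

range(fitted(fit_lin))   # lower end falls below 0
range(fitted(fit_spl))
```

Because the spline basis plus intercept contains all linear functions, the spline fit can never have a larger residual sum of squares than the linear one.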
See also: plot.estimateEffect
library(stm)
# Just one topic (note we need c() to indicate it is a vector)
prep <- estimateEffect(c(1) ~ treatment, gadarianFit, gadarian)
plot.estimateEffect(prep, "treatment", model=gadarianFit, method="pointestimate")
# Three topics at once
## Not run:
# prep <- estimateEffect(1:3 ~ treatment, gadarianFit, gadarian)
# plot.estimateEffect(prep, "treatment", model=gadarianFit, method="pointestimate")
## End(Not run)
# See the vignette for examples of plotting models with an interaction.