Usage
topicLasso(formula, data, stmobj=NULL, subset=NULL, omit.var=NULL,
  family="gaussian", main="Topic Effects on Outcome",
  xlab=expression("Lower Outcome Higher Outcome"),
  labeltype=c("prob", "frex", "lift", "score"), seed=02138,
  xlim=c(-4,4), standardize=FALSE, nfolds=20, ...)
Arguments
formula
Formula specifying the dependent variable and any additional variables to be included in the LASSO beyond the topics present in the stmobj. Pass a 1 on the right-hand side to run without additional controls.
data
Data file containing the dependent variable. Typically will be the metadata file used in the stm analysis. It must have a number of rows equal to the number of documents in the stmobj.
stmobj
The STM object, an output from the stm function.
subset
A logical statement that will be used to subset the corpus.
omit.var
Pass a character vector of variable names to be excluded from the plot. Note this does not exclude them from the calculation, only the plot.
family
The family parameter used in glmnet. See the explanation there. Defaults to "gaussian".
main
Character string for the main title.
xlab
Character string giving an x-axis label.
labeltype
Type of example words to use in labelling each topic. See labelTopics. Defaults to "prob".
seed
The random seed for replication of the cross-validation samples.
xlim
Limits of the x-axis, given as a numeric vector of length two. Defaults to c(-4,4).
standardize
Whether to standardize variables. Default is FALSE, which is different from the glmnet default because the topics are already standardized. Note that glmnet standardizes the variables by default but then projects them back to their original scales before reporting coefficients.
nfolds
The number of cross-validation folds. Defaults to 20.
...
Additional arguments to be passed to glmnet. This can be useful for addressing convergence problems.
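The arguments above might be combined roughly as in the following sketch. It assumes the stm and glmnet packages are installed and uses the poliblog5k example data that ships with stm. The helper rows_match is hypothetical (it is not part of stm) and only illustrates the requirement that data has one row per document. The choices of 20 topics and 10 EM iterations are arbitrary and serve only to keep the run short.

```r
# Hypothetical helper (not part of stm): TRUE when the metadata has one row
# per document in the fitted model, as the `data` argument requires.
rows_match <- function(data, stmobj) {
  nrow(data) == nrow(stmobj$theta)
}

# Run the example only when the required packages are available.
if (requireNamespace("stm", quietly = TRUE) &&
    requireNamespace("glmnet", quietly = TRUE)) {
  library(stm)
  data(poliblog5k)  # provides poliblog5k.docs, poliblog5k.voc, poliblog5k.meta

  # Fit a small topic model. max.em.its is capped for speed, so the model
  # may stop before convergence; raise it for real analyses.
  fit <- stm(documents = poliblog5k.docs, vocab = poliblog5k.voc, K = 20,
             prevalence = ~ rating, data = poliblog5k.meta,
             init.type = "Spectral", max.em.its = 10, verbose = FALSE)

  stopifnot(rows_match(poliblog5k.meta, fit))

  # Binary outcome, no controls beyond the topics (1 on the right-hand side).
  out <- topicLasso(rating ~ 1, data = poliblog5k.meta, stmobj = fit,
                    family = "binomial", labeltype = "frex",
                    nfolds = 10, seed = 2138)
}
```

Because rating takes two values here, family is set to "binomial" rather than left at the "gaussian" default. Setting seed keeps the cross-validation folds reproducible across runs.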