textir (version 1.8-6)

predict.topics: topic predict

Description

Predict function for Topic Models

Usage

## S3 method for class 'topics':
predict( object, newcounts, loglhd=FALSE, ... )

Arguments

object
An output object from the topics function, or the corresponding simple matrix of estimated topics.
newcounts
An nrow(object$theta)-column matrix of multinomial phrase/category counts for new documents/observations. Can be either a simple matrix or a simple_triplet_matrix.
loglhd
Whether or not to calculate and return sum(x*log(p)), the un-normalized log likelihood.
...
Additional arguments to the undocumented internal tpx* functions.
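
The quantity described under loglhd, sum(x*log(p)), can be sketched in base R. This is a minimal illustration rather than the package's internal code: the helper name loglhd_sketch and the toy inputs are hypothetical, and it assumes theta has one row per phrase and one column per topic (as object$theta does) and omega has one row per document.

```r
## Hypothetical helper, not part of textir: un-normalized multinomial
## log likelihood sum(x*log(p)) given topic weights and topics.
loglhd_sketch <- function(counts, omega, theta) {
  p <- omega %*% t(theta)   # documents x phrases matrix of fitted probabilities
  nz <- counts > 0          # zero counts contribute nothing; skipping them avoids 0*log(0)
  sum(counts[nz] * log(p[nz]))
}

## Toy inputs: 2 documents, 3 phrases, 2 topics
theta <- cbind(c(0.7, 0.2, 0.1), c(0.1, 0.3, 0.6))  # 3 x 2, columns sum to 1
omega <- rbind(c(1, 0), c(0.5, 0.5))                # 2 x 2, rows sum to 1
counts <- rbind(c(3, 1, 0), c(1, 1, 2))

L <- loglhd_sketch(counts, omega, theta)
```

The value is un-normalized because the multinomial coefficient, which does not depend on the topic weights, is left out.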

Value

  • The output is an nrow(newcounts) by object$K matrix of document topic weights or, if loglhd=TRUE, a list containing these weights as W and the log likelihood as L.

Details

Under the default mixed-membership topic model, this function uses sequential quadratic programming to fit topic weights $\Omega$ for new documents. Conditional on object$theta, the estimate for each new $\omega_i$ is the MAP (maximum a posteriori) value in the (K-1)-dimensional logit-transformed parameter space.
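
For orientation, the mixed-membership model and a logit-type transform can be written out as below. The notation is an assumption made for illustration: $x_i$ is the count vector for document $i$ with total $m_i$, $\theta_{jk}$ is the probability of phrase $j$ under topic $k$, and $\varphi_{ik}$ is a hypothetical name for the transformed weights. The transform shown is one common form with the first coordinate fixed, not necessarily the package's exact parametrization; the prior that completes the MAP objective is given in the reference and is not reproduced here.

```latex
x_i \sim \mathrm{MN}(p_i, m_i), \qquad
p_{ij} = \sum_{k=1}^{K} \omega_{ik}\,\theta_{jk}, \qquad
\omega_{ik} = \frac{\exp(\varphi_{ik})}{\sum_{h=1}^{K} \exp(\varphi_{ih})},
\quad \varphi_{i1} = 0
```

Fixing one coordinate leaves K-1 free parameters per document, which matches the dimension stated above.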

References

Taddy (2012), On Estimation and Selection for Topic Models. http://arxiv.org/abs/1109.4518

See Also

topics, plot.topics, summary.topics, we8there, congress109, wsjibm

Examples

library(textir)  # provides rdir and predict.topics

## Simulate some data
omega <- t(rdir(500, rep(1/10,10)))
theta <- rdir(10, rep(1/1000,1000))
Q <- omega%*%t(theta)
counts <- matrix(ncol=1000, nrow=500)
totals <- rpois(500, 200)
for(i in 1:500){ counts[i,] <- rmultinom(1, size=totals[i], prob=Q[i,]) }

## predict omega given theta
W <- predict.topics( theta, counts )
plot(W, omega, pch=21, bg=8)