maptpx (version 1.9-7)

predict.topics: topic predict

Description

Predict function for Topic Models

Usage

# S3 method for topics
predict( object, newcounts, loglhd=FALSE, ... )

Arguments

object

An output object from the topics function, or the corresponding matrix of estimated topics.

newcounts

An nrow(object$theta)-column matrix of multinomial phrase/category counts for new documents/observations. Can be either a simple matrix or a simple_triplet_matrix.

loglhd

Whether or not to calculate and return sum(x*log(p)), the un-normalized log likelihood.
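As a rough illustration of the quantity sum(x*log(p)), the base-R sketch below computes it for a tiny hand-made example. The matrices theta, omega and x are hypothetical toy values, not output of the package, and the zero-count handling (treating 0*log(0) as 0) is an assumption of this sketch rather than a statement about the internal tpx* code.

```r
# Hypothetical toy inputs: 2 documents, 3 phrases, 2 topics.
# theta is phrases x topics (columns sum to 1), matching nrow(object$theta) = number of phrases.
theta <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6), nrow = 3)
# omega is documents x topics (rows sum to 1).
omega <- matrix(c(0.5, 0.5,
                  0.9, 0.1), nrow = 2, byrow = TRUE)
# x is documents x phrases, holding the observed counts.
x <- matrix(c(4, 1, 0,
              2, 2, 6), nrow = 2, byrow = TRUE)

# Fitted multinomial probabilities for each document (documents x phrases).
p <- omega %*% t(theta)

# Un-normalized log likelihood: sum over cells of x * log(p).
# Only nonzero counts are summed, so a zero count never contributes 0 * log(0).
loglhd <- sum(x[x > 0] * log(p[x > 0]))
```

The value is "un-normalized" in the sense that the multinomial coefficient, which does not depend on the topic weights, is left out.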

...

Additional arguments to the undocumented internal tpx* functions.

Value

The output is an nrow(newcounts) by object$K matrix of document topic weights or, if loglhd=TRUE, a list containing these weights as W and the log likelihood as L.

Details

Under the default mixed-membership topic model, this function uses sequential quadratic programming to fit topic weights \(\Omega\) for new documents. Conditional on object$theta, the estimate for each new \(\omega_i\) is the maximum a posteriori (MAP) value in the (K-1)-dimensional logit-transformed parameter space.
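To give a feel for fitting weights in a (K-1)-dimensional transformed space, the following base-R sketch estimates the topic weights of a single document with theta held fixed. It is not the package's algorithm: it uses optim() with BFGS instead of sequential quadratic programming, it maximizes the likelihood only (no prior, so the result is not a MAP estimate), and the softmax-style mapping with the first topic as reference is an assumption of this sketch. All values and helper names (to_omega, negll) are hypothetical.

```r
# Hypothetical toy topics: 3 phrases x 2 topics, columns sum to 1.
theta <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6), nrow = 3)
# Counts for one new document (toy values).
x <- c(40, 25, 35)

# Map K-1 unconstrained values to K weights on the simplex.
# The first topic acts as the reference category (fixed at 0 before exponentiating).
to_omega <- function(phi) {
  e <- exp(c(0, phi))
  e / sum(e)
}

# Negative un-normalized log likelihood, -sum(x * log(p)), with p = theta %*% omega.
negll <- function(phi) {
  p <- as.vector(theta %*% to_omega(phi))
  -sum(x * log(p))
}

# Unconstrained optimization over the single free parameter (K - 1 = 1 here).
fit <- optim(par = 1, fn = negll, method = "BFGS")
omega_hat <- to_omega(fit$par)
```

Working in the unconstrained space keeps every candidate omega positive and summing to one, so no explicit simplex constraints need to be passed to the optimizer. In this toy case the counts are exactly proportional to an equal mix of the two topics, so omega_hat should land close to equal weights.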

References

Taddy (2012), On Estimation and Selection for Topic Models. http://arxiv.org/abs/1109.4518

See Also

topics, plot.topics, summary.topics, congress109

Examples

# NOT RUN {
library(maptpx)  # provides rdir()

## Simulate some data
omega <- t(rdir(500, rep(1/10,10)))
theta <- rdir(10, rep(1/1000,1000))
Q <- omega%*%t(theta)
counts <- matrix(ncol=1000, nrow=500)
totals <- rpois(500, 200)
for(i in 1:500){ counts[i,] <- rmultinom(1, size=totals[i], prob=Q[i,]) }

## predict omega given theta
W <- predict.topics( theta, counts )
plot(W, omega, pch=21, bg=8)

# }