tidy_pca
Compute Principal Components and store as a Data Frame
Takes a matrix, perhaps from the output of get_tfidf, and
returns a data frame with the top principal components extracted. This
is a simple but powerful technique for visualizing a corpus of documents.
Usage
tidy_pca(x, meta = NULL, k = 2, center = TRUE, scale = TRUE)
Arguments
- x
a matrix object to pass to prcomp
- meta
an optional object to append to the front of the principal components. Can be a vector or a data frame, but its length or number of rows must equal the number of rows in x
- k
integer. The number of components to include in the output.
- center
logical. Should the data be centered?
- scale
logical. Should the data be scaled? Note that this must be set to FALSE if any column in x is constant and center is also TRUE.
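The note on scale can be reproduced with base R's prcomp, the function x is passed to. The following is a minimal sketch on a small made-up matrix m whose third column is constant. Dropping constant columns before scaling is shown as one possible workaround; it is an assumption of this sketch, not something the package is documented to do.

```r
# Toy matrix (hypothetical data): the third column is constant
m <- cbind(a = c(1, 2, 3, 4), b = c(2, 1, 4, 3), c = c(5, 5, 5, 5))

# A constant column has zero variance after centering, so it cannot be
# rescaled to unit variance and prcomp stops with an error
err <- tryCatch(
  prcomp(m, center = TRUE, scale. = TRUE),
  error = function(e) conditionMessage(e)
)
print(err)

# Option 1: keep the column but turn scaling off
fit_unscaled <- prcomp(m, center = TRUE, scale. = FALSE)

# Option 2: drop constant columns first, then scale as usual
keep <- apply(m, 2, function(col) var(col) > 0)
fit_scaled <- prcomp(m[, keep, drop = FALSE], center = TRUE, scale. = TRUE)
```

With tidy_pca itself, the first option would correspond to passing scale = FALSE, and the second to subsetting x before the call.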
Value
a data_frame object containing the top k principal components of the data in x, with the object meta appended to the front when it is non-null.
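To illustrate the shape of the return value, the base R sketch below mimics the documented behaviour: run prcomp, keep the first k score columns, and place meta in front. pca_sketch is a hypothetical helper written for this illustration, not the package's implementation, and it returns a plain data.frame rather than a data_frame. The input matrix and the id column are made-up data.

```r
# Hypothetical helper, not part of the package
pca_sketch <- function(x, meta = NULL, k = 2, center = TRUE, scale = TRUE) {
  fit <- prcomp(x, center = center, scale. = scale)
  # fit$x holds the component scores; its columns are named PC1, PC2, ...
  scores <- as.data.frame(fit$x[, seq_len(k), drop = FALSE])
  # Put the metadata in front of the scores when it is supplied
  if (is.null(meta)) scores else cbind(meta, scores)
}

set.seed(1)
x <- matrix(rnorm(40), nrow = 8)             # made-up data: 8 rows, 5 columns
meta <- data.frame(id = paste0("doc", 1:8))  # one metadata row per row of x
out <- pca_sketch(x, meta, k = 2)
names(out)
```

The result has one row per row of x, with the metadata columns first and the component columns after them, which is why the example can refer to pca_doc$PC1 and pca_doc$PC2 directly.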
Examples
# NOT RUN {
require(dplyr)
data(obama)
# Get principal components from the non-proper noun lemmas
res <- get_token(obama) %>%
  filter(pos %in% c("NN", "NNS")) %>%
  get_tfidf()
pca_doc <- tidy_pca(res$tfidf, get_document(obama))
# Plot speeches using the first two principal components
plot(pca_doc$PC1, pca_doc$PC2, col = "white")
text(pca_doc$PC1, pca_doc$PC2, label = 2009:2016)
# }