tidy_pca: Compute Principal Components and store as a Data Frame

Description

Takes a matrix, such as the tfidf matrix returned by get_tfidf, and returns a data frame containing its top k principal components. Plotting the first two components is a simple but effective way to visualize a corpus of documents.

Usage

tidy_pca(x, meta = NULL, k = 2, center = TRUE, scale = TRUE)

Arguments

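x
a matrix object to pass to prcomp

meta
an optional object to append to the front of the principal components

k
integer. The number of principal components to include in the output

center
logical. Should the columns of x be centered before the decomposition?

scale
logical. Should the columns of x be scaled to unit variance before the decomposition?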

Value

A data_frame object containing the top k principal components of the data in x. When meta is non-NULL, it is appended to the front of the result.
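
The return shape described above can be illustrated with base R alone. The following is only a sketch, assuming tidy_pca behaves like a thin wrapper around stats::prcomp; the helper name sketch_pca and the toy data are hypothetical and are not part of the package.

```r
# Hypothetical stand-in for tidy_pca (an assumption, not the package source).
sketch_pca <- function(x, meta = NULL, k = 2, center = TRUE, scale = TRUE) {
  # prcomp names its scaling argument `scale.` (with a trailing dot)
  fit <- stats::prcomp(x, center = center, scale. = scale)

  # fit$x holds the scores: one row per row of x, columns PC1, PC2, ...
  scores <- as.data.frame(fit$x[, seq_len(k), drop = FALSE])

  # Put the metadata columns in front when they are supplied
  if (!is.null(meta)) scores <- cbind(meta, scores)
  scores
}

# Toy input: 8 "documents" described by 5 numeric features
set.seed(1)
x <- matrix(rnorm(40), nrow = 8, ncol = 5)
meta <- data.frame(id = paste0("doc", 1:8))

out <- sketch_pca(x, meta, k = 2)
names(out)
```

One practical consequence of relying on prcomp: with scale = TRUE it stops with an error if any column of x is constant, so such columns need to be dropped first or scale set to FALSE.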

Examples


# NOT RUN {
require(dplyr)
data(obama)

# Get principal components from the non-proper noun lemmas
res <- get_token(obama) %>%
  filter(pos %in% c("NN", "NNS")) %>%
  get_tfidf()
pca_doc <- tidy_pca(res$tfidf, get_document(obama))

# Plot speeches using the first two principal components
# type = "n" sets up the axes without drawing points, leaving room for labels
plot(pca_doc$PC1, pca_doc$PC2, type = "n")
text(pca_doc$PC1, pca_doc$PC2, labels = 2009:2016)

# }
