cleanNLP (version 1.10.0)

tidy_pca: Compute Principal Components and store as a Data Frame

Description

Takes a matrix, such as the TF-IDF matrix returned by get_tfidf, and returns a data frame containing the top principal components. Plotting the first two components is a simple way to visualize a corpus of documents.

Usage

tidy_pca(x, meta = NULL, k = 2, center = TRUE, scale = TRUE)

Arguments

x

a matrix object to pass to prcomp

meta

an optional object to prepend to the principal components. Can be a vector or a data frame; its length (for a vector) or number of rows (for a data frame) must equal the number of rows in x.

k

integer. The number of components to include in the output.

center

logical. Should the data be centered?

scale

logical. Should the data be scaled? Must be set to FALSE if center is TRUE and any column of x is constant, because a centered constant column has zero variance and cannot be rescaled.
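The constant-column caveat for scale can be reproduced with base R alone. The following is a minimal sketch, assuming tidy_pca forwards center and scale to prcomp as the description of x suggests; the toy matrix and the keep filter are illustrative and not part of cleanNLP.

```r
# Toy matrix (illustrative): column "b" is constant
x <- cbind(a = c(1, 2, 3, 4), b = c(5, 5, 5, 5), c = c(2, 1, 4, 3))

# With centering and scaling, prcomp stops on the zero-variance column;
# capture the error message instead of halting
res <- tryCatch(
  prcomp(x, center = TRUE, scale. = TRUE),
  error = function(e) conditionMessage(e)
)

# One workaround (an assumption, not a cleanNLP feature): drop constant
# columns before scaling
keep <- apply(x, 2, function(col) var(col) > 0)
pca <- prcomp(x[, keep, drop = FALSE], center = TRUE, scale. = TRUE)
```

The alternative described above is to leave the matrix as it is and pass scale = FALSE.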

Value

a data_frame object containing the top k principal components of the data in x, with meta prepended as the leading column(s) when it is non-NULL.
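The shape of the returned object can be approximated with base R. This is a rough sketch of the idea, not the package's actual implementation: it assumes the result is the first k columns of the prcomp scores with meta bound in front, and the random matrix is a hypothetical stand-in for a TF-IDF matrix.

```r
# Hypothetical stand-in for a small TF-IDF matrix: 5 documents, 4 terms
set.seed(1)
x <- matrix(rnorm(20), nrow = 5, ncol = 4)
meta <- data.frame(id = 1:5)

# Keep the first k principal component scores
k <- 2
pca <- prcomp(x, center = TRUE, scale. = TRUE)
scores <- as.data.frame(pca$x[, seq_len(k), drop = FALSE])

# Prepend the metadata, giving one row per row of x
out <- cbind(meta, scores)
```

Here out has one row per document, with the id column followed by PC1 and PC2.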

Examples

# NOT RUN {
library(cleanNLP)
library(dplyr)
data(obama)

# Get principal components from the non-proper noun lemmas
res <- get_token(obama) %>%
  filter(pos %in% c("NN", "NNS")) %>%
  get_tfidf()
pca_doc <- tidy_pca(res$tfidf, get_document(obama))

# Plot speeches using the first two principal components
plot(pca_doc$PC1, pca_doc$PC2, col = "white")
text(pca_doc$PC1, pca_doc$PC2, labels = 2009:2016)

# }
