tidy_pca

Compute Principal Components and store as a Data Frame

Takes a matrix, such as the tf-idf matrix returned by get_tfidf, and returns a data frame containing its first k principal components. Plotting documents on the first two components is a simple but effective way to visualize a corpus.
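The core computation can be sketched in base R with prcomp. This is a minimal sketch, not the cleanNLP source: it assumes tidy_pca keeps the first k columns of the score matrix (prcomp()$x) and binds meta to their left. The name tidy_pca_sketch is hypothetical, and the sketch returns a plain data.frame rather than a data_frame.

```r
# Hypothetical sketch of the tidy_pca computation using only base R;
# not the cleanNLP implementation.
tidy_pca_sketch <- function(x, meta = NULL, k = 2, center = TRUE, scale = TRUE) {
  # prcomp() performs the PCA; its $x element holds the component scores
  pca <- stats::prcomp(x, center = center, scale. = scale)
  # keep only the first k score columns (named PC1, PC2, ...)
  scores <- as.data.frame(pca$x[, seq_len(k), drop = FALSE])
  if (is.null(meta)) {
    return(scores)
  }
  # put meta (vector or data frame) in front of the components
  cbind(as.data.frame(meta), scores)
}

# Toy input: 6 "documents" by 4 "terms", random values for illustration only
set.seed(1)
m <- matrix(runif(24), nrow = 6)
res <- tidy_pca_sketch(m, meta = data.frame(doc = letters[1:6]), k = 2)
print(res)
```

Each row of res is one row of m, so each "document" can be plotted by its PC1 and PC2 scores.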

Usage
tidy_pca(x, meta = NULL, k = 2, center = TRUE, scale = TRUE)
Arguments
x

a matrix object to pass to prcomp

meta

an optional object to prepend to the principal components. It can be a vector or a data frame, but its length (for a vector) or number of rows (for a data frame) must equal the number of rows in x.
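The length requirement on meta can be checked before binding. A minimal sketch, assuming a hypothetical helper prepend_meta; the error message and the use of NROW() are illustrative and not taken from cleanNLP.

```r
# Hypothetical helper: put `meta` in front of a data frame of scores,
# requiring one element (or row) of meta per row of the scores.
prepend_meta <- function(scores, meta) {
  # NROW() gives the length of a vector and the row count of a data frame
  if (NROW(meta) != nrow(scores)) {
    stop("meta must have one element or row per row of x")
  }
  if (is.data.frame(meta)) {
    cbind(meta, scores)
  } else {
    cbind(data.frame(meta = meta), scores)
  }
}

scores <- data.frame(PC1 = c(1.2, -0.4, -0.8), PC2 = c(0.1, 0.3, -0.4))
# A vector becomes a single leading column named "meta"
with_vec <- prepend_meta(scores, c("2009", "2010", "2011"))
# A data frame keeps its own column names
with_df <- prepend_meta(scores, data.frame(year = 2009:2011, party = "D"))
```

Passing a meta of the wrong length, such as 1:2 here, stops with an error instead of silently recycling values.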

k

integer. The number of components to include in the output.

center

logical. Should the data be centered?

scale

logical. Should the data be scaled to unit variance? Note that this must be set to FALSE if any column of x is constant and center is TRUE.
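The constant-column restriction comes from prcomp itself: when centering and scaling, a constant column has zero standard deviation and cannot be rescaled. The sketch below calls prcomp directly, assuming that tidy_pca passes center and scale through to prcomp's center and scale. arguments.

```r
# A 5 x 3 matrix whose third column is constant
x <- cbind(a = c(1, 3, 2, 5, 4), b = c(2, 1, 4, 3, 5), c = rep(7, 5))

# center = TRUE, scale. = TRUE: the constant column has zero standard
# deviation, so prcomp() signals an error instead of dividing by zero
err <- tryCatch(prcomp(x, center = TRUE, scale. = TRUE),
                error = function(e) conditionMessage(e))
print(err)

# center = TRUE, scale. = FALSE: runs; the constant column contributes no variance
ok <- prcomp(x, center = TRUE, scale. = FALSE)
print(summary(ok))
```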

Value

a data_frame object containing the first k principal components of x, with meta prepended as the leading column(s) when it is not NULL.

Aliases
  • tidy_pca
Examples
# NOT RUN {
library(cleanNLP)
library(dplyr)
data(obama)

# Get principal components from the common-noun lemmas (tags NN and NNS)
res <- get_token(obama) %>%
  filter(pos %in% c("NN", "NNS")) %>%
  get_tfidf()
pca_doc <- tidy_pca(res$tfidf, get_document(obama))

# Plot the speeches on the first two principal components,
# labelling each one with its year instead of drawing a point
plot(pca_doc$PC1, pca_doc$PC2, type = "n", xlab = "PC1", ylab = "PC2")
text(pca_doc$PC1, pca_doc$PC2, labels = 2009:2016)

# }
Documentation reproduced from package cleanNLP, version 1.10.0, License: LGPL-2