prep_docs: Prepare documents in a data frame for modeling
Description
prep_docs() takes documents stored as a column of a data frame and
converts them into a list containing a matrix representation of the documents
and a character vector of the vocabulary, ready for modeling.
Usage
prep_docs(data, col, lower = TRUE)
Arguments
data
A data frame containing a column of documents.
col
A character string denoting the column of documents in data.
lower
Should all terms be converted to lowercase? (default: TRUE).
Value
A list with two components:
documents
A matrix of term uses with one row per document and one column per term
position; the number of columns equals the number of terms in the longest
document.
vocab
A character vector of the unique terms in the documents.
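To make the shape of the return value concrete, here is a minimal base R sketch of a function that produces a list with the same two components. It is not the package's implementation. The helper name sketch_prep_docs, the whitespace tokenization, the alphabetical ordering of the vocabulary, the use of vocabulary indices as matrix entries, and the padding of shorter documents with 0 are all assumptions made for illustration; the source does not state how prep_docs() handles any of these.

```r
# Hypothetical helper illustrating the structure described under Value.
# This is NOT the implementation of prep_docs().
sketch_prep_docs <- function(data, col, lower = TRUE) {
  docs <- as.character(data[[col]])
  if (lower) docs <- tolower(docs)

  # Assumption: terms are separated by runs of whitespace.
  tokens <- strsplit(trimws(docs), "\\s+")

  # Assumption: the vocabulary is the sorted set of unique terms.
  vocab <- sort(unique(unlist(tokens)))

  # One row per document, one column per term position in the longest document.
  # Assumption: unused positions in shorter documents are filled with 0.
  max_len <- max(lengths(tokens))
  documents <- matrix(0L, nrow = length(tokens), ncol = max_len)
  for (d in seq_along(tokens)) {
    n <- length(tokens[[d]])
    if (n > 0) {
      # Assumption: each entry is the index of the term in `vocab`.
      documents[d, seq_len(n)] <- match(tokens[[d]], vocab)
    }
  }

  list(documents = documents, vocab = vocab)
}

# Toy data frame (made up for this sketch, not the teacher_rate data).
toy <- data.frame(doc = c("Great teacher", "great class great notes"),
                  stringsAsFactors = FALSE)
out <- sketch_prep_docs(toy, "doc")
str(out)
```

With lower = TRUE, "Great" and "great" collapse into a single vocabulary entry; with lower = FALSE they would be counted as distinct terms.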
Examples
# NOT RUN {
data(teacher_rate) # Synthetic student ratings of instructors
docs_vocab <- prep_docs(teacher_rate, "doc")
str(docs_vocab) # A list with two components `documents` and `vocab`
# }