SportMiner (version 0.1.0)

sm_create_dtm: Create Document-Term Matrix

Description

Converts preprocessed word counts into a document-term matrix suitable for topic modeling. Filters out rare and overly common terms, and drops documents left empty.

Usage

sm_create_dtm(word_counts, min_term_freq = 3, max_term_freq = 0.5)

Value

A DocumentTermMatrix object from the tm package.

Arguments

word_counts

A data.frame with columns doc_id, stem, and n, typically produced by sm_preprocess_text().

min_term_freq

Minimum number of documents a term must appear in to be retained. Default is 3.

max_term_freq

Maximum proportion of documents a term can appear in. Useful for removing ubiquitous terms. Default is 0.5 (50 percent).
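
The two thresholds above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data, the inclusive comparisons (>= and <=), and the use of the pre-filter document count as the denominator are all assumptions made for illustration, and the result here is a filtered data.frame rather than a DocumentTermMatrix.

```r
# Toy word counts in the documented shape (doc_id, stem, n); the values are made up.
word_counts <- data.frame(
  doc_id = c("d1", "d1", "d2", "d2", "d3", "d3", "d4", "d4"),
  stem   = c("sport", "injuri", "sport", "injuri", "sport", "train", "sport", "rare"),
  n      = c(2, 1, 1, 3, 4, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Thresholds chosen for this tiny corpus, not the function's defaults.
min_term_freq <- 2
max_term_freq <- 0.75

# Assumption: proportions are taken over all documents present before filtering.
n_docs <- length(unique(word_counts$doc_id))

# Document frequency: number of distinct documents each stem appears in.
doc_freq <- tapply(word_counts$doc_id, word_counts$stem,
                   function(ids) length(unique(ids)))

# Keep stems that are neither too rare nor too widespread (inclusive bounds assumed).
keep <- doc_freq >= min_term_freq & (doc_freq / n_docs) <= max_term_freq
kept_terms <- names(doc_freq)[keep]

# Dropping rows for removed stems also removes documents with no remaining terms.
filtered <- word_counts[word_counts$stem %in% kept_terms, ]
remaining_docs <- sort(unique(filtered$doc_id))

print(kept_terms)
print(remaining_docs)
```

In this toy corpus "sport" appears in every document and is removed by the upper bound, "train" and "rare" each appear in a single document and are removed by the lower bound, and documents d3 and d4 end up with no terms and drop out.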

Examples

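if (FALSE) {
# `papers` is not defined on this page; it stands for the raw input to sm_preprocess_text()
processed <- sm_preprocess_text(papers)
# Build the document-term matrix using the default thresholds
dtm <- sm_create_dtm(processed)
}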