sm_preprocess_text

Tokenizes, cleans, and stems text data in preparation for topic modeling.
Removes stopwords and numbers, then stems the remaining tokens using the
Porter algorithm.
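
The steps named in this description can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper `preprocess_sketch()`, the tiny `toy_stopwords` vector, and the order of the steps are assumptions made for illustration. Stemming is delegated to `SnowballC::wordStem()` and is skipped when that package is not installed.

```r
# Illustrative stand-in for a stopword list; NOT the list the package uses.
toy_stopwords <- c("the", "and", "in", "of", "for", "were", "with")

# Hypothetical helper that mirrors the documented steps for a single string.
preprocess_sketch <- function(text, min_word_length = 3,
                              custom_stopwords = NULL) {
  stop_set <- c(toy_stopwords, custom_stopwords)

  # Tokenize: lowercase, then split on anything that is not a letter or digit.
  tokens <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  tokens <- tokens[nzchar(tokens)]

  # Clean: drop tokens containing digits, stopwords, and short words.
  tokens <- tokens[!grepl("[0-9]", tokens)]
  tokens <- tokens[!tokens %in% stop_set]
  tokens <- tokens[nchar(tokens) >= min_word_length]

  # Stem with the Porter algorithm when SnowballC is available.
  if (requireNamespace("SnowballC", quietly = TRUE)) {
    tokens <- SnowballC::wordStem(tokens, language = "porter")
  }
  tokens
}

preprocess_sketch("The athletes trained for 12 weeks in 2020.")
```

Applying the length filter before stemming is a choice made in this sketch; the function documented here may order these steps differently.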

A comprehensive toolkit for mining, analyzing, and visualizing
scientific literature in sport science domains. Provides functions for
retrieving abstracts from 'Scopus', preprocessing text data, performing
advanced topic modeling using Latent Dirichlet Allocation ('LDA'),
Structural Topic Models ('STM'), and Correlated Topic Models ('CTM'),
and creating publication-ready visualizations including keyword
co-occurrence networks and topic trends. For methodological details see
Blei et al. (2003) <doi:10.1162/jmlr.2003.3.4-5.993> for 'LDA',
Roberts et al. (2014) <doi:10.1111/ajps.12103> for 'STM', and
Blei and Lafferty (2007) <doi:10.1214/07-AOAS114> for 'CTM'.

Praveen D Chougale

SportMiner

Text Mining and Topic Modeling for Sport Science Literature

sm_preprocess_text: Preprocess Text for Topic Modeling

Arguments

<dl><dt>data</dt>
<dd>A data.frame containing text data.</dd>
<dt>text_col</dt>
<dd>Name of the column containing text to preprocess.
Default is "abstract".</dd>
<dt>id_col</dt>
<dd>Name of the column containing document IDs. If NULL, a
doc_id column will be created. Default is NULL.</dd>
<dt>min_word_length</dt>
<dd>Minimum word length to retain. Default is 3.</dd>
<dt>custom_stopwords</dt>
<dd>Additional stopwords to remove beyond the standard
English stopwords. Default is NULL.</dd></dl>
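
A call using the arguments listed above might look like the following sketch. The toy data frame, its `paper_id` column, and the chosen custom stopwords are made up for illustration. The structure of the returned object is not described here, so the example only inspects it with `str()`. The call runs only when the SportMiner package is installed.

```r
# Toy input; column names and contents are invented for this example.
papers <- data.frame(
  paper_id = c("p1", "p2"),
  abstract = c(
    "Sprint training improved 100 m performance in elite athletes.",
    "Injury rates among youth football players were monitored for 2 seasons."
  ),
  stringsAsFactors = FALSE
)

# Guarded so the snippet does not fail when SportMiner is missing.
if (requireNamespace("SportMiner", quietly = TRUE)) {
  processed <- SportMiner::sm_preprocess_text(
    data = papers,
    text_col = "abstract",                   # column holding the text
    id_col = "paper_id",                     # NULL would create a doc_id column
    min_word_length = 3,                     # documented default
    custom_stopwords = c("elite", "youth")   # removed in addition to English stopwords
  )
  str(processed)
}
```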


