cluster_docs

This function clusters documents using K-means based on their TF-IDF vectors.

A suite of tools for literature-based discovery in biomedical research.
Provides functions for retrieving scientific articles from 'PubMed' and
other NCBI databases, extracting biomedical entities (diseases, drugs, genes, etc.),
building co-occurrence networks, and applying various discovery models
including 'ABC', 'AnC', 'LSI', and 'BITOLA'. The package also includes
visualization tools for exploring discovered connections.

Chao Liu

LBDiscover

Literature-Based Discovery Tools for Biomedical Research


Arguments

<dl>

<dt>text_data</dt>
<dd>A data frame containing text data.</dd>


<dt>text_column</dt>
<dd>Name of the column containing text to analyze.</dd>


<dt>n_clusters</dt>
<dd>Number of clusters to create.</dd>


<dt>min_term_freq</dt>
<dd>Minimum frequency for a term to be included.</dd>


<dt>max_doc_freq</dt>
<dd>Maximum document frequency (as a proportion) for a term to be included.</dd>


<dt>random_seed</dt>
<dd>Seed for random number generation (for reproducibility).</dd>

</dl>
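
The arguments above suggest a pipeline of tokenising, filtering terms, weighting by TF-IDF, and running K-means. The page does not show the actual implementation, so the following is only a minimal base-R sketch of that idea. The helper name `sketch_cluster_docs`, the `abstract` column, the default values, the tokenisation rule, the reading of `min_term_freq` as a corpus-wide count, and the exact TF-IDF formula are all assumptions made for illustration, not the package's behaviour.

```r
# Hypothetical helper, NOT part of LBDiscover: a base-R sketch of
# TF-IDF weighting followed by K-means. All defaults are assumed.
sketch_cluster_docs <- function(text_data, text_column = "abstract",
                                n_clusters = 2, min_term_freq = 2,
                                max_doc_freq = 0.9, random_seed = 42) {
  # Lower-case the text and split on runs of non-letters (assumed tokeniser)
  tokens <- strsplit(tolower(as.character(text_data[[text_column]])), "[^a-z]+")
  tokens <- lapply(tokens, function(x) x[nzchar(x)])

  # Document-term count matrix: one row per document, one column per term
  vocab <- sort(unique(unlist(tokens)))
  counts <- do.call(rbind, lapply(tokens, function(x) {
    as.numeric(table(factor(x, levels = vocab)))
  }))
  colnames(counts) <- vocab

  # Drop rare terms (total count below min_term_freq) and terms that
  # occur in more than max_doc_freq of the documents
  term_freq <- colSums(counts)
  doc_freq <- colSums(counts > 0) / nrow(counts)
  keep <- term_freq >= min_term_freq & doc_freq <= max_doc_freq
  counts <- counts[, keep, drop = FALSE]

  # TF-IDF: row-normalised term frequency times log inverse document frequency
  tf <- counts / pmax(rowSums(counts), 1)
  idf <- log(nrow(counts) / colSums(counts > 0))
  tfidf <- sweep(tf, 2, idf, `*`)

  # Fix the seed so the random K-means initialisation is reproducible
  set.seed(random_seed)
  km <- kmeans(tfidf, centers = n_clusters)

  text_data$cluster <- km$cluster
  text_data
}

# Toy input with an assumed "abstract" column
docs <- data.frame(
  abstract = c(
    "migraine headache pain treatment",
    "headache pain migraine pain therapy",
    "gene expression tumor growth",
    "tumor gene mutation expression gene"
  ),
  stringsAsFactors = FALSE
)

result <- sketch_cluster_docs(docs, text_column = "abstract", n_clusters = 2)
print(result$cluster)
```

Cluster labels from K-means are arbitrary integers, so only the grouping of documents is meaningful, not the label values themselves.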
