
tall (version 0.5.2)

reinert: Segment clustering based on the Reinert method - Simple clustering

Description

Segment clustering based on the Reinert method - Simple clustering

Usage

reinert(
  x,
  k = 10,
  term = "token",
  segment_size = 40,
  min_segment_size = 3,
  min_split_members = 5,
  cc_test = 0.3,
  tsj = 3
)

Value

The result is a list with two classes, hclust and reinert_tall.

Arguments

x

tall data frame of documents

k

maximum number of clusters to compute

term

indicates the type of form, "lemma" or "token". Default value is term = "token".

segment_size

number of forms per segment. Default value is segment_size = 40

min_segment_size

minimum number of forms per segment. Default value is min_segment_size = 3

min_split_members

minimum number of segments in a cluster. Default value is min_split_members = 5

cc_test

contingency coefficient value used for feature selection. Default value is cc_test = 0.3

tsj

minimum frequency value used for feature selection. Default value is tsj = 3
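To make the last two arguments more concrete, the sketch below computes the Pearson contingency coefficient C = sqrt(chi2 / (chi2 + n)) of a 2 x 2 table (term present/absent versus first/second group of segments) and combines it with a frequency threshold. The helpers contingency_coef() and select_features() are hypothetical, and the selection rule (keep a term in both groups when it occurs at least tsj times in each group and its coefficient stays below cc_test) is an assumption modeled on the rainette documentation; reinert() may apply these thresholds differently.

```r
# Hypothetical helper: Pearson contingency coefficient of a 2 x 2 table,
# C = sqrt(chi2 / (chi2 + n)), computed without continuity correction.
contingency_coef <- function(tab) {
  n <- sum(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / n
  # A term present (or absent) in every segment has no association: return 0
  if (any(expected == 0)) return(0)
  chi2 <- sum((tab - expected)^2 / expected)
  sqrt(chi2 / (chi2 + n))
}

# Hypothetical selection rule (an assumption, not tall's documented code):
# keep a term when it occurs at least `tsj` times in each group and its
# contingency coefficient with the two-group split is below `cc_test`.
select_features <- function(dtm, group, cc_test = 0.3, tsj = 3) {
  # dtm: binary segment x term matrix (1 = term present in the segment)
  # group: logical vector, TRUE for segments in the first group
  stopifnot(nrow(dtm) == length(group))
  keep <- vapply(seq_len(ncol(dtm)), function(j) {
    present <- dtm[, j] > 0
    # Rows: present / absent; columns: group 1 / group 2
    tab <- matrix(c(sum(present & group), sum(present & !group),
                    sum(!present & group), sum(!present & !group)),
                  nrow = 2, byrow = TRUE)
    tab[1, 1] >= tsj && tab[1, 2] >= tsj && contingency_coef(tab) < cc_test
  }, logical(1))
  colnames(dtm)[keep]
}

# Toy data: 8 segments split 4 / 4 into two groups
group <- rep(c(TRUE, FALSE), each = 4)
dtm <- cbind(
  a = c(1, 1, 1, 1, 0, 0, 0, 0),  # only in group 1: fails the tsj threshold
  b = c(1, 1, 1, 0, 1, 1, 1, 0),  # evenly spread across groups: C = 0
  d = c(1, 1, 1, 1, 1, 1, 1, 0)   # C is about 0.35, above cc_test = 0.3
)
select_features(dtm, group)
```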

Details

See the references for original articles on the method. Special thanks to the authors of the rainette package (https://github.com/juba/rainette) for inspiring the coding approach used in this function.
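As a rough illustration of the descending classification described in the references, the base R sketch below orders segments along the first correspondence analysis axis of a binary segment x term matrix, picks the cut point that maximizes the chi-squared statistic between the two resulting groups, and repeatedly splits the largest cluster until k clusters exist. All helper names are hypothetical. The sketch omits the reassignment and feature-selection steps, treats min_split_members as the minimum size a cluster needs to be split (an assumption), and returns a plain membership vector rather than the hclust/reinert_tall object that reinert() produces.

```r
# Hypothetical helper: row coordinates on the first correspondence analysis axis.
ca_first_axis <- function(m) {
  p <- m / sum(m)
  r <- rowSums(p)
  cm <- colSums(p)
  # Standardized residuals D_r^(-1/2) (P - r c') D_c^(-1/2)
  s <- (p - outer(r, cm)) / sqrt(outer(r, cm))
  u <- svd(s, nu = 1, nv = 0)$u[, 1]
  u / sqrt(r)  # standard coordinates on axis 1 (sign is arbitrary)
}

# Chi-squared statistic of the 2 x terms table of term counts in two groups.
split_chi2 <- function(m, in_first) {
  tab <- rbind(colSums(m[in_first, , drop = FALSE]),
               colSums(m[!in_first, , drop = FALSE]))
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

# One bipartition: scan every cut along the CA ordering and keep the best.
reinert_split <- function(m) {
  m <- m[, colSums(m) > 0, drop = FALSE]  # drop terms absent from this cluster
  stopifnot(nrow(m) >= 2, all(rowSums(m) > 0))
  ord <- order(ca_first_axis(m))
  best_chi2 <- -Inf
  best <- NULL
  for (i in seq_len(nrow(m) - 1)) {
    in_first <- seq_len(nrow(m)) %in% ord[seq_len(i)]
    chi2 <- split_chi2(m, in_first)
    if (chi2 > best_chi2) {
      best_chi2 <- chi2
      best <- in_first
    }
  }
  best
}

# Simple descending classification: split the largest splittable cluster
# until there are k clusters or no cluster is large enough to split.
simple_reinert <- function(dtm, k = 3, min_split_members = 2) {
  membership <- rep(1L, nrow(dtm))
  while (max(membership) < k) {
    sizes <- tabulate(membership)
    candidates <- which(sizes >= max(2, min_split_members))
    if (length(candidates) == 0) break
    target <- candidates[which.max(sizes[candidates])]
    rows <- which(membership == target)
    in_first <- reinert_split(dtm[rows, , drop = FALSE])
    membership[rows[!in_first]] <- max(membership) + 1L
  }
  membership
}

# Toy binary segment x term matrix (every segment contains at least one term)
seg_term <- rbind(
  c(1, 1, 0, 0, 0, 0),
  c(1, 1, 1, 0, 0, 0),
  c(1, 0, 1, 0, 0, 0),
  c(0, 0, 1, 1, 0, 0),
  c(0, 0, 0, 1, 1, 0),
  c(0, 0, 0, 1, 1, 1),
  c(0, 0, 0, 0, 1, 1),
  c(0, 0, 0, 0, 0, 1)
)
colnames(seg_term) <- paste0("term", 1:6)
cl <- simple_reinert(seg_term, k = 3, min_split_members = 2)
table(cl)
```

Scanning the cut points along the CA ordering, instead of testing every possible bipartition, keeps each split to one SVD plus n - 1 chi-squared evaluations, which is what makes a top-down approach practical on long texts.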

References

  • Reinert M., Une méthode de classification descendante hiérarchique : application à l'analyse lexicale par contexte, Cahiers de l'analyse des données, Volume 8, Number 2, 1983. https://www.numdam.org/item/?id=CAD_1983__8_2_187_0

  • Reinert M., Alceste une méthodologie d'analyse des données textuelles et une application: Aurelia De Gerard De Nerval, Bulletin de Méthodologie Sociologique, Volume 26, Number 1, 1990. doi:10.1177/075910639002600103

  • Barnier J., Privé F., rainette: The Reinert Method for Textual Data Clustering, 2023. doi:10.32614/CRAN.package.rainette

Examples

library(tall)

# Load the mobydick example data
data(mobydick)

# Cluster text segments of about 40 tokens into at most 10 classes
res <- reinert(
  x = mobydick,
  k = 10,
  term = "token",
  segment_size = 40,
  min_segment_size = 5,
  min_split_members = 10,
  cc_test = 0.3,
  tsj = 3
)
