The result is a list with classes hclust and reinert_tall.
Arguments
x
tall data frame of documents
k
maximum number of clusters to compute
term
type of form to use, either "lemma" or "token". Default value is term = "lemma".
segment_size
number of forms per document. Default value is segment_size = 40.
min_segment_size
minimum number of forms per document. Default value is min_segment_size = 5.
min_split_members
minimum number of segments in a cluster
cc_test
contingency coefficient value for feature selection
tsj
minimum frequency value for feature selection
Details
See the references for original articles on the method.
Special thanks to the authors of the rainette package (https://github.com/juba/rainette)
for inspiring the coding approach used in this function.
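The two feature-selection arguments, cc_test and tsj, can be illustrated with a small base R sketch. It is not the package's implementation: the toy data, the helper contingency_coef(), the fixed two-cluster split, and the rule "keep terms whose frequency is at least tsj and whose contingency coefficient is at least cc_test" are all assumptions made for illustration. The coefficient used is Pearson's contingency coefficient, sqrt(chi2 / (chi2 + n)).

```r
# Hypothetical toy data: one row per (segment, term) occurrence.
occurrences <- data.frame(
  segment = c(1, 1, 2, 2, 3, 3, 4, 4, 4),
  term    = c("sea", "boat", "sea", "wind", "tax", "vote", "tax", "vote", "sea")
)

# Logical segment-by-term presence matrix.
dtm <- unclass(table(occurrences$segment, occurrences$term)) > 0

# Assumed role of tsj: keep terms present in at least tsj segments.
tsj <- 2
term_freq <- colSums(dtm)
kept <- names(term_freq)[term_freq >= tsj]

# Pearson contingency coefficient between term presence and cluster membership
# (illustrative helper, not a function of the package).
contingency_coef <- function(present, cluster) {
  observed <- table(present, cluster)
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  chi2 <- sum((observed - expected)^2 / expected)
  sqrt(chi2 / (chi2 + sum(observed)))
}

# Hypothetical split of the four segments into two clusters.
cluster <- c(1, 1, 2, 2)
cc <- sapply(kept, function(t) contingency_coef(dtm[, t], cluster))

# Assumed role of cc_test: keep terms whose coefficient reaches the threshold.
cc_test <- 0.3
selected <- names(cc)[cc >= cc_test]
print(round(cc, 3))
print(selected)
```

Under these assumptions, raising tsj drops rare terms before any association is measured, while raising cc_test keeps only terms whose presence is strongly tied to one side of the split.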
References
Reinert M., Une méthode de classification descendante hiérarchique: application à l'analyse lexicale par contexte, Cahiers de l'analyse des données, Volume 8, Numéro 2, 1983. https://www.numdam.org/item/?id=CAD_1983__8_2_187_0
Reinert M., Alceste une méthodologie d'analyse des données textuelles et une application: Aurelia De Gerard De Nerval, Bulletin de Méthodologie Sociologique, Volume 26, Numéro 1, 1990. https://doi.org/10.1177/075910639002600103
Barnier J., Privé F., rainette: The Reinert Method for Textual Data Clustering, 2023. https://doi.org/10.32614/CRAN.package.rainette