LexCA: Correspondence Analysis of a Lexical Table from a TextData object (LexCA)

Description

Performs Correspondence Analysis on the working lexical table contained in TextData object. Supplementary documents, words, segments, contextual quantitative and qualitative variables can be considered if previously selected in TextData function.

Usage

LexCA(object, ncp=5, context.sup="ALL", doc.sup=NULL, word.sup=NULL, 
  segment=FALSE, graph=TRUE, axes=c(1, 2), lmd=3, lmw=3)

Value

Returns a list including:

eig: matrix with the eigenvalues, the percentages of inertia and the cumulative percentages of inertia
row: list of matrices with all the results for the documents (coordinates, square cosines, contributions, inertia)
col: list of matrices with all the results for the words (coordinates, square cosines, contributions, inertia)
row.sup: if row.sup is non-NULL, list of matrices with all the results for the supplementary documents (coordinates, square cosines)
col.sup: if col.sup is non-NULL, list of matrices with all the results for the supplementary words (coordinates, square cosines)
quanti.sup: if quanti.sup is non-NULL, list of matrices containing the results for the supplementary quantitative variables (coordinates, square cosines)
quali.sup: if quali.sup is non-NULL, list of matrices with all the results for the supplementary categorical variables; see section details
meta: list of the documents/words whose contribution is over lmd/lmw times the average document/word contribution
VCr: Cramer's V coefficient
Inertia: total inertia
info: information about the corpus
segment: if segment is TRUE, list of matrices with the results for the repeated segments (coordinates, square cosines)
var.agg: name of the aggregation variable in the case of an aggregate correspondence analysis
call: a list with some statistics

Arguments

object: object of TextData class
ncp: number of dimensions kept in the results (by default 5)
context.sup: column index(es) or name(s) of the contextual qualitative or quantitative variables among those selected in TextData function (by default "ALL")
doc.sup: vector indicating the index(es) or name(s) of the supplementary documents (rows) (by default NULL)
word.sup: vector indicating the index(es) or name(s) of the supplementary words (columns) (by default NULL)
segment: if TRUE, the repeated segments identified by TextData function will be considered as supplementary columns (by default FALSE)
graph: if TRUE, basic graphs are displayed; use plot.LexCA to obtain more graphs (by default TRUE)
axes: length-2 vector indicating the axes to plot (by default axes=c(1,2))
lmd: only the documents whose contribution is over lmd times the average-document-contribution are plotted (by default lmd=3)
lmw: only the words whose contribution is over lmw times the average-word-contribution are plotted (by default lmw=3)

Author

Ramón Alvarez-Esteban ramon.alvarez@unileon.es, Mónica Bécue-Bertaut, Josep-Anton Sánchez-Espigares

Details

In the case of a direct CA, DocTerm is a non-aggregate table and:

the contextual quantitative variables are considered as supplementary quantitative columns in CA.
the categories of the contextual qualitative variables are considered as supplementary columns in CA.

In the case of an aggregate CA, DocTerm is an aggregate table and:

the contextual quantitative variables are considered as supplementary quantitative columns in CA; the value of an active aggregate-document for a variable is the mean of the values corresponding to the source-documents belonging to this aggregate-document.
the categories of the contextual qualitative variables are threatened as supplementary rows in CA; these rows contain the frequency with which each the set of documents belonging to this category has used the different words.

References

Benzécri, J, P. (1981). Pratique de l'analyse des donnees. Linguistique & lexicologie (Vol.3). (P. Dunod., Ed).

Husson F., Lê S., Pagès J. (2011). Exploratory Multivariate Analysis by Example Using R. Chapman & Hall/CRC. tools:::Rd_expr_doi("10.1201/b10345").

Lebart, L., Salem, A., & Berry, L. (1998). Exploring textual data. (D. Kluwer, Ed.). tools:::Rd_expr_doi("10.1007/978-94-017-1525-6").

Murtagh F. (2005). Correspondence Analysis and Data Coding with R and Java. Chapman & Hall/CRC.

Examples

Run this code

data(open.question)
if (FALSE) {
### non-aggregate CA
res.TD<-TextData(open.question, var.text=c(9,10), Fmin=10, Dmin=10,
        remov.number=TRUE, stop.word.tm=TRUE)
res.LexCA<-LexCA(res.TD, lmd=0, lmw=1)
}

### aggregate CA
res.TD<-TextData(open.question, var.text=c(9,10), var.agg="Age_Group", Fmin=10, Dmin=10,
        remov.number=TRUE, stop.word.tm=TRUE)
res.LexCA<-LexCA(res.TD, lmd=0, lmw=1)

Run the code above in your browser using DataLab