CaGalt: Correspondence Analysis on Generalised Aggregated Lexical Table (CaGalt)

Description

Correspondence Analysis on Generalised Aggregated Lexical Table (CaGalt) extends correspondence analysis on an aggregated lexical table to the case of several quantitative and categorical contextual variables. Its objective is to establish a typology of the variables and a typology of the frequencies from their mutual relationships. To avoid the instability caused by multicollinearity among the contextual variables and to limit the influence of noisy measurements, the contextual variables are replaced by their principal components. Validation tests in the form of confidence ellipses for the frequencies and the variables are also provided.
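The substitution of the contextual variables by their principal components can be illustrated with a small base R sketch. It is only an illustration under stated assumptions, not the code CaGalt runs internally: the simulated variables x1, x2, x3 are hypothetical, and prcomp stands in for whatever principal axes method the function uses.

```r
# Illustrative sketch only: simulated data, base R prcomp().
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# scale. = TRUE gives unit-variance scaling, comparable to type = "s"
pca    <- prcomp(X, center = TRUE, scale. = TRUE)
scores <- pca$x                 # one column per principal component

# Largest absolute correlation between two distinct columns
max_offdiag <- function(m) {
  r <- cor(m)
  max(abs(r[upper.tri(r)]))
}
max_cor_X  <- max_offdiag(X)       # close to 1: strong multicollinearity
max_cor_pc <- max_offdiag(scores)  # numerically 0: components are uncorrelated

# Keeping only the first components (the role played by sx)
# discards the low-variance directions
sx <- 2
scores_kept <- scores[, seq_len(sx), drop = FALSE]
```

Working on the component scores rather than on X removes the correlation between predictors, and truncating to the first sx components is one way to limit the weight of low-variance, possibly noisy, directions.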

Usage

CaGalt(Y, X, type="s", conf.ellip=FALSE, nb.ellip=100, level.ventil=0,
  sx=NULL, graph=TRUE, axes=c(1,2))

Arguments

Y

a data frame with n rows (individuals) and p columns (frequencies)

X

a data frame with n rows (individuals) and k columns (quantitative or categorical variables)

type

the type of the variables in X: "c" or "s" for quantitative variables and "n" for categorical variables. With "s" the variables are scaled to unit variance, with "c" they are not (by default "s", so variables are scaled to unit variance)

conf.ellip

boolean (FALSE by default), if TRUE, draw confidence ellipses around the frequencies and the variables when "graph" is TRUE

nb.ellip

number of bootstrap samples to compute the confidence ellipses (by default 100)

level.ventil

proportion corresponding to the level under which the category is ventilated; by default, 0 and no ventilation is done. Available only when type is equal to "n"

sx

number of principal components kept from the principal axes analysis of the contextual variables (by default NULL, in which case all principal components are kept)

graph

boolean, if TRUE a graph is displayed

axes

a length 2 vector specifying the components to plot
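The idea behind nb.ellip (bootstrap replicates from which a confidence ellipse is built) can be sketched generically in base R. This is a hedged illustration, not CaGalt's own resampling scheme: the toy coordinates pts, the use of the mean point as the replicated statistic, and the 95% level under a bivariate normal approximation are all assumptions made for the example.

```r
# Generic bootstrap confidence ellipse (illustrative assumptions throughout).
set.seed(42)
n        <- 150
pts      <- cbind(rnorm(n, 1, 0.5), rnorm(n, -1, 0.8))  # toy 2-D coordinates
nb.ellip <- 100                                          # number of replicates

# One replicate = mean point of the rows resampled with replacement
boot <- t(replicate(nb.ellip,
                    colMeans(pts[sample(n, replace = TRUE), ])))

centre <- colMeans(boot)
S      <- cov(boot)

# Ellipse boundary: centre + r * L %*% unit circle, where L %*% t(L) = S
r       <- sqrt(qchisq(0.95, df = 2))   # assumed 95% level
theta   <- seq(0, 2 * pi, length.out = 100)
circle  <- rbind(cos(theta), sin(theta))
ellipse <- t(centre + r * t(chol(S)) %*% circle)  # 100 x 2 boundary points
```

More replicates give a more stable ellipse at a higher computing cost, which is the trade-off nb.ellip controls.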

Value

Returns a list including:

eig

a matrix containing all the eigenvalues, the percentage of variance and the cumulative percentage of variance

ind

a list of matrices containing all the results for the individuals (coordinates, square cosine)

freq

a list of matrices containing all the results for the frequencies (coordinates, square cosine, contributions)

quanti.var

a list of matrices containing all the results for the quantitative variables (coordinates, correlation between variables and axes, square cosine)

quali.var

a list of matrices containing all the results for the categorical variables (coordinates of the categories of each variable, square cosine)

ellip

a list of matrices containing the coordinates of the frequencies and variables for replicated samples from which the confidence ellipses are constructed

Returns the factor maps of the individuals, the frequencies and the variables. If there are more than 50 frequencies, only the 50 frequencies with the highest contribution to the two plotted dimensions are drawn. The plots may be improved with the plot.CaGalt function, by using the argument autolab, modifying the size of the labels or selecting some elements.
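A minimal sketch of how the returned list might be inspected, assuming the FactoMineR package (which provides CaGalt and the health data set) is installed. Only the top-level component names documented above are relied on; the inner elements are listed with names() rather than assumed.

```r
library(FactoMineR)

data(health)
# graph = FALSE suppresses the factor maps so that only the results are computed
res.cagalt <- CaGalt(Y = health[, 1:115], X = health[, 116:118],
                     type = "n", graph = FALSE)

names(res.cagalt)            # top-level components of the result
head(res.cagalt$eig)         # eigenvalues and percentages of variance
names(res.cagalt$freq)       # matrices available for the frequencies
names(res.cagalt$quali.var)  # matrices available for the categories
```

Because the contextual variables in this call are categorical (type = "n"), the results of interest for the variables are in quali.var.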

References

Becue-Bertaut, M., Pages, J. and Kostov, B. (2014). Untangling the influence of several contextual variables on the respondents' lexical choices. A statistical approach. SORT.

Becue-Bertaut, M. and Pages, J. (2014). Correspondence analysis of textual data involving contextual information: Ca-galt on principal components. Advances in Data Analysis and Classification.

Examples

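# NOT RUN {
### Example with categorical variables
library(FactoMineR)  # provides CaGalt() and the health data set
data(health)
# Y: columns 1 to 115; X: columns 116 to 118, treated as categorical (type = "n")
res.cagalt <- CaGalt(Y = health[, 1:115], X = health[, 116:118], type = "n")
# }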
