mda.biber (version 1.0.1)

mda_loadings: Conduct multi-dimensional analysis

Description

Multi-Dimensional Analysis is a statistical procedure developed by Biber and is commonly used in descriptions of language as it varies by genre, register, and task. The procedure is a specific application of factor analysis, which is used as the basis for calculating a 'dimension score' for each text.

Usage

mda_loadings(obs_by_group, n_factors, cor_min = 0.2, threshold = 0.35)

Value

An mda data frame with one row per document, holding the factor scores for that document. Attributes include the number of factors (n_factors), the loading threshold (threshold), the factor loadings (loadings), and the mean factor score for each group (group_means).

Arguments

obs_by_group

A data frame containing exactly 1 categorical (factor) variable and multiple continuous (numeric) variables. Each row represents one document/observation.

n_factors

The number of factors to be calculated in the factor analysis.

cor_min

The correlation threshold for including variables in the factor analysis. A variable is included if its absolute Pearson correlation with at least one other variable is greater than this threshold. Set to 0 to disable thresholding.

threshold

The loading threshold above which variables should be included in factor score calculations. Set to 0 to include all variables.
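
As a rough illustration of the input shape that obs_by_group expects, the sketch below builds a toy data frame and checks it with base R. The column names and values are hypothetical, and the check is only an assumption about what a valid input looks like, not the validation the package itself performs.

```r
# Hypothetical toy input: one factor column (the group) plus numeric features.
obs <- data.frame(
  group        = factor(c("BIO", "BIO", "ENG", "ENG")),
  past_tense   = c(12.1, 9.8, 20.4, 18.7),  # made-up rates per 1,000 words
  nominalizing = c(30.2, 28.9, 15.3, 17.0)
)

# Classify each column as grouping (factor) or measurement (numeric).
is_group <- vapply(obs, is.factor, logical(1))
is_num   <- vapply(obs, is.numeric, logical(1))

# Exactly one factor column; every other column must be numeric.
stopifnot(sum(is_group) == 1, all(is_group | is_num))
```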

Details

MDA is fundamentally factor analysis using the promax rotation, applied to the numeric variables in obs_by_group. However, MDA adds two screening steps:

  1. Only variables with a nontrivial correlation with at least one other variable are included; the correlation threshold is configurable with the cor_min argument.

  2. The factor scores are based only on variables whose loadings are greater (in absolute value) than the threshold argument. (Variables are standardized to ensure loadings are comparable.)

The first step eliminates variables that are uncorrelated with all others. The second essentially enforces sparsity in each factor, so that each factor score draws on only a small set of variables.
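
The two screening steps can be sketched with base R alone, using stats::factanal() with the promax rotation on simulated data. This is a minimal sketch, not the package's implementation: the data, the variable names (a1, b1, noise1, and so on), and in particular the scoring rule (adding the standardized values of positively loaded variables and subtracting those of negatively loaded ones) are assumptions made for illustration.

```r
set.seed(1)
n <- 1000

# Two hypothetical latent dimensions drive six observed variables;
# two further variables are pure noise and correlate with nothing.
f1 <- rnorm(n)
f2 <- rnorm(n)
x <- data.frame(
  a1 = 0.8 * f1 + 0.6 * rnorm(n),
  a2 = 0.8 * f1 + 0.6 * rnorm(n),
  a3 = 0.8 * f1 + 0.6 * rnorm(n),
  b1 = 0.8 * f2 + 0.6 * rnorm(n),
  b2 = 0.8 * f2 + 0.6 * rnorm(n),
  b3 = 0.8 * f2 + 0.6 * rnorm(n),
  noise1 = rnorm(n),
  noise2 = rnorm(n)
)

cor_min   <- 0.2
threshold <- 0.35

# Step 1: keep variables whose largest absolute correlation with
# any other variable exceeds cor_min (the diagonal is ignored).
cor_mat <- abs(cor(x))
diag(cor_mat) <- 0
kept <- names(x)[apply(cor_mat, 2, max) > cor_min]

# Standardize the retained variables so loadings are comparable.
z <- scale(x[, kept])

# Factor analysis with the promax rotation.
fa <- factanal(z, factors = 2, rotation = "promax")
load_mat <- unclass(fa$loadings)

# Step 2: only variables whose absolute loading exceeds the threshold
# contribute to a factor's score (assumed rule: signed sum of z-scores).
salient <- abs(load_mat) > threshold
weights <- sign(load_mat) * salient
scores  <- z %*% weights

kept     # the noise columns should be screened out at this sample size
salient  # which variables feed each factor score
```

Setting cor_min or threshold to 0 in this sketch turns the corresponding filter off, which mirrors the behavior described for the arguments above.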

References

Biber (1988). Variation across Speech and Writing. Cambridge University Press.

Biber (1992). "The multi-dimensional approach to linguistic analyses of genre variation: An overview of methodology and findings." Computers and the Humanities 26 (5/6), 331-345. doi:10.1007/BF00136979

See Also

screeplot_mda(), stickplot_mda(), boxplot_mda()

Examples

library(mda.biber)

# Extract the subject area (the first three characters of each document ID)
# and use it as the grouping variable
micusp_biber$doc_id <- factor(substr(micusp_biber$doc_id, 1, 3))

m <- mda_loadings(micusp_biber, n_factors = 2)

attr(m, "group_means")

heatmap_mda(m)
