
mantar (version 0.2.0)

cor_calc: Correlation Matrix Estimation with Support for Multiple Correlation Types

Description

Computes a correlation matrix from raw data, handling missing values with one of several missing-data strategies. The correlation type used for each pair of variables depends on whether the variables are treated as ordered.

Usage

cor_calc(
  data,
  ordered = FALSE,
  missing_handling = "two-step-em",
  nimp = 20,
  imp_method = "pmm",
  maxit = 10,
  ...
)

Value

A list containing:

mat

Estimated correlation matrix.

means

Vector of estimated means. If any variable is treated as ordered, means is returned as NULL.

cor_method

A matrix indicating the correlation method used for each variable pair.

args

List of settings used in the correlation estimation.

Arguments

data

Data frame or matrix containing the variables for which the correlation matrix is to be computed. May include missing values.

ordered

Logical vector indicating whether each variable in data should be treated as ordered categorical when computing the correlation matrix. If a single logical value is supplied, it is recycled to all variables.

missing_handling

Character string specifying how missing values in data are handled when estimating the correlation matrix (default: "two-step-em"). Possible values are:

"two-step-em"

Uses a classical EM algorithm to estimate the correlation matrix from data.

"stacked-mi"

Uses stacked multiple imputation to estimate the correlation matrix from data.

"pairwise"

Uses pairwise deletion to compute correlations from data.

"listwise"

Uses listwise deletion to compute correlations from data.

nimp

Number of imputations (default: 20) to be used when missing_handling = "stacked-mi".

imp_method

Character string specifying the imputation method to be used when missing_handling = "stacked-mi" (default: "pmm" - predictive mean matching).

maxit

Maximum number of iterations for the imputation algorithm when missing_handling = "stacked-mi" (default: 10).

...

Further arguments passed to internal functions.
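
To illustrate how the "pairwise" and "listwise" options differ in general, the following sketch uses base R's cor() on a small simulated data set. It is only an illustration of the two deletion strategies, not a description of how cor_calc is implemented internally; the data frame toy and all object names are made up for this example.

```r
# Toy data with missing values (hypothetical, for illustration only)
set.seed(1)
toy <- data.frame(x = rnorm(50), y = rnorm(50), z = rnorm(50))
toy$x[1:5]  <- NA
toy$y[6:10] <- NA

# Pairwise deletion: each correlation uses all rows complete for that pair
r_pairwise <- cor(toy, use = "pairwise.complete.obs")

# Listwise deletion: only rows complete on all variables are used
r_listwise <- cor(toy, use = "complete.obs")

# Number of rows contributing to the estimates
n_listwise <- sum(complete.cases(toy))                 # 40
n_pair_xz  <- sum(complete.cases(toy[, c("x", "z")]))  # 45
```

Pairwise deletion keeps more observations per correlation, but each entry of the matrix may then be based on a different subset of rows.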

Details

Correlations are computed pairwise:

  • Polychoric correlations for two ordered variables,

  • Polyserial correlations for one ordered and one continuous variable,

  • Pearson correlations for two continuous variables.

Treating variables as ordered requires missing_handling to be either "stacked-mi" or "listwise". The default, "two-step-em", therefore has to be changed whenever ordered is TRUE for any variable.

Means are computed whenever Pearson correlations are used. If any variable is treated as ordered, means is returned as NULL.
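
The pairwise rule above can be sketched in a few lines of base R. The helper pair_methods below is hypothetical and not part of mantar; the labels it returns are chosen for illustration and may differ from the entries of the cor_method matrix that cor_calc actually returns.

```r
# Hypothetical helper, not part of mantar: label the correlation type per pair
pair_methods <- function(ordered, var_names) {
  # Recycle a single logical value to all variables
  ordered <- rep_len(ordered, length(var_names))
  # Count how many variables in each pair are ordered: 0, 1 or 2
  n_ord <- outer(ordered, ordered, `+`)
  labels <- c("pearson", "polyserial", "polychoric")
  matrix(labels[n_ord + 1], nrow = length(var_names),
         dimnames = list(var_names, var_names))
}

m <- pair_methods(c(TRUE, TRUE, FALSE), c("a", "b", "c"))
m["a", "b"]  # "polychoric"
m["a", "c"]  # "polyserial"
m["c", "c"]  # "pearson"
```

With ordered = FALSE recycled to all variables, every entry is "pearson", which is the only case in which means can be returned.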

Examples

# Estimate correlation matrix from full data set
result <- cor_calc(data = mantar_dummy_full_cont,
                   ordered = FALSE)

# View estimated correlation matrix and methods used
result$mat
result$cor_method

# Estimate correlation matrix for data set with missing values
result_mis <- cor_calc(data = mantar_dummy_mis_cont,
                      ordered = FALSE,
                      missing_handling = "two-step-em")

# View estimated correlation matrix and methods used
result_mis$mat
result_mis$cor_method
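
A further sketch shows the same data set analysed with stacked multiple imputation, with the documented defaults for nimp, imp_method and maxit written out. It assumes that mantar is installed, that the example data are available after library(mantar), and that any imputation back end the package relies on is installed as well; the object name result_mi is made up for this example.

```r
# Assumes mantar (and whatever imputation package it depends on) is installed
result_mi <- NULL
if (requireNamespace("mantar", quietly = TRUE)) {
  library(mantar)
  # Estimate the correlation matrix via stacked multiple imputation,
  # spelling out the documented defaults
  result_mi <- cor_calc(data = mantar_dummy_mis_cont,
                        ordered = FALSE,
                        missing_handling = "stacked-mi",
                        nimp = 20,
                        imp_method = "pmm",
                        maxit = 10)
  # View estimated correlation matrix
  result_mi$mat
}
```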
