colindiag: Collinearity Diagnostics

Description

Perform a (multi)collinearity diagnostic of a correlation matrix of predictor variables using several indicators, as shown by Olivoto et al. (2017).

Usage

colindiag(.data, ..., by = NULL, n = NULL, verbose = TRUE)

Arguments

.data

The data to be analyzed. Must be a symmetric correlation matrix, a data frame containing the predictor variables, or an object of class split_factors.

...

Variables to use in the correlation. Must be a single variable name or a comma-separated list of unquoted variable names. If no variables are given in ..., all the numeric variables from .data are used.

by

One variable (factor) used to split the data into subsets. The function is then applied to each subset and returns a list in which each element contains the results for one level of the variable in by. To split the data by more than one factor variable, use the function split_factors to pass subsetted data to .data.

n

If a correlation matrix is provided, n is the number of objects (observations) used to compute the correlation coefficients.
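A correlation matrix alone does not carry the sample size, which is presumably why n must be supplied when hypothesis tests on the coefficients are wanted. The following base-R sketch shows the standard t test for a single Pearson correlation given only r and n. It is an illustration under that assumption, not the package's internal code; the variable names r, t_stat and p_value are chosen here for the example.

```r
# Illustrative only: t test for one Pearson correlation from r and n.
r <- cor(iris$Sepal.Length, iris$Petal.Length)  # one correlation coefficient
n <- nrow(iris)                                 # number of observations behind r

# Under H0: rho = 0, t follows a Student t distribution with n - 2 df
t_stat  <- r * sqrt((n - 2) / (1 - r^2))
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# Cross-check against stats::cor.test(), which has access to the raw data
ct <- cor.test(iris$Sepal.Length, iris$Petal.Length)
```

The same calculation cannot be done from the matrix alone, because the degrees of freedom depend on n.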

verbose

If verbose = TRUE then some results are shown in the console.

Value

The following values are returned. Note that if a grouping variable is used, the results are returned as a list with one element per level.

cormat A symmetric matrix of Pearson's correlation coefficients between the variables.
corlist Hypothesis tests for each of the correlation coefficients.
evalevet The eigenvalues of the correlation matrix with their associated eigenvectors.
VIF The Variance Inflation Factors, given by the diagonal elements of the inverse of the correlation matrix.
CN The Condition Number of the correlation matrix, given by the ratio between the largest and the smallest eigenvalue.
det The determinant of the correlation matrix.
largest_corr The largest correlation (in absolute value) observed.
smallest_corr The smallest correlation (in absolute value) observed.
weight_var The variables with the largest weights (eigenvector elements) associated with the smallest eigenvalue, sorted in decreasing order.
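The definitions above can be reproduced with base R alone. The sketch below is only an illustration of those definitions on the iris data; it is not the function's own implementation, and the object names (R, vif, cn, dt, w) are chosen here for the example.

```r
# Base-R sketch of the indicators described above (not metan's internal code).
R <- cor(iris[, 1:4])                      # Pearson correlation matrix of the predictors

vif <- diag(solve(R))                      # VIF: diagonal of the inverse correlation matrix
eig <- eigen(R, symmetric = TRUE)          # eigenvalues and eigenvectors
cn  <- max(eig$values) / min(eig$values)   # condition number
dt  <- det(R)                              # determinant of the correlation matrix

# Largest and smallest absolute correlations among distinct variable pairs
off <- abs(R[upper.tri(R)])
largest_corr  <- max(off)
smallest_corr <- min(off)

# Absolute weight of each variable in the eigenvector of the smallest eigenvalue
w <- abs(eig$vectors[, which.min(eig$values)])
names(w) <- colnames(R)
weight_var <- sort(w, decreasing = TRUE)
```

Large VIF values, a large condition number, and a determinant close to zero all point to the same problem: at least one predictor is nearly a linear combination of the others.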

References

Olivoto, T., V.Q. Souza, M. Nardino, I.R. Carvalho, M. Ferrari, A.J. Pelegrin, V.J. Szareski, and D. Schmidt. 2017. Multicollinearity in path analysis: a simple method to reduce its effects. Agron. J. 109:131-142. doi:10.2134/agronj2016.04.0196.

Olivoto, T., M. Nardino, I.I.R. Carvalho, D.N. Follmann, M. Ferrari, A.J. de Pelegrin, V.J. Szareski, A.C. de Oliveira, B.O. Caron, and V.Q. de Souza. 2017. Optimal sample size and data arrangement method in estimating correlation matrices with lesser collinearity: A statistical focus in maize breeding. African J. Agric. Res. 12:93-103. doi:10.5897/AJAR2016.11799.

Examples

# NOT RUN {
# Using the correlation matrix
library(metan)

cor_iris <- cor(iris[,1:4])
n <- nrow(iris)

col_diag <- colindiag(cor_iris, n = n)


# Using a data frame, with the diagnostic run for each level of the factor GEN
col_diag_gen <- data_ge2 %>%
                split_factors(GEN) %>%
                colindiag()

# Diagnostic by levels of a factor selecting desired variables
col_diag_gen <- data_ge2 %>%
                split_factors(GEN) %>%
                colindiag(EH, PH, CD, CL)
# }
