corr_cross: Ranked cross-correlation across all variables

Description

This function runs a full correlation study and returns a ranking of the most highly correlated variable pairs obtained from a cross-table.
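To illustrate the idea of a ranked cross-correlation, here is a minimal base-R sketch. It is not the lares implementation: `rank_cross_cor()` and its output column names (`var1`, `var2`, `corr`) are hypothetical, and, as an assumption, non-numeric columns are simply dropped. Each pair of numeric columns is correlated once, then pairs are sorted by descending absolute correlation.

```r
# Hypothetical helper, NOT the lares implementation: rank every pair of
# numeric columns by the absolute value of their Pearson correlation.
rank_cross_cor <- function(df) {
  # Assumption: non-numeric columns are simply dropped in this sketch
  num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  m <- cor(num, use = "pairwise.complete.obs")
  # Upper triangle only: each pair appears once and self-pairs are skipped
  idx <- which(upper.tri(m), arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(m)[idx[, "row"]],
    var2 = colnames(m)[idx[, "col"]],
    corr = m[idx],
    stringsAsFactors = FALSE
  )
  # Highest absolute correlation first
  out <- out[order(-abs(out$corr)), ]
  rownames(out) <- NULL
  out
}

head(rank_cross_cor(mtcars), 5)
```

Using the upper triangle of the correlation matrix avoids listing both (A, B) and (B, A) and skips the trivial self-correlations on the diagonal.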

Usage

corr_cross(
  df,
  plot = TRUE,
  pvalue = TRUE,
  max_pvalue = 1,
  type = 1,
  max = 1,
  top = 20,
  local = 1,
  ignore = NULL,
  contains = NA,
  grid = TRUE,
  rm.na = FALSE,
  quiet = FALSE,
  ...
)

Value

Depending on the plot input, returns correlation and p-value results for every combination of features, arranged by descending absolute correlation value, either as a data.frame (plot = FALSE) or as a plot (plot = TRUE).
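The shape of the returned table (one row per feature pair, with a correlation and a p-value, ranked by absolute correlation) can be sketched in base R with `cor.test()`. This is an assumption-labelled illustration, not the function's actual code: `cross_cor_pvalues()` and its column names are hypothetical, only numeric columns are used, and the test is Pearson's (the `cor.test()` default).

```r
# Hypothetical helper, NOT lares code: correlation plus p-value for every
# pair of numeric columns, ranked by descending absolute correlation.
cross_cor_pvalues <- function(df) {
  num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  pairs <- combn(names(num), 2)  # every unordered pair of column names
  rows <- lapply(seq_len(ncol(pairs)), function(i) {
    a <- pairs[1, i]
    b <- pairs[2, i]
    ct <- cor.test(num[[a]], num[[b]])  # Pearson test; drops incomplete pairs
    data.frame(var1 = a, var2 = b,
               corr = unname(ct$estimate), pvalue = ct$p.value,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  out <- out[order(-abs(out$corr)), ]
  rownames(out) <- NULL
  out
}

cross_cor_pvalues(iris)
```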

Arguments

df: Dataframe. Non-numerical columns are allowed: they will be filtered out.
plot: Boolean. Show and return a plot?
pvalue: Boolean. Return a list with correlations and statistical significance (p-value) for each value?
max_pvalue: Numeric. Filter non-significant variables. Range (0, 1]
type: Integer. Plot type. 1 is for overall rank. 2 is for local rank.
max: Numeric. Maximum correlation permitted (from 0 to 1)
top: Integer. Return top n results only. Only valid when type = 1. Set value to NA to use all cross-correlations
local: Integer. Label top n local correlations. Only valid when type = 2
ignore: Character or character vector. Which column(s) should be ignored?
contains: Character vector. Filter cross-correlations with variables that contain certain strings (matching any value if a vector is used).
grid: Boolean. Separate into grids?
rm.na: Boolean. Remove NAs?
quiet: Boolean. Keep quiet? If not, informative messages will be shown.
...: Additional parameters passed to corr
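The filtering arguments can be read as successive subsets of an already-ranked table. The sketch below mimics the documented semantics of `max_pvalue`, `max`, `top` and `contains`; it is an illustration only, not lares code. `filter_ranked()`, its column names and the toy table are hypothetical, and treating `contains` as literal substring matching on either variable name is an assumption.

```r
# Hypothetical helper, NOT lares code: apply documented-style filters to a
# table that is already ranked by descending absolute correlation.
filter_ranked <- function(ranked, max_pvalue = 1, max = 1, top = 20,
                          contains = NA) {
  # Drop non-significant pairs and correlations above the permitted maximum
  out <- ranked[ranked$pvalue <= max_pvalue & abs(ranked$corr) <= max, ]
  if (!all(is.na(contains))) {
    # Assumption: keep a pair if either name contains ANY of the strings
    hits <- lapply(contains, function(s) {
      grepl(s, out$var1, fixed = TRUE) | grepl(s, out$var2, fixed = TRUE)
    })
    out <- out[Reduce(`|`, hits), ]
  }
  # top = NA keeps every remaining pair
  if (!is.na(top)) out <- head(out, top)
  rownames(out) <- NULL
  out
}

# Toy ranked table, for illustration only
ranked <- data.frame(
  var1 = c("a", "a", "b", "fare"),
  var2 = c("b", "c", "c", "survived"),
  corr = c(0.99, -0.6, 0.4, 0.3),
  pvalue = c(0.001, 0.01, 0.2, 0.04),
  stringsAsFactors = FALSE
)
filter_ranked(ranked, max_pvalue = 0.05, max = 0.95)
```

Filtering after ranking keeps the order intact, so `top` always returns the strongest surviving pairs.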

Details

DataScience+ Post: Find Insights with Ranked Cross-Correlations

Examples


library(lares)
Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset

# Only data with no plot
corr_cross(dft, plot = FALSE, top = 10)

# Show only most relevant results filtered by pvalue
corr_cross(dft, rm.na = TRUE, max_pvalue = 0.05, top = 15)

# Cross-Correlation for certain variables
corr_cross(dft, contains = c("Survived", "Fare"))

# Cross-Correlation max values per category
corr_cross(dft, type = 2, top = NA)
