lares (version 4.8.4)

corr_cross: Ranked Cross-Correlation

Description

This function runs a full correlation study on a data frame and returns a ranking of the most highly correlated variable pairs found in the resulting cross-table.
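To make the idea of a ranked cross-correlation concrete, here is a minimal base R sketch. It is not the lares implementation: the `mtcars` columns, the `var1`/`var2`/`corr` column names, and the ranking by absolute value are assumptions chosen for illustration.

```r
# Illustrative sketch only: base R, not how lares computes its result.
df <- mtcars[, c("mpg", "cyl", "disp", "hp", "wt")]

# Full correlation matrix (Pearson, matching the default `method`)
cm <- cor(df, method = "pearson")

# Keep each pair once by taking the upper triangle of the matrix
idx <- which(upper.tri(cm), arr.ind = TRUE)
ranked <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],  # hypothetical column names
  var2 = colnames(cm)[idx[, "col"]],
  corr = cm[idx],
  stringsAsFactors = FALSE
)

# Rank pairs by absolute correlation, strongest first
ranked <- ranked[order(abs(ranked$corr), decreasing = TRUE), ]
rownames(ranked) <- NULL

# Keep only the top n pairs, similar in spirit to the `top` argument
head(ranked, 5)
```

Five numeric columns give ten unique pairs; `head()` then keeps the strongest five.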

Usage

corr_cross(
  df,
  plot = TRUE,
  max_pvalue = 1,
  type = 1,
  max = 1,
  top = 25,
  local = 1,
  ignore = NA,
  contains = NA,
  grid = FALSE,
  rm.na = FALSE,
  dummy = TRUE,
  limit = 10,
  redundant = FALSE,
  method = "pearson"
)

Arguments

df

Dataframe.

plot

Boolean. Show and return a plot?

max_pvalue

Numeric. Filter non-significant variables. Range (0, 1]

type

Integer. Plot type. 1 is for overall rank. 2 is for local rank.

max

Numeric. Maximum correlation permitted (from 0 to 1)

top

Integer. Return top n results only. Only valid when type = 1. Set value to NA to use all cross-correlations

local

Integer. Label top n local correlations. Only valid when type = 2

ignore

Character vector. Which columns do you wish to exclude?

contains

Character vector. Keep only cross-correlations involving variables whose names contain certain strings (when a vector is given, a match on any of its values is enough).

grid

Boolean. Separate into grids?

rm.na

Boolean. Remove NAs?

dummy

Boolean. Should One Hot Encoding be applied to categorical columns?

limit

Integer. Limit one hot encoding to the n most frequent values of each column. Set to NA to ignore argument.

redundant

Boolean. Should we keep redundant columns? That is, if a column has only two distinct values, should we keep both new columns?

method

Character. Any of: c("pearson", "kendall", "spearman")
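The `dummy` and `redundant` arguments are easier to picture with a small example. The sketch below one-hot encodes a two-level column by hand in base R; the toy data and the `sex_` column naming are hypothetical and may differ from what lares produces internally.

```r
# Illustrative sketch only: hand-rolled one-hot encoding in base R.
toy <- data.frame(
  sex  = c("male", "female", "female", "male"),
  fare = c(7.25, 71.30, 8.05, 53.10),
  stringsAsFactors = FALSE
)

# One 0/1 indicator column per distinct value
lv <- unique(toy$sex)
onehot <- sapply(lv, function(l) as.integer(toy$sex == l))
colnames(onehot) <- paste0("sex_", lv)  # hypothetical naming scheme

# With only two distinct values the indicators mirror each other,
# so their correlation is -1 (up to floating point)
mirror <- cor(onehot[, "sex_male"], onehot[, "sex_female"])

# Each indicator therefore correlates with `fare` at the same
# magnitude and opposite sign; one of the two columns is redundant
fare_male   <- cor(onehot[, "sex_male"], toy$fare)
fare_female <- cor(onehot[, "sex_female"], toy$fare)
```

This is why dropping one of the two columns (the `redundant = FALSE` default) loses no information for a binary variable.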

Details

DataScience+ Post: Find Insights with Ranked Cross-Correlations

See Also

Other Correlations: corr_var(), corr()

Other Exploratory: corr_var(), crosstab(), df_str(), distr(), freqs_df(), freqs_list(), freqs_plot(), freqs(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), summer(), tree_var(), trendsRelated()

Examples

# NOT RUN {
options("lares.font" = NA) # Temporary
data(dft) # Titanic dataset

# Only data with no plot
corr_cross(dft, plot = FALSE, top = 10)

# Show only most relevant results filtered by pvalue
corr_cross(dft, rm.na = TRUE, max_pvalue = 0.05, top = 15)

# Cross-Correlation max values per category
corr_cross(dft, type = 2, top = NA)

# Cross-Correlation for certain variables
corr_cross(dft, contains = c("Survived", "Fare"))
# }
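
The p-value filter used in the second example can be pictured with base R's `cor.test()`. This is a sketch under assumptions: the `mtcars` columns, the result column names, and the strict `<` comparison against the 0.05 threshold are illustrative and not taken from the lares source.

```r
# Illustrative sketch only: p-value filtering of pairwise correlations.
df <- mtcars[, c("mpg", "qsec", "drat", "wt")]

# All unique pairs of column names
combos <- t(combn(names(df), 2))
res <- data.frame(
  var1 = combos[, 1],  # hypothetical column names
  var2 = combos[, 2],
  stringsAsFactors = FALSE
)

# Run a correlation test per pair and collect estimate and p-value
tests <- lapply(seq_len(nrow(res)), function(i) {
  cor.test(df[[res$var1[i]]], df[[res$var2[i]]], method = "pearson")
})
res$corr   <- sapply(tests, function(x) unname(x$estimate))
res$pvalue <- sapply(tests, function(x) x$p.value)

# Keep only pairs below the chosen threshold (assumed strict comparison)
significant <- res[res$pvalue < 0.05, ]
```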