
This function runs a full correlation study and returns a ranking of the most highly correlated pairs of variables, obtained from a cross-table.
corr_cross(
df,
plot = TRUE,
pvalue = TRUE,
max_pvalue = 1,
type = 1,
max = 1,
top = 20,
local = 1,
ignore = NULL,
contains = NA,
grid = TRUE,
rm.na = FALSE,
quiet = FALSE,
...
)
Depending on the plot input, we get correlation and p-value results for every combination of features, arranged by descending absolute correlation value, either as a data.frame (plot = FALSE) or as a plot (plot = TRUE).
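To make the idea of "every combination of features, arranged by descending absolute correlation value" concrete, here is a minimal base R sketch. It is not the lares implementation: rank_cross_corr() and its output columns (var1, var2, corr) are hypothetical names chosen for illustration, and the built-in mtcars dataset stands in for real input.

```r
# Hypothetical helper, NOT part of lares: rank pairwise correlations
# by descending absolute value using base R only.
rank_cross_corr <- function(df, top = 20) {
  # Keep numeric columns only (non-numeric columns are dropped)
  nums <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  cors <- cor(nums, use = "pairwise.complete.obs")
  # Keep each unordered pair once: upper triangle, diagonal excluded
  idx <- which(upper.tri(cors), arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(cors)[idx[, "row"]],
    var2 = colnames(cors)[idx[, "col"]],
    corr = cors[idx],
    stringsAsFactors = FALSE
  )
  # Strongest relationships first, regardless of sign
  out <- out[order(-abs(out$corr)), ]
  rownames(out) <- NULL
  # top = NA keeps every pair, mirroring the documented convention
  if (!is.na(top)) out <- head(out, top)
  out
}

ranked <- rank_cross_corr(mtcars, top = 5)
print(ranked)
```

Sorting on the absolute value keeps strong negative relationships next to strong positive ones, which is the point of a ranked cross-correlation.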
df: Dataframe. It doesn't matter if it has non-numerical columns: they will be filtered.
plot: Boolean. Show and return a plot?
pvalue: Boolean. Returns a list, with correlations and statistical significance (p-value) for each value.
max_pvalue: Numeric. Filter non-significant variables. Range (0, 1]
type: Integer. Plot type. 1 is for overall rank. 2 is for local rank.
max: Numeric. Maximum correlation permitted (from 0 to 1)
top: Integer. Return top n results only. Only valid when type = 1. Set value to NA to use all cross-correlations
local: Integer. Label top n local correlations. Only valid when type = 2
ignore: Vector or character. Which columns should be ignored?
contains: Character vector. Filter cross-correlations with variables that contain certain strings (matching any of the values if a vector is used).
grid: Boolean. Separate into grids?
rm.na: Boolean. Remove NAs?
quiet: Boolean. Keep quiet? If not, informative messages will be shown.
...: Additional parameters passed to corr()
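As an illustration of what filtering by max_pvalue means, the sketch below computes a correlation and its p-value for each pair with base R's cor.test() and keeps only the pairs at or below a threshold. This is an assumption about the general technique, not a description of how lares computes or filters p-values: corr_with_pvalue() and its column names are hypothetical, and mtcars is used as stand-in data.

```r
# Hypothetical helper, NOT part of lares: correlation and p-value per pair,
# filtered by a maximum p-value.
corr_with_pvalue <- function(df, max_pvalue = 1) {
  # Keep numeric columns only
  nums <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  # Every unordered pair of column names
  pairs <- combn(names(nums), 2, simplify = FALSE)
  rows <- lapply(pairs, function(p) {
    test <- cor.test(nums[[p[1]]], nums[[p[2]]])  # Pearson by default
    data.frame(
      var1 = p[1],
      var2 = p[2],
      corr = unname(test$estimate),
      pvalue = test$p.value,
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  # Drop pairs whose p-value exceeds the threshold
  out <- out[out$pvalue <= max_pvalue, ]
  # Strongest remaining relationships first
  out <- out[order(-abs(out$corr)), ]
  rownames(out) <- NULL
  out
}

significant <- corr_with_pvalue(mtcars, max_pvalue = 0.05)
print(head(significant))
```

With the default max_pvalue = 1 nothing is filtered out, which matches the default shown in the usage above.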
DataScience+ Post: Find Insights with Ranked Cross-Correlations
Other Correlations: corr(), corr_var()
Other Exploratory: corr_var(), crosstab(), df_str(), distr(), freqs(), freqs_df(), freqs_list(), freqs_plot(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), tree_var()
library(lares) # Needed for corr_cross() and the dft dataset
Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset
# Only data with no plot
corr_cross(dft, plot = FALSE, top = 10)
# Show only most relevant results filtered by pvalue
corr_cross(dft, rm.na = TRUE, max_pvalue = 0.05, top = 15)
# Cross-Correlation for certain variables
corr_cross(dft, contains = c("Survived", "Fare"))
# Cross-Correlation max values per category
corr_cross(dft, type = 2, top = NA)