lares (version 4.8.4)

corr_var: Correlation between variable and dataframe

Description

This function correlates a whole dataframe with a single feature. It automatically runs one-hot-smart-encoding (ohse), so the input does not need to contain only numerical values.
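
To make the idea concrete, here is a minimal base R sketch that one-hot encodes a small data frame and correlates every resulting column with a target. It only illustrates the concept and is not how corr_var() is implemented; the toy data and the helper name `corr_with_target` are assumptions introduced for this example.

```r
# Toy data (made up for illustration): numeric, categorical, and a binary target
toy <- data.frame(
  age      = c(22, 38, 26, 35, 54, 2, 27, 14),
  class    = factor(c("3rd", "1st", "3rd", "1st", "1st", "3rd", "3rd", "2nd")),
  survived = c(0, 1, 1, 1, 0, 0, 1, 1)
)

# Hypothetical helper: one-hot encode the predictors, then correlate each
# encoded column with the target column
corr_with_target <- function(df, target, method = "pearson") {
  predictors <- df[, setdiff(names(df), target), drop = FALSE]

  # Keep every factor level as its own 0/1 column (no reference level dropped)
  full_dummies <- lapply(Filter(is.factor, predictors), contrasts, contrasts = FALSE)
  encoded <- model.matrix(~ . - 1, data = predictors, contrasts.arg = full_dummies)

  # cor() returns a one-column matrix; keep it as a named vector
  cors <- cor(encoded, df[[target]], method = method)[, 1]

  # Strongest relationships first, regardless of sign
  cors[order(abs(cors), decreasing = TRUE)]
}

result <- corr_with_target(toy, "survived")
print(result)
```

Each level of `class` becomes its own column (`class1st`, `class2nd`, `class3rd`), which is why categorical inputs can be correlated alongside numeric ones.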

Usage

corr_var(
  df,
  var,
  ignore = NA,
  method = "pearson",
  trim = 0,
  clean = FALSE,
  plot = TRUE,
  logs = FALSE,
  dates = TRUE,
  top = NA,
  ceiling = 100,
  max_pvalue = 1,
  limit = 10,
  zeroes = FALSE,
  save = FALSE,
  subdir = NA,
  file_name = "viz_corrvar.png"
)

Arguments

df

Dataframe.

var

Variable. Name of the variable to correlate

ignore

Character vector. Which columns do you wish to exclude?

method

Character. Any of: c("pearson", "kendall", "spearman")

trim

Integer. Trim words until the nth character for categorical values (applies to both target and values)

clean

Boolean. Use lares::cleanText for categorical values (applies to both target and values)

plot

Boolean. Do you wish to plot the result? If set to TRUE, the function will return only the plot and not the result's data

logs

Boolean. Automatically calculate log(values) for numerical variables (not binaries)

dates

Boolean. Do you want the function to create more features out of the date/time columns?

top

Integer. If you want to plot the top correlations, define how many

ceiling

Numeric. Remove all correlations above this value. Range: (0, 100]

max_pvalue

Numeric. Filter out non-significant variables. Range: (0, 1]

limit

Integer. Limit one hot encoding to the n most frequent values of each column. Set to NA to ignore argument.

zeroes

Boolean. Do you wish to keep zeroes in correlations too?

save

Boolean. Save output plot into working directory

subdir

Character. Subdirectory in which you wish to save the plot

file_name

Character. File name under which you wish to save the plot
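
The filtering arguments (method, max_pvalue, ceiling, top) can be pictured with base R alone. The sketch below uses the built-in mtcars dataset and stats::cor.test(). It assumes that ceiling is a percentage compared against absolute correlations, that max_pvalue is an inclusive upper bound, and that top keeps the strongest remaining correlations; these are readings of the argument descriptions, not the package's own code. The names `ceiling_pct` and `top_n` are hypothetical.

```r
target <- "mpg"
others <- setdiff(names(mtcars), target)

# Correlation and p-value of each column against the target.
# exact = FALSE avoids the ties warning for rank-based methods.
res <- do.call(rbind, lapply(others, function(v) {
  test <- cor.test(mtcars[[v]], mtcars[[target]], method = "spearman", exact = FALSE)
  data.frame(variable = v, corr = unname(test$estimate), pvalue = test$p.value)
}))

# Hypothetical stand-ins for max_pvalue, ceiling and top
max_pvalue  <- 0.05
ceiling_pct <- 85
top_n       <- 3

# Keep significant results whose absolute correlation does not exceed the ceiling
kept <- res[res$pvalue <= max_pvalue & abs(res$corr) * 100 <= ceiling_pct, ]

# Then keep only the strongest remaining correlations
kept <- head(kept[order(abs(kept$corr), decreasing = TRUE), ], top_n)
print(kept)
```

Applying the p-value and ceiling filters before taking the top rows means `top_n` counts only the variables that survive filtering.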

See Also

Other Exploratory: corr_cross(), crosstab(), df_str(), distr(), freqs_df(), freqs_list(), freqs_plot(), freqs(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), summer(), tree_var(), trendsRelated()

Other Correlations: corr_cross(), corr()

Examples

# NOT RUN {
library(lares)
options("lares.font" = NA) # Temporary
data(dft) # Titanic dataset

dft %>% corr_var(Survived, method = "spearman", plot = FALSE, top = 10)

# With plots, results are easier to compare:

# Correlate Survived with everything else and show only significant results
dft %>% corr_var(Survived_TRUE, max_pvalue = 0.05)

# Filter out variables with more than 50% correlation
dft %>% corr_var(Survived_TRUE, ceiling = 50)

# Show only 10 values
dft %>% corr_var(Survived_TRUE, top = 10)

# Also calculate log(values)
dft %>% corr_var(Survived_TRUE, logs = TRUE, top = 15)
# }