This function correlates a whole dataframe with a single feature. It
automatically runs ohse (one-hot-smart-encoding), so there is no need to
input only numerical values.
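To make the idea concrete, here is a minimal base R sketch of the general technique: one-hot encode the categorical columns, then correlate every resulting column with the target. It is not the lares implementation. The toy data and the use of model.matrix() are assumptions chosen only for illustration.

```r
# Toy data (hypothetical): a numeric target, a numeric feature and a
# categorical feature
df <- data.frame(
  target = c(1, 0, 1, 1, 0, 0, 1, 0),
  age    = c(22, 38, 26, 35, 54, 2, 27, 14),
  class  = c("a", "b", "a", "c", "b", "c", "a", "b"),
  stringsAsFactors = TRUE
)

# One-hot encode the categorical column: "- 1" drops the intercept so that
# every level gets its own 0/1 indicator column
dummies <- model.matrix(~ class - 1, data = df)

# Combine the numeric feature with the indicator columns
features <- cbind(age = df$age, dummies)

# Correlate every feature column with the target (Pearson by default)
corrs <- cor(features, df$target)[, 1]

# Arrange by descending absolute correlation
corrs <- corrs[order(abs(corrs), decreasing = TRUE)]
print(round(corrs, 3))
```

Each level of the categorical column ends up with its own correlation value, which is why a non-numerical column can take part in the comparison at all.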
corr_var(
  df,
  var,
  ignore = NA,
  trim = 0,
  clean = FALSE,
  plot = TRUE,
  top = NA,
  ceiling = 100,
  max_pvalue = 1,
  limit = 10,
  ranks = FALSE,
  zeroes = FALSE,
  save = FALSE,
  quiet = FALSE,
  ...
)
df: Dataframe. It doesn't matter if it has non-numerical columns: they will be filtered!
var: Variable. Name of the variable to correlate. Note that if var is not numerical: 1. you may define which category to select using `var_category`; 2. you may have to add redundant = TRUE to enable all categories (instead of n-1).
ignore: Character vector. Which columns do you wish to exclude?
trim: Integer. Trim words until the nth character for categorical values (applies to both target and values).
clean: Boolean. Use lares::cleanText for categorical values (applies to both target and values).
plot: Boolean. Do you wish to plot the result? If set to TRUE, the function will return only the plot and not the result's data.
top: Integer. If you want to plot the top correlations, define how many.
ceiling: Numeric. Remove all correlations above... Range: (0, 100]
max_pvalue: Numeric. Filter non-significant variables. Range: (0, 1]
limit: Integer. Limit one hot encoding to the n most frequent values of each column. Set to NA to ignore argument.
ranks: Boolean. Add ranking numbers?
zeroes: Boolean. Do you wish to keep zeroes in correlations too?
save: Boolean. Save output plot into working directory.
quiet: Boolean. Keep quiet? If not, show messages.
...: Additional parameters passed to corr
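The remark about n-1 categories in the description of var can be illustrated with a small base R sketch. A factor with n levels needs only n-1 indicator columns, because the remaining one is determined by the others. A redundant encoding keeps all n. The example uses model.matrix() as a stand-in and makes no claim about how lares builds its columns internally.

```r
# Hypothetical categorical variable with three levels
class <- factor(c("a", "b", "a", "c", "b", "c"))

# n - 1 encoding: with an intercept, the first level ("a") becomes the
# baseline and gets no column of its own
reduced <- model.matrix(~ class)[, -1, drop = FALSE]
colnames(reduced)  # "classb" "classc"

# Redundant encoding: dropping the intercept gives one column per level
full <- model.matrix(~ class - 1)
colnames(full)     # "classa" "classb" "classc"

# The dropped column is fully determined by the remaining ones
determined <- all(full[, "classa"] == 1 - rowSums(reduced))
print(determined)  # TRUE
```

With the n-1 form the baseline category never appears as a column, so it cannot show up in a correlation table. That is the situation in which the redundant form is useful.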
data.frame. With variables, correlation and p-value results for each feature, arranged by descending absolute correlation value.
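As a rough sketch of how ceiling, max_pvalue and top narrow down such a result, the base R snippet below filters a made-up table. The column names (variables, corr, pvalue), the numbers, and the exact comparison rules (absolute value for ceiling, "less than or equal" for both thresholds) are assumptions for illustration, not confirmed behaviour of the function.

```r
# Hypothetical result table; column names and values are assumptions
res <- data.frame(
  variables = c("x1", "x2", "x3", "x4", "x5"),
  corr      = c(0.82, -0.55, 0.31, -0.12, 0.05),
  pvalue    = c(0.001, 0.004, 0.030, 0.200, 0.700),
  stringsAsFactors = FALSE
)

ceiling_pct <- 60    # drop correlations above 60% (assumed: absolute value)
max_pvalue  <- 0.05  # keep only rows with a p-value at or below this
top         <- 2     # keep at most this many rows

# Apply the ceiling and the significance filter
keep <- abs(res$corr) <= ceiling_pct / 100 & res$pvalue <= max_pvalue
res <- res[keep, ]

# Arrange by descending absolute correlation and keep the top rows
res <- res[order(abs(res$corr), decreasing = TRUE), ]
res <- head(res, top)
print(res)
```

In this toy table x1 is dropped by the ceiling, x4 and x5 by the p-value filter, and x2 and x3 remain in that order.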
Other Exploratory: corr_cross(), crosstab(), df_str(), distr(), freqs_df(), freqs_list(), freqs_plot(), freqs(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), tree_var(), trendsRelated()
Other Correlations: corr_cross(), corr()
# NOT RUN {
library(lares) # provides corr_var(), the dft dataset and the %>% pipe
Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset
corr_var(dft, Survived, method = "spearman", plot = FALSE, top = 10)
# With plots, results are easier to compare:
# Correlate Survived with everything else and show only significant results
dft %>% corr_var(Survived_TRUE, max_pvalue = 0.01)
# Top 15 with no more than 60% correlation, showing ranks
dft %>% corr_var(Survived_TRUE, ceiling = 60, top = 15, ranks = TRUE)
# }