lares (version 4.8.4)

corr: Correlation table

Description

This function correlates a whole dataframe, automatically keeping only its numerical columns.
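The core idea of correlating only the numeric columns can be sketched in base R. This is an illustration, not the lares implementation: the helper `numeric_corr()` is a hypothetical name, and using pairwise-complete observations for missing values is an assumption made here. It passes `method` straight to `stats::cor()` and skips the one-hot encoding, date features and other options that `corr()` offers.

```r
# Illustrative sketch only, not the lares implementation:
# correlate the numeric columns of a data frame, silently dropping the rest.
numeric_corr <- function(df, method = c("pearson", "kendall", "spearman")) {
  method <- match.arg(method)
  # TRUE for integer/double columns; characters, factors and dates are dropped
  is_num <- vapply(df, is.numeric, logical(1))
  # Pairwise-complete observations (an assumption) so NAs only affect their pair
  cor(df[, is_num, drop = FALSE], method = method, use = "pairwise.complete.obs")
}

# Usage: a built-in dataset plus a character column that gets filtered out
example_df <- data.frame(mtcars[, c("mpg", "wt", "hp")], label = rownames(mtcars))
m <- numeric_corr(example_df, method = "spearman")
print(round(m, 2))
```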

Usage

corr(
  df,
  method = "pearson",
  pvalue = FALSE,
  ignore = NA,
  dummy = TRUE,
  limit = 10,
  dates = FALSE,
  redundant = FALSE,
  logs = FALSE,
  top = NA
)

Arguments

df

Dataframe. It may contain non-numerical columns: they will be filtered out.

method

Character. Any of: c("pearson", "kendall", "spearman")

pvalue

Boolean. If TRUE, returns a list with the correlations and the statistical significance (p-value) of each correlation.

ignore

Character vector. Which columns do you wish to exclude?

dummy

Boolean. Should One Hot Encoding be applied to categorical columns?

limit

Integer. Limit one-hot encoding to the n most frequent values of each column. Set to NA to ignore this argument.

dates

Boolean. Do you want the function to create more features out of the date/time columns?

redundant

Boolean. Should redundant columns be kept? I.e., if a column has only two distinct values, should both of the resulting new columns be kept?

logs

Boolean. Automatically calculate log(values) for numerical variables (excluding binary variables).

top

Integer. Select the top N most relevant variables, filtered and sorted by the mean of each variable's correlations.
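The dummy, limit and redundant arguments describe a one-hot encoding step applied before correlating. The base-R sketch below shows one way such a step could work for a single column. The helper `one_hot_top()` is hypothetical, and simply dropping values outside the top `limit` (rather than grouping them into another category) is an assumption, not necessarily what lares does.

```r
# Hypothetical helper, NOT the lares implementation: one-hot encode one
# categorical vector, keeping at most `limit` of its most frequent values.
one_hot_top <- function(x, name, limit = 10, redundant = FALSE) {
  x <- as.character(x)
  # Frequency of each distinct non-NA value, most frequent first
  counts <- sort(table(x), decreasing = TRUE)
  levels_kept <- names(counts)
  # limit = NA keeps every value; otherwise keep only the top `limit`
  if (!is.na(limit)) levels_kept <- head(levels_kept, limit)
  # With exactly two distinct values, one dummy determines the other,
  # so keep only the first one unless redundant = TRUE
  if (!redundant && length(counts) == 2) levels_kept <- levels_kept[1]
  # One 0/1 column per kept value; NA inputs become 0 in every column
  cols <- lapply(levels_kept, function(lv) as.integer(!is.na(x) & x == lv))
  names(cols) <- paste(name, levels_kept, sep = "_")
  data.frame(cols, check.names = FALSE)
}

sex <- c("male", "female", "female", "male", "male")
one_hot_top(sex, "sex")                    # only sex_male
one_hot_top(sex, "sex", redundant = TRUE)  # sex_male and sex_female
port <- c("S", "S", "C", "Q", "S", "C")
one_hot_top(port, "port", limit = 2)       # port_S and port_C
```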
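For top, the sketch below ranks the variables of an existing correlation matrix by the mean absolute value of their correlations with the other variables, then keeps the first N. Taking absolute values and excluding each variable's self-correlation are assumptions made here; `top_corr_vars()` is a hypothetical helper, not part of lares.

```r
# Hypothetical sketch, NOT the lares implementation: keep the `top` variables
# whose correlations with the others have the largest mean absolute value.
top_corr_vars <- function(cm, top) {
  off_diag <- abs(cm)
  diag(off_diag) <- NA          # ignore each variable's self-correlation of 1
  score <- rowMeans(off_diag, na.rm = TRUE)
  ranked <- names(sort(score, decreasing = TRUE))
  keep <- head(ranked, top)
  cm[keep, keep, drop = FALSE]  # reordered by relevance, diagonal intact
}

cm <- cor(mtcars[, c("mpg", "cyl", "disp", "hp", "qsec", "gear")])
top3 <- top_corr_vars(cm, 3)
print(round(top3, 2))
```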

See Also

Other Calculus: deg2num(), dist2d(), model_metrics(), quants()

Other Correlations: corr_cross(), corr_var()

Examples

# NOT RUN {
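library(lares)
data(dft) # Titanic dataset
df <- dft[, 2:5]
corr(df)                      # Pearson correlations (default)
corr(df, ignore = "Pclass")   # exclude the Pclass column
corr(df, redundant = TRUE)    # keep both new columns for two-valued columns
corr(df, method = "spearman") # rank-based correlations
corr(df, pvalue = TRUE)       # list with correlations and p-values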
# }
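With pvalue = TRUE the function returns correlations together with their p-values. The base-R sketch below illustrates that kind of output using `stats::cor.test()` on every pair of numeric columns. The helper `corr_pvalues()` and the list element names `cor` and `pvalue` are hypothetical and need not match the list lares actually returns.

```r
# Illustrative sketch only, not the lares implementation: correlations plus
# a matching matrix of p-values from stats::cor.test() for each column pair.
corr_pvalues <- function(df, method = "pearson") {
  num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  vars <- names(num)
  n <- length(vars)
  p <- matrix(NA_real_, n, n, dimnames = list(vars, vars))
  # Test each unordered pair once and mirror it; the diagonal stays NA
  for (i in seq_len(max(n - 1, 0))) {
    for (j in seq(i + 1, n)) {
      p[i, j] <- p[j, i] <- cor.test(num[[i]], num[[j]], method = method)$p.value
    }
  }
  list(cor = cor(num, method = method), pvalue = p)
}

res <- corr_pvalues(mtcars[, c("mpg", "wt", "qsec")])
print(round(res$pvalue, 4))
```

Looping over pairs keeps the sketch simple; for many columns a vectorised approach would be faster.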