This function calculates correlations across a whole dataframe, automatically filtering for its numerical columns.
corr( df, method = "pearson", pvalue = FALSE, ignore = NA, dummy = TRUE, limit = 10, dates = FALSE, redundant = FALSE, logs = FALSE, top = NA )
df: Dataframe. It may contain non-numerical columns: they will be filtered out.
method: Character. Any of: c("pearson", "kendall", "spearman")
pvalue: Boolean. If TRUE, returns a list with the correlations and the statistical significance (p-value) of each one.
ignore: Character vector. Which columns do you wish to exclude?
dummy: Boolean. Should One Hot Encoding be applied to categorical columns?
limit: Integer. Limit one hot encoding to the n most frequent values of each column. Set to NA to ignore this argument.
dates: Boolean. Do you want the function to create more features out of the date/time columns?
redundant: Boolean. Should we keep redundant columns? I.e., if a column only has two different values, should we keep both new columns?
logs: Boolean. Automatically calculate log(values) for numerical variables (not binaries).
top: Integer. Select the top N most relevant variables? They are filtered and sorted by the mean of each variable's correlations.
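The argument descriptions above imply a pipeline: keep the numerical columns, one hot encode the categorical ones, then correlate everything. The following is a minimal base R sketch of that idea, not the function's actual implementation. The `toy` dataframe and all variable names are hypothetical, and `model.matrix()` is used here only as one common way to build 0/1 columns.

```r
# Base R sketch of the general idea; NOT the package's implementation.
# Hypothetical toy data: two numeric columns and one categorical column.
toy <- data.frame(
  age   = c(22, 38, 26, 35, 54, 2),
  fare  = c(7.25, 71.28, 7.92, 53.10, 51.86, 21.08),
  class = factor(c("third", "first", "third", "first", "first", "third"))
)

# Keep the numeric columns as they are
is_num <- vapply(toy, is.numeric, logical(1))
numeric_part <- toy[, is_num, drop = FALSE]

# One hot encode the categorical column: one 0/1 column per level
# (the "- 1" drops the intercept so every level gets its own column)
dummy_part <- as.data.frame(model.matrix(~ class - 1, data = toy))

# Correlate everything; cor() accepts "pearson", "kendall" or "spearman"
encoded <- cbind(numeric_part, dummy_part)
cors <- cor(encoded, method = "pearson")
round(cors, 2)
```

Because `class` has only two values, its two encoded columns are perfectly negatively correlated, so one of them carries no extra information. This is presumably the kind of column that the redundant argument refers to.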
Other Calculus: deg2num(), dist2d(), model_metrics(), quants()
Other Correlations: corr_cross(), corr_var()
# NOT RUN {
data(dft) # Titanic dataset
df <- dft[, 2:5]
corr(df)
corr(df, ignore = "Pclass")
corr(df, redundant = TRUE)
corr(df, method = "spearman")
corr(df, pvalue = TRUE)
# }
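To give a rough idea of what a list of correlations plus p-values can look like, here is a minimal base R sketch built on `cor()` and `cor.test()`. It is an illustration under assumptions, not the function's actual implementation: the built-in `mtcars` data stands in for any all-numeric dataframe, and the list element names `cor` and `pvalue` are hypothetical.

```r
# Base R sketch; NOT how pvalue = TRUE is implemented in the package.
# mtcars ships with R and stands in for any all-numeric dataframe.
vars <- mtcars[, c("mpg", "hp", "wt")]

# Correlation coefficients for every pair of columns
cors <- cor(vars, method = "pearson")

# Matching matrix of p-values, filled one pair at a time with cor.test()
pvals <- matrix(NA_real_, ncol(vars), ncol(vars),
                dimnames = list(names(vars), names(vars)))
for (i in seq_len(ncol(vars))) {
  for (j in seq_len(ncol(vars))) {
    if (i != j) {
      pvals[i, j] <- cor.test(vars[[i]], vars[[j]], method = "pearson")$p.value
    }
  }
}

# Hypothetical return shape: a list holding both matrices
result <- list(cor = cors, pvalue = pvals)
```

The diagonal of the p-value matrix is left as NA because testing a variable against itself is not informative.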