
Compare the distribution of a target variable against another variable. Numerical variables are automatically split into quantiles. Customizable and tidyverse friendly.
distr(
  data,
  ...,
  type = 1,
  ref = TRUE,
  note = NA,
  top = 10,
  breaks = 10,
  na.rm = FALSE,
  force = "none",
  trim = 0,
  clean = FALSE,
  abc = FALSE,
  custom_colours = FALSE,
  plot = TRUE,
  chords = FALSE,
  save = FALSE,
  subdir = NA
)
data: Dataframe.
...: Variables. Main (target variable) and secondary (values variable) to group by (if needed).
type: Integer. 1 for both plots, 2 for counter plot only, 3 for percentages plot only.
ref: Boolean. Show a reference line when the target has two levels? Useful when the data is unbalanced (not 50/50).
note: Character. Caption for the plot.
top: Integer. Filter and plot the n most frequent values for categorical variables.
breaks: Integer. Number of splits for numerical values.
na.rm: Boolean. Ignore NAs if needed.
force: Character. Force class on the values data. Choose between 'none', 'character', 'numeric', 'date'.
trim: Integer. Trim labels to the nth character for categorical values (applies to both target and values).
clean: Boolean. Use cleanText() for categorical values (applies to both target and values).
abc: Boolean. Sort in alphabetical order?
custom_colours: Boolean. Use custom colours function?
plot: Boolean. Return a plot? Otherwise, a table with results.
chords: Boolean. Use a chords plot?
save: Boolean. Save the output plot in the working directory?
subdir: Character. Subdirectory in which to save the plot.
When plot = TRUE, a plot combining two plots in one: the counter distribution grouped by cuts, and the proportions distribution grouped by the same cuts. When plot = FALSE, a data.frame with counts, percentages, and cumulative percentages. When the type argument is set to 2 or 3, a single plot is returned.
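The quantile splitting and the counts/percentages table described above can be approximated in base R. The following is only a minimal sketch under stated assumptions, not the function's actual implementation: it uses the built-in mtcars dataset instead of dft, treats am as the target and mpg as the values variable, and the names n_breaks, cuts, bins, counts, pct and res are illustrative.

```r
# Base R sketch (an assumption, not how distr() is implemented internally):
# split a numerical variable into quantile bins and tabulate it against a target.
data(mtcars)

n_breaks <- 4  # plays the role of the 'breaks' argument

# Quantile cut points; unique() guards against duplicated quantiles
cuts <- unique(quantile(
  mtcars$mpg,
  probs = seq(0, 1, length.out = n_breaks + 1),
  na.rm = TRUE
))

# Assign each observation to its quantile bin
bins <- cut(mtcars$mpg, breaks = cuts, include.lowest = TRUE)

# Counts of the target within each bin
counts <- table(target = mtcars$am, bins = bins)

# Share of each target level within each bin, in percent
pct <- prop.table(counts, margin = 2) * 100

# Long-format results: one row per target level and bin
res <- as.data.frame(counts, responseName = "n")
res$p <- as.vector(pct)
res
```

Each bin's percentages add up to 100, which is the quantity a proportions plot grouped by cuts would display.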
Other Exploratory: corr_cross(), corr_var(), crosstab(), df_str(), freqs_df(), freqs_list(), freqs_plot(), freqs(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), tree_var(), trendsRelated()
Other Visualization: freqs_df(), freqs_list(), freqs_plot(), freqs(), noPlot(), plot_chord(), plot_survey(), plot_timeline(), tree_var()
# NOT RUN {
Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset
# Relation for categorical/categorical values
distr(dft, Survived, Sex)
# Relation for categorical/numeric values
dft %>%
  distr(Survived, Fare, plot = FALSE) %>%
  head(10)
# Sort values
dft %>% distr(Survived, Fare, abc = TRUE)
# Fewer splits/breaks
dft %>% distr(Survived, Fare, abc = TRUE, breaks = 5)
# Distribution of numerical only
dft[dft$Fare < 20, ] %>% distr(Fare)
# Distribution of numerical/numerical
dft %>% distr(Fare, Age)
# Select only one of the two default plots of distr()
dft %>% distr(Survived, Age, type = 2)
dft %>% distr(Survived, Age, type = 3)
# }