plotVar: Plot of the (estimated) dependency structure of a variable `x` on a categorical variable `y`

Description

This function visualises the (estimated) distributions of a variable x for each of the categories of a categorical variable y, which makes it possible to study the dependency structure of y on x. Two types of visualisation are available: density plots and boxplots.
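To make concrete what is being estimated, here is a minimal base-R sketch. It is an illustration under our own assumptions, not the diversityForest implementation. For each category of y, it computes a kernel density estimate of x (what a density plot draws) and the five-number summary of x (what a boxplot draws). The helper name `summarise_by_category` is hypothetical.

```r
# Hypothetical helper (not part of diversityForest): per-category summaries
# of x that correspond to the two plot types described above.
summarise_by_category <- function(x, y) {
  stopifnot(is.factor(y), length(x) == length(y))
  groups <- split(x, y)  # one numeric vector of x values per category of y
  list(
    # Kernel density estimate per category (stats::density, Gaussian kernel by default)
    densities = lapply(groups, density),
    # Lower whisker, lower hinge, median, upper hinge, upper whisker per category
    five_num  = lapply(groups, function(g) boxplot.stats(g)$stats)
  )
}

# Usage with simulated data (made-up values, not the ctg data set):
set.seed(1)
y <- factor(rep(c("a", "b", "c"), each = 50))
x <- rnorm(150, mean = as.integer(y))
res <- summarise_by_category(x, y)
names(res$densities)
```

If the density curves or boxes differ clearly between the categories, x carries information about y. That is the dependency the plots are meant to reveal.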

Usage

plotVar(
  x,
  y,
  plot_type = c("both", "density", "boxplot")[1],
  x_label = "",
  y_label = "",
  plot_title = ""
)

Value

A ggplot2 plot.

Arguments

x: Metric variable or ordered categorical variable that has at least as many unique values as y.
y: Factor variable with at least three categories.
plot_type: Plot type, one of "both" (the default), "density", or "boxplot". If "density", only a density plot is produced; if "boxplot", only a boxplot is produced; if "both", both a density plot and a boxplot are produced. See the 'Details' section of plotMcl for details.
x_label: Optional. The label of the x-axis.
y_label: Optional. The label (heading) of the legend that differentiates the categories of y.
plot_title: Optional. The title of the plot.
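The requirements on x, y and plot_type listed above can be expressed as input checks. The following is a hedged sketch and not the package's own validation code. The helper name `check_plotVar_inputs` is hypothetical. We assume "at least three categories" and "as many unique values as y" refer to the categories actually observed in y. `match.arg` restricts plot_type to the three documented values.

```r
# Hypothetical helper (not part of diversityForest): checks the documented
# argument requirements and returns the validated plot type invisibly.
check_plotVar_inputs <- function(x, y,
                                 plot_type = c("both", "density", "boxplot")[1]) {
  # The default expression above evaluates to "both"
  plot_type <- match.arg(plot_type, c("both", "density", "boxplot"))
  if (!is.factor(y))
    stop("'y' must be a factor")
  # Assumption: count the categories actually observed in y
  n_cat <- length(unique(y))
  if (n_cat < 3)
    stop("'y' must have at least three categories")
  if (!(is.numeric(x) || is.ordered(x)))
    stop("'x' must be metric or an ordered factor")
  if (length(unique(x)) < n_cat)
    stop("'x' must have at least as many unique values as 'y' has categories")
  invisible(plot_type)
}

# Usage: valid inputs pass and return the chosen plot type
check_plotVar_inputs(1:30, factor(rep(1:3, 10)), plot_type = "density")
```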

Author

Roman Hornung

Details

See the 'Details' section of plotMcl.

References

Hornung, R., Hapfelmeier, A. (2024). Multi forests: Variable importance for multi-class outcomes. arXiv:2409.08925, doi:10.48550/arXiv.2409.08925.
Hornung, R. (2022). Diversity forests: Using split sampling to enable innovative complex split procedures in random forests. SN Computer Science 3(2):1, doi:10.1007/s42979-021-00920-1.

Examples


if (FALSE) {

## Load package:

library("diversityForest")


## Load the "ctg" data set:

data(ctg)


## Set seed to make results reproducible (this is necessary because
## the rug plot produced by 'plotVar' does not show all observations, but
## only a random subset of 1000 observations):

set.seed(1234)


## Using a "density" plot and a "boxplot", visualise the (estimated)
## distributions of the variable "Mean" for each of the categories of the
## variable "Tendency":

plotVar(x = ctg$Mean, y = ctg$Tendency)


## Re-create this plot with labels:

plotVar(x = ctg$Mean, y = ctg$Tendency, x_label = "Mean of the histogram ('Mean')",
        y_label = "Histogram tendency ('Tendency')", 
        plot_title = "Relationship between 'Mean' and 'Tendency'")


## Re-create this plot, but only show the "density" plot:

plotVar(x = ctg$Mean, y = ctg$Tendency, plot_type = "density",
        x_label = "Mean of the histogram ('Mean')", 
        y_label = "Histogram tendency ('Tendency')", 
        plot_title = "Relationship between 'Mean' and 'Tendency'")


## Use ggplot2 and RColorBrewer functionalities to change the line colors and
## the labels of the categories of "Tendency":

library("ggplot2")
library("RColorBrewer")
p <- plotVar(x = ctg$Mean, y = ctg$Tendency, plot_type = "density",
             x_label = "Mean of the histogram ('Mean')", 
             y_label = "Histogram tendency ('Tendency')", 
             plot_title = "Relationship between 'Mean' and 'Tendency'") +
  scale_color_manual(values = brewer.pal(n = 3, name = "Set2"),
                     labels = c("left asymmetric", "symmetric", 
                                "right asymmetric")) +
  scale_linetype_manual(values = rep(1, 3),
                        labels = c("left asymmetric", "symmetric", 
                                   "right asymmetric"))

p

## Save as PDF:
## ggsave(filename = "mypathtofolder/FigureXY1.pdf", plot = p, width = 10, height = 7)

}
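The comment about set.seed() in the example above notes that the rug plot shows only a random subset of 1000 observations. The following sketch shows why fixing the seed makes such a subset reproducible. It is an illustration under our assumptions, not the package code. The helper name `rug_subset` is hypothetical, and we assume that all observations are kept when there are 1000 or fewer.

```r
# Hypothetical helper (not part of diversityForest): draw at most 'max_n'
# observations at random, e.g. for a rug, using the current RNG state.
rug_subset <- function(x, max_n = 1000) {
  if (length(x) <= max_n) return(x)  # assumption: small samples are kept whole
  x[sample(length(x), max_n)]
}

# Same seed, same data, same subset:
set.seed(1234)
a <- rug_subset(rnorm(5000))
set.seed(1234)
b <- rug_subset(rnorm(5000))
identical(a, b)
```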