statuser (version 0.1.9)

plot_cdf: Plot Empirical Cumulative Distribution Functions by Group

Description

Plots empirical cumulative distribution functions (ECDFs) separately for each unique value of a grouping variable, with support for vectorized plotting parameters. If no grouping variable is provided, plots a single ECDF.

Usage

plot_cdf(
  formula,
  y = NULL,
  data = NULL,
  order = NULL,
  show.ks = TRUE,
  show.quantiles = TRUE,
  ...
)

Value

Invisibly returns a list containing:

  • ecdfs: A list of ECDF function objects, one per group. Each can be called as a function to compute cumulative probabilities (e.g., result$ecdfs[[1]](5) returns P(X <= 5) for group 1).

  • ks_test: (Only when exactly 2 groups) The Kolmogorov-Smirnov test result comparing the two distributions. Access p-value with result$ks_test$p.value.

  • quantile_regression_25: (Only when exactly 2 groups) Quantile regression model for the 25th percentile.

  • quantile_regression_50: (Only when exactly 2 groups) Quantile regression model for the 50th percentile (median).

  • quantile_regression_75: (Only when exactly 2 groups) Quantile regression model for the 75th percentile.

  • warnings: Any warnings captured during execution.
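The ecdfs and ks_test elements can be explored with base R alone. The sketch below calls stats::ecdf() and stats::ks.test() directly rather than plot_cdf(). It assumes, without confirmation from this page, that the returned objects are of the same kind.

```r
# Base R only; assumes plot_cdf() returns objects like these.
set.seed(1)
a <- rnorm(50, mean = 0)
b <- rnorm(50, mean = 1)

# An ECDF object is itself a function: calling it gives the share of
# observations less than or equal to the argument.
ecdf_a <- ecdf(a)
p_a <- ecdf_a(0.5)
stopifnot(isTRUE(all.equal(p_a, mean(a <= 0.5))))

# A two-sample Kolmogorov-Smirnov test returns an "htest" list whose
# elements include statistic and p.value.
ks <- ks.test(a, b)
ks$p.value
```

If the assumption holds, result$ecdfs[[1]] and result$ks_test can be used the same way as ecdf_a and ks here.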

Arguments

formula

A formula of the form y ~ group, where y is the response variable and group is the grouping variable. Alternatively, a single vector y (with no grouping variable) to plot a single ECDF.

y

An optional second vector. When provided, formula is treated as the first vector and the two are compared in a single plot, which allows syntax like plot_cdf(y1, y2).

data

An optional data frame containing the variables in the formula. If data is not provided, variables are evaluated from the calling environment.

order

Controls the order in which groups appear in the plot and legend. Use -1 to reverse the default order. Alternatively, provide a vector specifying the exact order (e.g., c("B", "A", "C")). If NULL (default), groups are ordered by their factor levels (if the grouping variable is a factor) or otherwise sorted alphabetically or numerically. Applies only to grouped plots.
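The ordering rules described for order can be sketched in a few lines of base R. The helper resolve_group_order() is hypothetical and is not part of statuser. It only mirrors the documented behavior and may differ from the package's actual implementation.

```r
# Hypothetical helper, not part of statuser: mirrors the documented rules.
resolve_group_order <- function(group, order = NULL) {
  # Default: factor levels for a factor, otherwise sorted unique values
  default <- if (is.factor(group)) levels(group) else sort(unique(group))
  if (is.null(order)) return(default)
  # order = -1 reverses the default order
  if (identical(order, -1)) return(rev(default))
  # Otherwise treat order as the exact sequence of groups
  order
}

g <- c("B", "A", "C", "A")
resolve_group_order(g)                            # "A" "B" "C"
resolve_group_order(g, order = -1)                # "C" "B" "A"
resolve_group_order(g, order = c("B", "A", "C"))  # "B" "A" "C"
```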

show.ks

Logical. If TRUE (default), shows Kolmogorov-Smirnov test results when there are exactly 2 groups. If FALSE, KS test results are not displayed.

show.quantiles

Logical. If TRUE (default), shows horizontal lines and results at 25th, 50th, and 75th percentiles when there are exactly 2 groups. If FALSE, quantile lines and results are not displayed.
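The per-group values at the 25th, 50th, and 75th percentiles can also be inspected with base R. This is only a sketch on simulated data. It computes sample quantiles with stats::quantile() and does not reproduce the quantile regression models that plot_cdf() returns.

```r
# Base R sketch; simulated data, not output from plot_cdf().
set.seed(1)
value <- c(rnorm(50, mean = 0), rnorm(50, mean = 1))
group <- rep(c("A", "B"), each = 50)

# One column per group, one row per percentile
q <- sapply(split(value, group), quantile, probs = c(0.25, 0.50, 0.75))
q

# Difference between groups at each percentile (B minus A)
q[, "B"] - q[, "A"]
```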

...

Additional arguments passed to plotting functions. Can be single values (applied to all groups) or vectors (applied element-wise to each group). Common parameters include col, lwd, lty, pch, type, etc.
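Applying parameters element-wise to each group can be illustrated with base graphics. The sketch below is not the package's implementation. The helper per_group() is hypothetical, and recycling with rep_len() is an assumption about how scalar values are spread across groups.

```r
# Hypothetical sketch with base R graphics, not statuser internals.
set.seed(1)
y <- rnorm(90)
group <- rep(c("A", "B", "C"), each = 30)
groups <- sort(unique(group))

# Recycle a scalar or vector so there is one value per group
per_group <- function(x, n) rep_len(x, n)
cols <- per_group(c("red", "green", "blue"), length(groups))
lwds <- per_group(2, length(groups))

# Draw the first group's ECDF, then overlay the remaining groups
plot(ecdf(y[group == groups[1]]), col = cols[1], lwd = lwds[1],
     main = "ECDF by group", xlim = range(y))
for (i in seq_along(groups)[-1]) {
  lines(ecdf(y[group == groups[i]]), col = cols[i], lwd = lwds[i])
}
```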

Examples

# Basic usage with single variable (no grouping)
y <- rnorm(100)
plot_cdf(y)

# Basic usage with formula syntax and grouping
group <- rep(c("A", "B", "C"), c(30, 40, 30))
plot_cdf(y ~ group)

# With custom colors (scalar - same for all)
plot_cdf(y ~ group, col = "blue")

# With custom colors (vector - different for each group)
plot_cdf(y ~ group, col = c("red", "green", "blue"))

# Multiple parameters
plot_cdf(y ~ group, col = c("red", "green", "blue"), lwd = c(1, 2, 3))

# With line type (per group) and line width (same for all)
plot_cdf(y ~ group, col = c("red", "green", "blue"), lty = c(1, 2, 3), lwd = 2)

# Using data frame
df <- data.frame(value = rnorm(100), group = rep(c("A", "B"), 50))
plot_cdf(value ~ group, data = df)
plot_cdf(value ~ group, data = df, col = c("red", "blue"))

# Compare two vectors
y1 <- rnorm(50)
y2 <- rnorm(50, mean = 1)
plot_cdf(y1, y2)

# Formula syntax without data (variables evaluated from environment)
widgetness <- rnorm(100)
gender <- rep(c("M", "F"), 50)
plot_cdf(widgetness ~ gender)

# Using the returned object
df <- data.frame(value = c(rnorm(50, 0), rnorm(50, 1)), group = rep(c("A", "B"), each = 50))
result <- plot_cdf(value ~ group, data = df)

# Use ECDF to find P(X <= 0.5) for group A
result$ecdfs[[1]](0.5)

# Access KS test p-value
result$ks_test$p.value

# Summarize median quantile regression
summary(result$quantile_regression_50)