pubrplot (version 0.0.1)

plot_norm: Normality Assessment Plot with Shapiro–Wilk and Kolmogorov–Smirnov Tests

Description

This function visualizes the distribution of one or more numeric variables using either boxplots or histograms with overlaid normal distribution curves. It selects the normality test automatically based on sample size: the Shapiro–Wilk test is applied when the sample size is at most 5000, and the Kolmogorov–Smirnov test is used for larger samples (> 5000). The resulting p-values are displayed directly on the plots.
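The test-selection rule described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name `choose_normality_test` is hypothetical, and the choice to compare against a normal distribution parameterized by the sample mean and standard deviation in the Kolmogorov–Smirnov branch is an assumption.

```r
# Hypothetical helper, not part of pubrplot: pick a normality test by sample size.
choose_normality_test <- function(x, sample_size = 5000) {
  x <- x[!is.na(x)]  # drop missing values before counting
  n <- length(x)
  if (n <= sample_size) {
    # stats::shapiro.test() accepts between 3 and 5000 non-missing values
    res <- stats::shapiro.test(x)
    test <- "Shapiro-Wilk"
  } else {
    # One-sample KS test against a normal with the sample's mean and sd
    # (assumed here; the package may parameterize the reference differently)
    res <- suppressWarnings(
      stats::ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
    )
    test <- "Kolmogorov-Smirnov"
  }
  list(test = test, n = n, p_value = unname(res$p.value))
}

set.seed(1)
small <- choose_normality_test(rnorm(200))   # takes the Shapiro-Wilk branch
large <- choose_normality_test(rnorm(6000))  # takes the Kolmogorov-Smirnov branch
small$test
large$test
```

The 5000 cutoff matches the upper limit that `stats::shapiro.test()` accepts. Note that a Kolmogorov–Smirnov test with parameters estimated from the same sample tends to give conservative p-values, so results from the two branches are not directly comparable.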

Usage

plot_norm(
  data,
  vars,
  geom = c("box", "hist"),
  color_bar = "#377eb8",
  color_line = "darkred",
  xlab = NULL,
  ylab = NULL,
  bins = 20,
  label_color = "black",
  label_size = 3.5,
  label_vjust = 0,
  label_hjust = 0,
  alpha_bar = 0.5,
  sample_size = 5000,
  label_fraction = 0.05,
  position = NULL,
  p.ypos = NULL
)

Value

A ggplot object displaying the selected normality plots with test p-values.

Arguments

data

A data frame containing the variables to be tested and plotted.

vars

A character vector of column names (numeric variables) to be assessed for normality.

geom

Character string specifying the plot type. Options are "box" for boxplots and "hist" for histograms with normal curves.

color_bar

Fill color for boxplots or histograms.

color_line

Color of the normal distribution curve (only used for histograms).

xlab

X-axis label.

ylab

Y-axis label.

bins

Number of bins used in histograms.

label_color

Color of the normality test p-value text labels.

label_size

Numeric size of the p-value text labels.

label_vjust

Vertical justification of the p-value labels.

label_hjust

Horizontal justification of the p-value labels.

alpha_bar

Transparency level for boxplots or histogram bars.

sample_size

Maximum sample size for which the Shapiro–Wilk test is used (default 5000). When the total sample size exceeds this value, the Kolmogorov–Smirnov test is applied automatically.

label_fraction

Fraction of plot height used to automatically position p-value labels.

position

Optional named list of manual (x, y) positions for p-value placement per variable.

p.ypos

Optional numeric value or named list to override automatic y-positions for p-values.

Examples

## Load example dataset safely
data(diamonds, package = "ggplot2")
## Example 1: Boxplots with Shapiro-Wilk test (n <= 5000)
plot_norm(
  data = diamonds[1:4000, ],
  vars = c("carat", "x", "y"),
  geom = "box"
)

## Example 2: Histograms with Shapiro-Wilk test (n <= 5000)
plot_norm(
  data = diamonds[1:4000, ],
  vars = c("carat", "x", "y"),
  geom = "hist",
  bins = 20,
  p.ypos = 0.6
)

## Example 3: Kolmogorov-Smirnov test automatically applied (n > 5000)
plot_norm(
  data = diamonds[1:6000, ],
  vars = c("carat", "x"),
  geom = "hist",
  bins = 25
)

## Example 4: CO2 dataset (base R)
plot_norm(
  data = CO2,
  vars = c("uptake", "conc"),
  geom = "hist",
  bins = 3
)
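As a cross-check on the labels drawn in Example 4, the Shapiro–Wilk test can be run directly on the same columns in base R. This sketch assumes `plot_norm` passes each column to the test without transformation, which this page does not confirm, so the printed labels may differ in rounding or formatting.

```r
# Run Shapiro-Wilk directly on the columns plotted in Example 4.
# CO2 ships with base R (datasets package) and has 84 rows, so the
# Shapiro-Wilk branch (n <= 5000) is the relevant one.
vars <- c("uptake", "conc")
p_values <- sapply(vars, function(v) stats::shapiro.test(CO2[[v]])$p.value)

# Named numeric vector, one p-value per variable
signif(p_values, 3)
```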