is_it_normal: Exploratory Data Analysis, Normality Testing, and Visualization

Description

is_it_normal() calculates descriptive statistics and conducts univariate normality testing on one or more numeric variables in a dataset using a selected statistical test. Optional diagnostic plots are available only when a single variable is analyzed. Results are returned as a named list containing summaries and, optionally, normality test results and/or diagnostic plots.

Usage

is_it_normal(
  df,
  ...,
  group_vars = NULL,
  seed = 10232015,
  normality_test = NULL,
  include_plots = FALSE,
  plot_theme = traumar::theme_cleaner
)
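A minimal usage sketch follows. It assumes the traumar package is installed. The simulated data frame `patients` and its columns `los` and `age` are hypothetical illustrations, not data shipped with the package.

```r
library(traumar)

# Hypothetical example data: 200 simulated patients
set.seed(1)
patients <- data.frame(
  los = rexp(200, rate = 0.2),          # right-skewed length of stay
  age = rnorm(200, mean = 50, sd = 15)  # roughly symmetric age
)

# Descriptive statistics only (normality_test defaults to NULL)
res <- is_it_normal(patients, los, age)
res$descriptive_statistics

# Same variables, now with a Shapiro-Wilk test for each
res_sw <- is_it_normal(patients, los, age, normality_test = "shapiro-wilk")
res_sw$normality_test
```

Because two variables are supplied, include_plots is left at FALSE; plotting requires a single variable.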

Value

A named list with the following elements:

descriptive_statistics: A tibble of summary statistics for each variable.
normality_test: A tibble of test statistics and p-values (if normality_test is not NULL).
plots: A patchwork object containing four plots (if include_plots = TRUE and one variable supplied).
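The sketch below shows how the elements of the returned list might be accessed when a test and plots are both requested. It assumes traumar is installed; the data frame `patients` and column `los` are hypothetical.

```r
library(traumar)

# Hypothetical single-variable data, so plotting is permitted
set.seed(1)
patients <- data.frame(los = rexp(200, rate = 0.2))

res <- is_it_normal(
  patients,
  los,
  normality_test = "ks",  # alias for Kolmogorov-Smirnov
  include_plots = TRUE
)

names(res)                  # which elements were returned
res$descriptive_statistics  # tibble of summary statistics
res$normality_test          # tibble of test statistic and p-value
res$plots                   # patchwork object; printing it renders the plots
```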

Arguments

df: A data.frame or tibble containing the variables to assess.
...: One or more unquoted column names from df to be analyzed.
group_vars: Optional. A character vector of column names in df to group results by (e.g., c("year", "hospital_level")). If NULL, no grouping is applied. Grouped summaries and normality tests are computed within each unique combination of values across these variables.
seed: A numeric value passed to set.seed() to ensure reproducibility. Default is 10232015.
normality_test: A character string specifying the statistical test to use. Must be one of the following, where aliases on the same test are equivalent: Shapiro-Wilk ("shapiro-wilk", "shapiro", or "sw"); Kolmogorov-Smirnov ("kolmogorov-smirnov" or "ks"); Anderson-Darling ("anderson-darling" or "ad"); Lilliefors ("lilliefors" or "lilli"); Cramer-von Mises ("cramer-von-mises" or "cvm"); Pearson ("pearson" or "p"); or Shapiro-Francia ("shapiro-francia" or "sf"). Default is NULL, in which case no normality test is performed.
include_plots: Logical. If TRUE, plots are generated for a single variable. Plotting is disabled if multiple variables are passed. Default is FALSE.
plot_theme: A ggplot2::theme function to apply to all plots. Default is traumar::theme_cleaner.
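A grouped analysis might look like the following sketch. It assumes traumar is installed; the data frame `visits` and its columns are hypothetical, and the grouping columns mirror the c("year", "hospital_level") example given for group_vars.

```r
library(traumar)

# Hypothetical grouped data: 2 years x 3 hospital levels, 50 rows per group
set.seed(42)
visits <- data.frame(
  year = rep(c(2022, 2023), each = 150),
  hospital_level = rep(c("I", "II", "III"), times = 100),
  wait_min = rgamma(300, shape = 2, scale = 10)  # skewed waiting times
)

# Summaries and tests are computed within each year x hospital_level cell
grouped <- is_it_normal(
  visits,
  wait_min,
  group_vars = c("year", "hospital_level"),
  normality_test = "sw"  # alias for Shapiro-Wilk
)

grouped$descriptive_statistics
grouped$normality_test
```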

Author

Nicolas Foss, Ed.D., MS

Details

If the data do not meet the requirements of the chosen normality test (for example, its minimum or maximum sample size), is_it_normal() does not run that test.
Normality tests may yield different results on the same data because each test has distinct assumptions and sensitivity. Users should verify assumptions and consult test-specific guidance to ensure appropriate use.
The function aborts with informative CLI messages if input types or structures are incorrect.
If plotting is enabled and nrow(df) > 10000, a warning is issued because plotting may become computationally expensive.
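To illustrate why a test may be skipped and why tests can disagree, here is a base R sketch. The helper `check_normality()` is hypothetical and is not traumar's internal code. It uses only stats::shapiro.test() and stats::ks.test() as stand-ins for two of the supported tests.

```r
# Hypothetical helper, NOT traumar's implementation: it shows how a wrapper
# might skip a test whose requirements the data do not meet.
check_normality <- function(x, test = c("sw", "ks")) {
  test <- match.arg(test)
  x <- x[!is.na(x)]
  n <- length(x)

  if (test == "sw") {
    # stats::shapiro.test() only accepts 3 to 5000 non-missing values
    if (n < 3 || n > 5000) return(NULL)
    res <- stats::shapiro.test(x)
  } else {
    # One-sample KS test against a normal whose mean and sd are estimated
    # from x; estimating them this way makes the p-value conservative,
    # which is the problem the Lilliefors test corrects
    if (n < 2) return(NULL)
    res <- stats::ks.test(x, "pnorm", mean = mean(x), sd = stats::sd(x))
  }

  data.frame(
    test = res$method,
    statistic = unname(res$statistic),
    p_value = res$p.value
  )
}

set.seed(10232015)
x <- rexp(6000)                  # skewed, and too large for Shapiro-Wilk
check_normality(x, "sw")         # returns NULL: test skipped
check_normality(x, "ks")         # KS still runs on all 6000 values
check_normality(x[1:100], "sw")  # Shapiro-Wilk runs on a subset
```

Returning NULL rather than raising an error mirrors the documented behavior of not running a test whose requirements are unmet; how is_it_normal() actually signals a skipped test is not specified here.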