is_it_normal() calculates descriptive statistics and, optionally, runs a
univariate normality test on one or more numeric variables in a dataset.
Diagnostic plots are also optional and are produced only when a single
variable is supplied. Results are returned as a named list containing the
summaries and, when requested, normality test results and/or diagnostic plots.
is_it_normal(
  df,
  ...,
  group_vars = NULL,
  seed = 10232015,
  normality_test = NULL,
  include_plots = FALSE,
  plot_theme = traumar::theme_cleaner
)
A named list with the following elements:
A tibble of summary statistics for each variable.
A tibble of test statistics and p-values (if normality_test is not NULL).
A patchwork object containing four plots (if include_plots = TRUE and a
single variable is supplied).
df: A data.frame or tibble containing the variables to assess.
...: One or more unquoted column names from df to be analyzed.
group_vars: Optional. A character vector of column names in df to
group results by (e.g., c("year", "hospital_level")). If NULL, no
grouping is applied. Grouped summaries and normality tests are computed
within each unique combination of values across these variables.
seed: A numeric value passed to set.seed() to ensure reproducibility.
Default is 10232015.
normality_test: A character string specifying the statistical test to
use. Must be one of: "shapiro-wilk" or "shapiro" or "sw",
"kolmogorov-smirnov" or "ks", "anderson-darling" or "ad",
"lilliefors" or "lilli", "cramer-von-mises" or "cvm", "pearson" or "p", or
"shapiro-francia" or "sf". If NULL, no normality test is performed,
which is the default.
include_plots: Logical. If TRUE, plots are generated for a single
variable. Plotting is disabled if multiple variables are passed.
plot_theme: A ggplot2::theme function to apply to all plots. Default
is traumar::theme_cleaner.
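The arguments above can be combined as in the following minimal sketch. The data frame and its column names (hospital_level, los_days, sbp) are made up for illustration, and the names of the returned list elements are inspected with names() rather than assumed. The calls are guarded so the block still runs when traumar is not installed.

```r
# Illustrative data only: these column names are hypothetical.
set.seed(123)
example_df <- data.frame(
  hospital_level = rep(c("I", "II"), each = 100),
  los_days = rexp(200, rate = 0.2),        # right-skewed values
  sbp = rnorm(200, mean = 120, sd = 15)    # roughly normal values
)

if (requireNamespace("traumar", quietly = TRUE)) {
  # Summaries and Shapiro-Wilk tests for two variables,
  # computed within each level of hospital_level.
  res <- traumar::is_it_normal(
    example_df,
    los_days, sbp,
    group_vars = "hospital_level",
    normality_test = "shapiro-wilk"
  )

  # Look at what came back instead of assuming element names.
  print(names(res))

  # Plots are only produced when a single variable is supplied.
  res_plot <- traumar::is_it_normal(
    example_df,
    sbp,
    include_plots = TRUE
  )
  print(names(res_plot))
}
```

Passing two variables together with include_plots = TRUE would, per the argument description, disable plotting, which is why the plot call uses a single variable.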
Nicolas Foss, Ed.D., MS
If the data do not meet the requirements of the chosen normality test,
is_it_normal() will not run the test.
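As one illustration of a test requirement, base R's stats::shapiro.test() only accepts between 3 and 5000 non-missing values. The sketch below shows that base R behaviour directly. Whether is_it_normal() checks exactly this condition internally is an assumption, and runs_shapiro() is a hypothetical helper written only for this example.

```r
set.seed(42)
x_small <- rnorm(2)      # fewer than 3 observations
x_large <- rnorm(6000)   # more than 5000 observations
x_ok    <- rnorm(500)    # within the accepted range

# Hypothetical helper: TRUE if shapiro.test() runs without error on x.
runs_shapiro <- function(x) {
  result <- try(stats::shapiro.test(x), silent = TRUE)
  !inherits(result, "try-error")
}

# Report which of the three samples shapiro.test() accepts.
print(c(
  small = runs_shapiro(x_small),
  large = runs_shapiro(x_large),
  ok    = runs_shapiro(x_ok)
))
```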
Normality tests may yield differing results. Each test has distinct assumptions and sensitivity. Users should verify assumptions and consult test-specific guidance to ensure appropriate use.
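To see how two tests can disagree on the same data, the following sketch applies base R's shapiro.test() and ks.test() to one simulated heavy-tailed sample. It uses only the stats package, not is_it_normal(), and the simulated data are illustrative.

```r
set.seed(2024)
x <- rt(60, df = 4)  # heavy-tailed sample from a t distribution

sw <- stats::shapiro.test(x)

# Kolmogorov-Smirnov against a normal with parameters estimated from x.
# Estimating the parameters from the same data makes the plain KS p-value
# conservative; the Lilliefors test exists to correct for this.
ks <- stats::ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

p_values <- c(shapiro_wilk = sw$p.value, kolmogorov_smirnov = ks$p.value)
print(p_values)
```

The two p-values generally differ, so a decision at a fixed significance level can depend on which test was chosen.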
The function will abort with helpful CLI messages if input types or structures are incorrect.
If plotting is enabled and nrow(df) > 10000, a warning is issued
because plotting may become computationally expensive.