is_it_normal() calculates descriptive statistics and conducts univariate
normality testing on one or more numeric variables in a dataset using a
selected statistical test. Plots are optional and are produced only when a
single variable is supplied. Results are returned as a named list containing
summary statistics and, optionally, normality test results and diagnostic plots.
is_it_normal(
df,
...,
group_vars = NULL,
seed = 10232015,
normality_test = NULL,
include_plots = FALSE,
plot_theme = traumar::theme_cleaner
)
A named list with the following elements:
A tibble of summary statistics for each variable.
A tibble of test statistics and p-values (if normality_test is not NULL).
A patchwork object containing four plots (if include_plots = TRUE and exactly one variable is supplied).
df: A data.frame or tibble containing the variables to assess.
...: One or more unquoted column names from df to be analyzed.
group_vars: Optional. A character vector of column names in df to group results by (e.g., c("year", "hospital_level")). If NULL, no grouping is applied. Grouped summaries and normality tests are computed within each unique combination of values across these variables.
seed: A numeric value passed to set.seed() to ensure reproducibility. Default is 10232015.
normality_test: A character string specifying the statistical test to use. Must be one of: "shapiro-wilk" (or "shapiro", "sw"), "kolmogorov-smirnov" (or "ks"), "anderson-darling" (or "ad"), "lilliefors" (or "lilli"), "cramer-von-mises" (or "cvm"), "pearson" (or "p"), or "shapiro-francia" (or "sf"). If NULL (the default), no normality test is performed.
include_plots: Logical. If TRUE, plots are generated for a single variable. Plotting is disabled if multiple variables are passed.
plot_theme: A ggplot2::theme function to apply to all plots. Default is traumar::theme_cleaner.
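As an illustration of how the arguments fit together, the following is a minimal sketch of a grouped call. It assumes the traumar package is installed and uses the built-in mtcars dataset purely as stand-in data; the choice of columns and grouping variable is an assumption made for this example. The sketch prints the element names of the returned list instead of assuming them.

```r
# Sketch only: requires the traumar package; mtcars ships with base R.
result <- NULL
if (requireNamespace("traumar", quietly = TRUE)) {
  result <- traumar::is_it_normal(
    df = mtcars,
    mpg, hp,                          # unquoted columns passed through `...`
    group_vars = "cyl",               # summaries and tests computed per cylinder group
    normality_test = "shapiro-wilk",  # one of the documented test names
    include_plots = FALSE             # plots are only produced for a single variable
  )
  # Inspect which elements came back rather than assuming their names
  print(names(result))
}
```

With two variables supplied, plotting would be disabled regardless of include_plots, so it is left at FALSE here.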
Nicolas Foss, Ed.D., MS
If the data do not meet the requirements of the chosen normality test,
is_it_normal() will not run the tests.
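The source does not list which requirements are checked. As one example of what such a requirement looks like, base R's stats::shapiro.test() only accepts between 3 and 5000 non-missing values and stops when all values are identical. The helper below, safe_shapiro(), is hypothetical and not part of traumar; it sketches a guard of this kind in base R.

```r
# Hypothetical helper (not part of traumar): run Shapiro-Wilk only when
# the documented limits of stats::shapiro.test() are met.
safe_shapiro <- function(x) {
  x <- x[!is.na(x)]                 # shapiro.test() ignores missing values
  n <- length(x)
  if (n < 3 || n > 5000) {
    return(NULL)                    # sample size outside the accepted range
  }
  if (length(unique(x)) < 2) {
    return(NULL)                    # all values identical: the test would stop
  }
  stats::shapiro.test(x)
}

too_small <- safe_shapiro(c(1, 2))                     # NULL, only two values
constant  <- safe_shapiro(rep(1, 10))                  # NULL, no variation
ok        <- safe_shapiro(c(1.2, 3.4, 2.2, 5.1, 4.0))  # an "htest" object
```

Returning NULL instead of raising an error is a design choice for this sketch only; it is not a statement about how is_it_normal() reports skipped tests.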
Normality tests may yield differing results. Each test has distinct assumptions and sensitivity. Users should verify assumptions and consult test-specific guidance to ensure appropriate use.
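To see how two tests can treat the same sample differently, the sketch below runs Shapiro-Wilk and Kolmogorov-Smirnov on one simulated, skewed sample using only base R. The simulated data and sample size are assumptions for illustration. Note that Kolmogorov-Smirnov with a mean and standard deviation estimated from the same data does not give an exact p-value, which is the situation the Lilliefors variant is designed to address.

```r
set.seed(10232015)
x <- rexp(40)  # simulated right-skewed sample, for illustration only

sw <- stats::shapiro.test(x)
# Parameters are estimated from x, so this p-value is approximate at best
ks <- stats::ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

p_values <- c(shapiro_wilk = sw$p.value, kolmogorov_smirnov = ks$p.value)
print(p_values)
```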
The function will abort with helpful CLI messages if input types or structures are incorrect.
If plotting is enabled and nrow(df) > 10000, a warning is issued
because plotting may become computationally expensive.
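One possible way to stay at or under that row count is to plot a random subset of rows. The sketch below is an assumption about a reasonable workflow, not something the function does itself; big_df, max_rows, and plot_df are hypothetical names used only here.

```r
set.seed(10232015)
big_df <- data.frame(value = rnorm(25000))  # simulated data, for illustration

max_rows <- 10000  # the warning applies when nrow(df) exceeds this
plot_df <- if (nrow(big_df) > max_rows) {
  # Keep a simple random sample of rows for plotting
  big_df[sample(nrow(big_df), max_rows), , drop = FALSE]
} else {
  big_df
}
nrow(plot_df)
```

Summary statistics and tests can still be computed on the full data, with the subset reserved for the diagnostic plots.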