A wrapper around lm and anova that regresses a continuous variable against categorical variables. Use it to determine whether the mean of a continuous variable differs significantly among the levels of a categorical variable.
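For orientation, the kind of base R calls being wrapped can be sketched as follows. This is an illustrative sketch on the built-in iris data, not the function's actual internals; the object names (`fit`, `anova_tbl`, `predictor_p`, `level_tbl`) are chosen here for the example.

```r
# Regress a continuous target on a categorical predictor (base R only).
fit <- lm(Sepal.Length ~ Species, data = iris)

# F test for the predictor as a whole.
anova_tbl <- anova(fit)
predictor_p <- anova_tbl[["Pr(>F)"]][1]

# Per-level coefficients with standard errors, t statistics, and p-values.
# With R's default treatment contrasts, levels are compared to the first level.
level_tbl <- summary(fit)$coefficients

print(anova_tbl)
print(level_tbl)
```

The F test corresponds to the predictor-level p-value in the output, and the coefficient table corresponds to the per-level tests.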
auto_anova(
data,
...,
baseline = c("mean", "median", "first_level", "user_supplied"),
user_supplied_baseline = NULL,
sparse = FALSE,
pval_thresh = 0.1
)
data frame
data: a data frame
...: tidyselect specification or cols
baseline: the baseline to compare each category against. Choose from "mean", "median", "first_level", or "user_supplied". The mean or median of the target variable can serve as a global baseline
user_supplied_baseline: a numeric value to use as the baseline when baseline is "user_supplied"
sparse: default FALSE; if TRUE, returns a truncated output containing only significant results
pval_thresh: significance level used to filter the sparse output
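One way to picture a global baseline such as "mean" is to shift the target by the baseline and fit a model without an intercept, so that each coefficient becomes a level mean minus the baseline. The sketch below assumes this approach purely for illustration; the package may compute the comparison differently, and `baseline_value` and `shifted` are names invented here.

```r
# Assumption: a global baseline is emulated by subtracting it from the target.
baseline_value <- mean(iris$Sepal.Length)  # use median() for a "median" baseline
shifted <- transform(iris, y = Sepal.Length - baseline_value)

# With `0 +` the intercept is dropped, so each coefficient is that level's
# mean minus the baseline, and its t test asks whether the difference is zero.
fit <- lm(y ~ 0 + Species, data = shifted)
level_tbl <- summary(fit)$coefficients

# Cross-check against level means computed directly.
level_means <- tapply(iris$Sepal.Length, iris$Species, mean)
diffs <- level_means - baseline_value
print(cbind(estimate = level_tbl[, "Estimate"], direct = as.numeric(diffs)))
```

A user-supplied baseline would work the same way in this sketch, with `baseline_value` set to the chosen number.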
Columns can be supplied as unquoted names or tidyselect expressions. Continuous and categorical variables are detected automatically. If no character or factor column is present, the column with the fewest unique values is treated as the categorical variable.
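The detection rule described above can be sketched in base R. `guess_categorical` is a hypothetical helper written for this example, not a function exported by the package, and the tie-breaking behaviour (first column wins) is an assumption.

```r
# Hypothetical helper: decide which columns to treat as categorical.
guess_categorical <- function(df) {
  # Character and factor columns are categorical by definition.
  is_cat <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
  if (any(is_cat)) {
    return(names(df)[is_cat])
  }
  # Fallback: the column with the fewest unique values.
  # which.min() returns the first minimum, so ties go to the earlier column.
  n_unique <- vapply(df, function(x) length(unique(x)), integer(1))
  names(df)[which.min(n_unique)]
}

guess_categorical(iris)    # iris has one factor column, Species
guess_categorical(mtcars)  # all numeric, so the fallback rule applies
```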
Description of columns in the output
continuous variables
categorical variables
levels in the categorical variables
difference between level target mean and baseline
target mean per level
rows in predictor level
standard error of target in predictor level
p.value for t.test of whether target mean differs significantly between level and baseline
level p.value represented by stars
p.value for significance of entire predictor given by F test
predictor p.value represented by stars
text interpretation of tests
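The star columns and the sparse filtering can be illustrated with a small sketch. The cutoffs below are the conventional ones printed by summary() for lm fits; whether the package uses exactly these cutoffs, or a strict `<` comparison against pval_thresh, is an assumption. `p_to_stars` is a hypothetical helper and the p-values are arbitrary inputs, not results from any model.

```r
# Hypothetical helper: map p-values to conventional significance codes.
p_to_stars <- function(p) {
  as.character(cut(
    p,
    breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
    labels = c("***", "**", "*", ".", " "),
    include.lowest = TRUE
  ))
}

p_values <- c(0.0004, 0.03, 0.08, 0.5)  # arbitrary inputs for illustration
result <- data.frame(
  p.value = p_values,
  significance = p_to_stars(p_values)
)

# Sparse output: keep only rows below the threshold (assumed strict `<`).
pval_thresh <- 0.1
sparse_result <- result[result$p.value < pval_thresh, ]
print(sparse_result)
```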
iris %>%
auto_anova(tidyselect::everything()) -> iris_anova1
iris_anova1 %>%
print(width = Inf)