auto_anova: auto anova

Description

A wrapper around lm and anova that regresses a continuous variable on categorical variables. Use it to determine whether the mean of a continuous variable differs significantly across the levels of a categorical variable.
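
As a rough illustration of the lm and anova pattern mentioned above, the following base R sketch fits one continuous target against one categorical predictor using the built-in iris data. It is not the internal code of auto_anova, and the choice of Sepal.Length and Species is only an assumption made for the example.

```r
# Minimal base R sketch of the lm + anova pattern (not auto_anova's internals).
fit <- lm(Sepal.Length ~ Species, data = iris)

# F test for the predictor as a whole
aov_table <- anova(fit)
predictor_p <- aov_table[["Pr(>F)"]][1]

# t tests of each non-reference level against the reference level (setosa)
level_table <- summary(fit)$coefficients

print(aov_table)
print(level_table)
```

Here the comparison is against the first factor level, which is the default contrast in lm; auto_anova additionally offers the mean, the median, or a user-supplied value as the baseline.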

Usage

auto_anova(
  data,
  ...,
  baseline = c("mean", "median", "first_level", "user_supplied"),
  user_supplied_baseline = NULL,
  sparse = FALSE,
  pval_thresh = 0.1
)

Value

data frame

Arguments

data: a data frame
...: a tidyselect specification or unquoted column names
baseline: one of "mean", "median", "first_level", or "user_supplied". The baseline that each category is compared to. "mean" and "median" use the mean or median of the target variable as a global baseline.
user_supplied_baseline: a numeric value to use as the baseline when baseline is "user_supplied"
sparse: default FALSE; if TRUE, returns a truncated output containing only significant results
pval_thresh: significance level used to filter the sparse output (default 0.1)

Details

Columns can be supplied as unquoted names or as a tidyselect specification. Continuous and categorical variables are determined automatically. If no character or factor column is present, the column with the fewest unique values is treated as the categorical variable.
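
The detection rule described above can be sketched in base R as follows. The helper guess_categorical is hypothetical and not part of the package; in particular, resolving ties by taking the first column is an assumption of this sketch, not documented behaviour.

```r
# Hypothetical helper (not part of the package): guess which columns are
# categorical, following the rule described in Details.
guess_categorical <- function(df) {
  is_cat <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
  if (any(is_cat)) {
    return(names(df)[is_cat])
  }
  # No character/factor column: fall back to the column with the fewest
  # unique values (assumption: ties go to the first such column).
  n_unique <- vapply(df, function(x) length(unique(x)), integer(1))
  names(df)[which.min(n_unique)]
}

print(guess_categorical(iris))    # iris has one factor column, Species
print(guess_categorical(mtcars))  # all numeric, so the fallback rule applies
```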

Description of columns in the output

target: continuous variables
predictor: categorical variables
level: levels in the categorical variables
estimate: difference between level target mean and baseline
target_mean: target mean per level
n: number of rows in the predictor level
std.error: standard error of the target in the predictor level
level_p.value: p-value for the t test of whether the target mean differs significantly between the level and the baseline
level_significance: level p-value represented by stars
predictor_p.value: p-value for the significance of the entire predictor, given by the F test
predictor_significance: predictor p.value represented by stars
conclusion: text interpretation of tests
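
To make the level, estimate, target_mean and n columns more concrete, here is a base R sketch that computes them by hand for one target and one predictor. It assumes baseline = "mean" corresponds to the overall mean of the target, and it does not reproduce the package's own computation or the remaining columns.

```r
# Sketch under assumptions: one target (Sepal.Length), one predictor (Species),
# and the baseline taken to be the overall mean of the target.
target <- iris$Sepal.Length
predictor <- iris$Species

baseline_value <- mean(target)
target_mean <- tapply(target, predictor, mean)  # mean of the target per level
n <- tapply(target, predictor, length)          # rows per level

level_summary <- data.frame(
  level = names(target_mean),
  estimate = as.numeric(target_mean) - baseline_value,  # level mean minus baseline
  target_mean = as.numeric(target_mean),
  n = as.integer(n)
)
print(level_summary)
```

Because the three species have equal group sizes, the estimates in this sketch sum to zero; with unbalanced groups they generally would not.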

Examples

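# %>% comes from magrittr; assumes the package providing auto_anova() is attached
library(magrittr)

# run the analysis on every column of iris and store the result
iris %>%
  auto_anova(tidyselect::everything()) -> iris_anova1

# print all output columns without truncation
iris_anova1 %>%
  print(width = Inf)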
