f_glm: Fit one or more `glm()` models with diagnostics, assumption checking, and post-hoc analysis

Description

Performs Generalized Linear Model (GLM) analysis on a given dataset with options for diagnostics, assumption checking, and post-hoc analysis. Several response variables can be analyzed in sequence, and the output can be generated in various formats ('Word', 'pdf', 'Excel').

Usage

f_glm(
  formula,
  family = gaussian(),
  data = NULL,
  diagnostic_plots = TRUE,
  alpha = 0.05,
  adjust = "sidak",
  type = "response",
  show_assumptions_text = TRUE,
  dispersion_test = TRUE,
  output_type = "off",
  output_file = NULL,
  output_dir = NULL,
  save_in_wdir = FALSE,
  close_generated_files = FALSE,
  open_generated_files = TRUE,
  influence_threshold = 2,
  ...
)

Value

An object of class 'f_glm' containing the results from glm(), diagnostics, and post-hoc tests. Depending on the option "output_type", output can also be generated as R Markdown code or as 'Word', 'pdf', or 'Excel' files. Print and plot methods are available for 'f_glm' objects.

Arguments

formula

A formula specifying the model to be fitted. Multiple response variables can be combined on the left-hand side with - or + (e.g., response1 + response2 ~ predictor) to fit a separate GLM for each response variable in sequence.

family

The error distribution and link function to be used in the model (default: gaussian()). This can be a character string naming a family function, a family function or the result of a call to a family function. (See family for details of family functions.)

data

A data frame containing the variables in the model.

diagnostic_plots

Logical. If TRUE, diagnostic plots are included in the output files.

alpha

Numeric. Significance level for tests. Default is 0.05.

adjust

Character string specifying the method used to adjust p-values for multiple comparisons. Available methods include:

"tukey": Tukey's Honest Significant Difference method

"sidak": Šidák correction

"bonferroni": Bonferroni correction

"none": No adjustment

"fdr": False Discovery Rate adjustment

Default is "sidak".
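To give a feel for how these corrections differ, the sketch below applies several of them with base R to a vector of hypothetical raw p-values. It is not the package's implementation: per the Details, adjust is used for the emmeans() post-hoc tests, and "tukey" cannot be reproduced with p.adjust(). Base R's p.adjust() has no Šidák option, so Šidák is computed from its formula, 1 - (1 - p)^m.

```r
# Hypothetical raw p-values from m = 3 pairwise comparisons
p_raw <- c(0.010, 0.020, 0.040)
m <- length(p_raw)

# Bonferroni and FDR (Benjamini-Hochberg) via base R
p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_fdr  <- p.adjust(p_raw, method = "fdr")

# Sidak: 1 - (1 - p)^m (p.adjust() has no Sidak method)
p_sidak <- 1 - (1 - p_raw)^m

round(cbind(raw = p_raw, sidak = p_sidak, bonferroni = p_bonf, fdr = p_fdr), 4)
```

Šidák adjusted p-values are always slightly smaller (less conservative) than Bonferroni ones, while FDR controls a different error rate and is usually the least strict of the three.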

type

Character string specifying the scale on which the emmeans post-hoc results are presented: "link" shows results on the scale on which the model is linear, and "response" back-transforms them so results can be interpreted in the units of the original data (e.g., probabilities, counts, or untransformed measurements). Default is "response".
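The relationship between the two scales can be illustrated with base R's predict() on the Poisson GLM (log link) from the Examples. This sketch uses predict.glm() rather than emmeans, but the principle is the same: response-scale values are the inverse link function applied to link-scale values.

```r
fit <- glm(breaks ~ wool + tension, family = poisson(link = "log"), data = warpbreaks)

# One wool/tension combination to predict for
new <- data.frame(wool = "A", tension = "L")

eta <- predict(fit, newdata = new, type = "link")      # log(expected count)
mu  <- predict(fit, newdata = new, type = "response")  # expected number of breaks

# Response scale = inverse link (exp for the log link) of the link scale
all.equal(unname(mu), fit$family$linkinv(unname(eta)))
```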

show_assumptions_text

Logical. If TRUE, includes a short explanation about GLM assumptions in the output file.

dispersion_test

Logical. If TRUE, tests for overdispersion. Default is TRUE.
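The exact overdispersion test used by f_glm() is not specified here. A common quick check, assumed in this sketch, is the ratio of the Pearson chi-square statistic to the residual degrees of freedom: values well above 1 suggest overdispersion in Poisson or binomial models. The same ratio is what summary() reports as the dispersion parameter for a quasipoisson fit.

```r
fit <- glm(breaks ~ wool + tension, family = poisson(link = "log"), data = warpbreaks)

# Pearson chi-square divided by the residual degrees of freedom
pearson_chisq <- sum(residuals(fit, type = "pearson")^2)
dispersion_ratio <- pearson_chisq / df.residual(fit)

# Approximate upper-tail p-value for H0: no overdispersion
p_value <- pchisq(pearson_chisq, df = df.residual(fit), lower.tail = FALSE)

# Cross-check: a quasipoisson fit estimates the same dispersion
fit_q <- update(fit, family = quasipoisson(link = "log"))
c(ratio = dispersion_ratio, quasi = summary(fit_q)$dispersion, p = p_value)
```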

output_type

Character string specifying the output format: "pdf", "word", "excel", "rmd", "off" (no file generated) or "console". The option "console" forces output to be printed. Default is "off".

output_file

Character string specifying the name of the output file. Default is "dataname_glm_output".

output_dir

Character string specifying the name of the directory of the output file. Default is tempdir().

save_in_wdir

Logical. If TRUE, saves the file in the working directory.

close_generated_files

Logical. If TRUE, closes open 'Excel' or 'Word' files, depending on the output format, so that the newly generated file can be saved by the f_glm() function. 'Pdf' files cannot be closed automatically and should be closed manually before using the function. Default is FALSE.

open_generated_files

Logical. If TRUE, opens the generated output file ('pdf', 'Word' or 'Excel', depending on the output format) so the results can be viewed directly after creation. Unless output_dir or save_in_wdir specify otherwise, files are stored in tempdir(). Default is TRUE.

influence_threshold

Numeric. Leverage threshold used to flag influential observations. Default is 2.
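How influence_threshold enters the calculation is not documented here. One widely used convention, assumed in this sketch, flags an observation as high-leverage when its hat value exceeds threshold × p / n, where p is the number of model coefficients and n the number of observations; with a threshold of 2 this is the familiar 2p/n rule.

```r
fit <- glm(breaks ~ wool + tension, family = poisson(link = "log"), data = warpbreaks)

influence_threshold <- 2  # mirrors the documented default

h <- hatvalues(fit)                  # leverage of each observation
p <- length(coef(fit))               # number of estimated coefficients
n <- nobs(fit)                       # number of observations
cutoff <- influence_threshold * p / n

high_leverage <- which(h > cutoff)   # indices of flagged observations
cutoff
high_leverage
```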

...

Additional arguments passed to glm().

Author

Sander H. van Delden plantmind@proton.me

Details

The function first checks if all specified variables are present in the data and ensures that the response variable is numeric.
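A minimal base R sketch of such a pre-check follows. The helper check_glm_vars() is hypothetical and not part of the package; it only illustrates the idea using all.vars() on the formula.

```r
# Hypothetical helper: verify that formula variables exist and responses are numeric
check_glm_vars <- function(formula, data) {
  vars <- all.vars(formula)
  missing_vars <- setdiff(vars, names(data))
  if (length(missing_vars) > 0) {
    stop("Variables not found in data: ", paste(missing_vars, collapse = ", "))
  }
  responses <- all.vars(formula[[2]])  # variables on the left-hand side
  is_num <- vapply(data[responses], is.numeric, logical(1))
  if (!all(is_num)) {
    stop("Response variables must be numeric: ",
         paste(responses[!is_num], collapse = ", "))
  }
  invisible(TRUE)
}

check_glm_vars(vs ~ cyl, data = mtcars)              # passes silently
try(check_glm_vars(vs ~ gear_ratio, data = mtcars))  # error: gear_ratio not in data
```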

It fits the model with glm() using the specified formula, family and data. Depending on the arguments, it also tests for overdispersion (dispersion_test) and flags influential observations (influence_threshold).

If significant differences are found, it proceeds with post-hoc tests using estimated marginal means from emmeans() with Šidák adjustment (or another method set via adjust).
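For readers who want to run a comparable post-hoc step by hand, the sketch below calls emmeans directly on the Poisson GLM from the Examples. It assumes the emmeans package is installed, uses the tension factor purely for illustration, and only approximates what f_glm() does internally.

```r
fit <- glm(breaks ~ wool + tension, family = poisson(link = "log"), data = warpbreaks)

if (requireNamespace("emmeans", quietly = TRUE)) {
  # Estimated marginal means for tension, back-transformed to counts
  emm <- emmeans::emmeans(fit, ~ tension, type = "response")
  # Pairwise comparisons (ratios on the response scale) with Sidak-adjusted p-values
  ph <- pairs(emm, adjust = "sidak")
  print(ph)
}
```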

Multiple response variables can be combined using - or + (e.g., response1 + response2 ~ predictor) to fit a separate glm() for each response variable in sequence, with all results captured in one output file.
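The splitting step can be sketched as follows, assuming the left-hand side simply lists response names joined by + or -: extract them with all.vars() and fit one glm() per response with the same right-hand side. The helper fit_each_response() is hypothetical and not the package's internal code.

```r
# Hypothetical helper: fit one glm() per response listed on the left-hand side
fit_each_response <- function(formula, data, family = gaussian(), ...) {
  responses <- all.vars(formula[[2]])                  # e.g. c("mpg", "qsec")
  rhs <- paste(deparse(formula[[3]]), collapse = " ")  # right-hand side as text
  fits <- lapply(responses, function(resp) {
    f <- stats::reformulate(rhs, response = resp)      # resp ~ <same predictors>
    glm(f, family = family, data = data, ...)
  })
  names(fits) <- responses
  fits
}

fits <- fit_each_response(mpg + qsec ~ factor(cyl), data = mtcars)
names(fits)
coef(fits$qsec)
```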

Outputs can be generated in multiple formats ("pdf", "word", "excel" and "rmd") as specified by output_type. The function can also close any open 'Word' files (see close_generated_files) to avoid conflicts when generating 'Word' documents. When output_type = "rmd" is used, it is advised to call the function in a chunk with {r, echo=FALSE, results='asis'}.

This function requires [Pandoc](https://github.com/jgm/pandoc/releases/tag) (version 1.12.3 or higher), a universal document converter.

Windows: Install Pandoc and ensure the installation folder (e.g., "C:/Users/your_username/AppData/Local/Pandoc") is added to your system PATH.
macOS: If using Homebrew, Pandoc is typically installed in "/usr/local/bin". Alternatively, download the .pkg installer and verify that the binary’s location is in your PATH.
Linux: Install Pandoc through your distribution’s package manager (commonly installed in "/usr/bin" or "/usr/local/bin") or manually, and ensure the directory containing Pandoc is in your PATH.
If Pandoc is not found, this function may not work as intended.
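To check from R whether Pandoc is reachable before calling f_glm(), the sketch below looks for the executable on the PATH with Sys.which() and, if the rmarkdown package is installed, asks rmarkdown::pandoc_available() whether the minimum version stated above is met. The two checks can disagree, for example because RStudio bundles its own Pandoc, which rmarkdown finds even when it is not on the system PATH.

```r
# Is a 'pandoc' executable on the system PATH?
pandoc_path <- unname(Sys.which("pandoc"))
on_path <- nzchar(pandoc_path)

# rmarkdown's own check, including the minimum version requirement
rmd_ok <- if (requireNamespace("rmarkdown", quietly = TRUE)) {
  rmarkdown::pandoc_available(version = "1.12.3")
} else {
  NA  # rmarkdown not installed, check not possible
}

c(on_path = on_path, rmarkdown = rmd_ok)
```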

Examples


# GLM Binomial example with output to console and MS Word file
mtcars_mod <- mtcars
mtcars_mod$cyl <- as.factor(mtcars_mod$cyl)

glm_bin <- f_glm(vs ~ cyl,
                 family = binomial,
                 data = mtcars_mod,
                 output_type = "word",
                 # Do not automatically open the 'Word' file (Default is to open the file)
                 open_generated_files = FALSE)
print(glm_bin)

# \donttest{
# GLM Poisson example with output to rmd text
data(warpbreaks)

glm_pos <- f_glm(breaks ~ wool + tension,
                 data = warpbreaks,
                 family = poisson(link = "log"),
                 show_assumptions_text = FALSE,
                 output_type = "rmd")
cat(glm_pos$rmd)
# }
