glm() functions with diagnostics, assumption checking, and post-hoc analysis
Performs Generalized Linear Model (GLM) analysis on a given dataset with options for diagnostics, assumption checking, and post-hoc analysis. Several response parameters can be analyzed in sequence, and the output can be generated in various formats ('Word', 'pdf', 'Excel').
f_glm(
  formula,
  family = gaussian(),
  data = NULL,
  diagnostic_plots = TRUE,
  alpha = 0.05,
  adjust = "sidak",
  type = "response",
  show_assumptions_text = TRUE,
  dispersion_test = TRUE,
  output_type = "off",
  output_file = NULL,
  output_dir = NULL,
  save_in_wdir = FALSE,
  close_generated_files = FALSE,
  open_generated_files = TRUE,
  influence_threshold = 2,
  ...
)
An object of class 'f_glm' containing results from glm(), diagnostics, and post-hoc tests. Using the option "output_type", it can also generate output in the form of R Markdown code or 'Word', 'pdf', or 'Excel' files. Includes print and plot methods for 'f_glm' objects.
A formula specifying the model to be fitted. More response variables can be added using - or + (e.g., response1 + response2 ~ predictor) to run a sequential GLM for each response parameter.
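As a rough illustration of what "a sequential GLM for each response parameter" means, the sketch below splits a two-response formula and fits one base-R glm() per response. It uses only base R and the built-in mtcars data. The names multi_formula, responses, predictors and fits are hypothetical, and this is not necessarily how f_glm() processes the formula internally.

```r
# Hypothetical two-response formula in the style described above
multi_formula <- mpg + hp ~ wt

# Response names come from the left-hand side, predictor terms from the right
responses  <- all.vars(multi_formula[[2]])
predictors <- attr(terms(multi_formula), "term.labels")

# Fit one GLM per response, reusing the same predictors and family
fits <- lapply(responses, function(r) {
  glm(reformulate(predictors, response = r),
      family = gaussian(),
      data = mtcars)
})
names(fits) <- responses

# One set of coefficients per response variable
lapply(fits, coef)
```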
The error distribution and link function to be used in the model (default: gaussian()).
This can be a character string naming a family function, a family function, or the result of a call to a family function. (See family for details of family functions.)
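The three accepted forms of a family specification can be compared directly with base-R glm(), which documents the same three options. The sketch below uses glm() rather than f_glm(), so it only shows that the forms are interchangeable at the glm() level; the object names are hypothetical.

```r
# 1. Character string naming a family function
fit_string <- glm(vs ~ wt, family = "binomial", data = mtcars)

# 2. The family function itself
fit_function <- glm(vs ~ wt, family = binomial, data = mtcars)

# 3. The result of a call to a family function (link made explicit)
fit_call <- glm(vs ~ wt, family = binomial(link = "logit"), data = mtcars)

# All three specifications give the same fitted coefficients
all.equal(coef(fit_string), coef(fit_function))
all.equal(coef(fit_function), coef(fit_call))
```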
A data frame containing the variables in the model.
Logical. If TRUE, plots are included in the output files.
Numeric. Significance level for tests. Default is 0.05.
Character string specifying the method used to adjust p-values for multiple comparisons. Available methods include:
"tukey": Tukey's Honest Significant Difference method
"sidak": Šidák correction
"bonferroni": Bonferroni correction
"none": No adjustment
"fdr": False Discovery Rate adjustment
Default is "sidak".
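To give a feel for how these adjustments differ, the sketch below applies the Šidák, Bonferroni and False Discovery Rate corrections to a small, made-up vector of raw p-values using base R only. The p-values are illustrative, not results from any model. The post hoc tests themselves apply the adjustment per family of comparisons, and Tukey's method relies on the studentized range distribution, so it is not reproduced here.

```r
# Made-up raw p-values for three pairwise comparisons (illustrative only)
p_raw <- c(0.01, 0.02, 0.04)
m <- length(p_raw)

# Sidak: probability of at least one false positive in m independent tests
p_sidak <- 1 - (1 - p_raw)^m

# Bonferroni and FDR (Benjamini-Hochberg) via base R's p.adjust()
p_bonferroni <- p.adjust(p_raw, method = "bonferroni")
p_fdr <- p.adjust(p_raw, method = "fdr")

# Side-by-side comparison of the adjusted p-values
data.frame(p_raw, p_sidak, p_bonferroni, p_fdr)
```

Šidák-adjusted p-values are never larger than the Bonferroni-adjusted ones, which is why Šidák is slightly less conservative.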
Character string specifying the scale on which the emmeans post hoc results are presented: "link" shows results on the scale on which the model is linear, and "response" back-transforms them so they can be interpreted in the units of the original data (e.g., probabilities, counts, or untransformed measurements). Default is "response".
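The difference between the two scales can be seen with base-R predict() on a Poisson model with a log link. This sketch uses glm() and predict() directly rather than the post hoc output of f_glm(), so it only illustrates the concept; the object names are hypothetical.

```r
# Poisson GLM with log link on the built-in warpbreaks data
fit <- glm(breaks ~ wool + tension,
           family = poisson(link = "log"),
           data = warpbreaks)

# Predictions on the link scale (log of the expected count)
eta <- predict(fit, type = "link")

# Predictions on the response scale (expected count)
mu <- predict(fit, type = "response")

# Back-transforming the link-scale values recovers the response scale
all.equal(exp(eta), mu)
```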
Logical. If TRUE, includes a short explanation about GLM assumptions in the output file.
Logical for overdispersion test (default: TRUE).
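The documentation does not state which overdispersion test is used. As an assumption-laden sketch, a common check is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom, which should be close to 1 for a well-specified Poisson or binomial model.

```r
# Poisson GLM on the built-in warpbreaks data
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Pearson chi-square statistic and the resulting dispersion estimate
pearson_chisq <- sum(residuals(fit, type = "pearson")^2)
dispersion <- pearson_chisq / df.residual(fit)

# Approximate test: is the statistic larger than expected under the model?
p_value <- pchisq(pearson_chisq, df = df.residual(fit), lower.tail = FALSE)

# Values well above 1 suggest overdispersion
c(dispersion = dispersion, p_value = p_value)
```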
Character string specifying the output format: "pdf", "word", "excel", "rmd", "off" (no file generated), or "console". The option "console" forces output to be printed. Default is "off".
Character string specifying the name of the output file. Default is "dataname_glm_output".
Character string specifying the name of the directory of the output file. Default is tempdir().
Logical. If TRUE, saves the file in the working directory.
Logical. If TRUE, closes open 'Excel' or 'Word' files, depending on the output format, so that the f_glm() function can save the newly generated file. 'Pdf' files cannot be closed automatically and should be closed before using the function. Default is FALSE.
Logical. If TRUE, opens the generated output file ('pdf', 'Word' or 'Excel', depending on the output format) so the results can be viewed directly after creation. Files are stored in tempdir(). Default is TRUE.
Leverage threshold (default: 2).
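The documentation gives only "Leverage threshold (default: 2)". One common convention, assumed here and not confirmed for f_glm(), is to flag observations whose leverage (hat value) exceeds the threshold times the mean leverage. A minimal base-R sketch under that assumption:

```r
# Gaussian GLM on the built-in mtcars data
fit <- glm(mpg ~ wt + hp, family = gaussian(), data = mtcars)

# Leverage (hat value) of each observation
h <- hatvalues(fit)

# Assumed rule: flag leverage above threshold * mean leverage
influence_threshold <- 2
cutoff <- influence_threshold * mean(h)
flagged <- names(h)[h > cutoff]

# Observations with unusually high leverage under this rule
flagged
```

The mean leverage equals the number of estimated coefficients divided by the number of observations, so a threshold of 2 corresponds to the familiar 2p/n rule of thumb.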
Additional arguments passed to glm().
Sander H. van Delden plantmind@proton.me
The function first checks if all specified variables are present in the data and ensures that the response variable is numeric.
It performs Analysis of Variance (ANOVA) using the specified formula and data. If shapiro = TRUE, it checks for normality of residuals using the Shapiro-Wilk test and optionally (transformation = TRUE) applies a data transformation if residuals are not normal.
If significant differences are found in ANOVA, it proceeds with post hoc tests using estimated marginal means from emmeans() and Sidak adjustment (or another method specified via adjust).
More response variables can be added using - or + (e.g., response1 + response2 ~ predictor) to run a sequential glm() for each response parameter, with all results captured in one output file.
Outputs can be generated in multiple formats ("pdf", "word", "excel" and "rmd") as specified by output_type. The function also closes any open 'Word' files to avoid conflicts when generating 'Word' documents. If output_type = "rmd" is used, it is advised to use it in a chunk with {r, echo=FALSE, results='asis'}.
This function requires [Pandoc](https://github.com/jgm/pandoc/releases/tag) (version 1.12.3 or higher), a universal document converter.
Windows: Install Pandoc and ensure the installation folder (e.g., "C:/Users/your_username/AppData/Local/Pandoc") is added to your system PATH.
macOS: If using Homebrew, Pandoc is typically installed in "/usr/local/bin". Alternatively, download the .pkg installer and verify that the binary’s location is in your PATH.
Linux: Install Pandoc through your distribution’s package manager (commonly installed in "/usr/bin" or "/usr/local/bin") or manually, and ensure the directory containing Pandoc is in your PATH.
If Pandoc is not found, this function may not work as intended.
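A quick way to check whether a Pandoc executable is reachable through the PATH from within R is base R's Sys.which(). This is a minimal sketch that only tests the PATH; it does not check the Pandoc version, and the variable names are hypothetical.

```r
# Look up the pandoc executable on the system PATH ("" if not found)
pandoc_path <- Sys.which("pandoc")
pandoc_found <- nzchar(pandoc_path)

if (pandoc_found) {
  message("Pandoc found at: ", pandoc_path)
} else {
  message("Pandoc was not found on the PATH; file output may not work.")
}
```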
# GLM Binomial example with output to console and MS Word file
mtcars_mod <- mtcars
mtcars_mod$cyl <- as.factor(mtcars_mod$cyl)
glm_bin <- f_glm(vs ~ cyl,
                 family = binomial,
                 data = mtcars_mod,
                 output_type = "word",
                 # Do not automatically open the 'Word' file (Default is to open the file)
                 open_generated_files = FALSE)
print(glm_bin)
# \donttest{
# GLM Poisson example with output to rmd text
data(warpbreaks)
glm_pos <- f_glm(breaks ~ wool + tension,
                 data = warpbreaks,
                 family = poisson(link = "log"),
                 show_assumptions_text = FALSE,
                 output_type = "rmd")
# Print the generated R Markdown text
cat(glm_pos$rmd)
# }