
glmm.hp (version 1.0-0)

detect_glm_family: Automatically Recommend an Appropriate GLM Family

Description

This function helps users select an appropriate family and link function for a generalized linear model (GLM) based on the distributional properties of the response variable. It provides a quick diagnostic summary, distribution plots, and an optional AIC comparison among candidate models.

Usage

detect_glm_family(y, plot = TRUE, aic_test = FALSE)

Value

A list containing:

family

Suggested GLM family

link

Suggested link function
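
As a minimal sketch of how the returned list might be used, the two elements can be turned into a family object and passed to glm(). The list below is a stand-in for the result of detect_glm_family(); storing both entries as character strings is an assumption, and the simulated data are purely illustrative.

```r
# Stand-in for the documented return value; in practice this would come from
# detect_glm_family(y). Character-string entries are an assumption.
res <- list(family = "poisson", link = "log")

# Simulated count data with one predictor (illustrative only)
set.seed(1)
x <- runif(200)
y <- rpois(200, lambda = exp(0.5 + 1.2 * x))

# Build a family object from the two strings, equivalent to poisson(link = "log")
fam <- do.call(res$family, list(link = res$link))

# Fit the GLM with the suggested family and link
fit <- glm(y ~ x, family = fam)
coef(fit)
```

The same pattern works for any family constructor in the stats package that accepts a link argument, such as gaussian(), Gamma(), or binomial().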

Arguments

y

A numeric vector representing the response variable.

plot

Logical; if TRUE, a histogram and boxplot of y will be drawn. Default is TRUE.

aic_test

Logical; if TRUE, a simple AIC comparison across candidate GLM families will be performed. Default is FALSE.
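
The page does not spell out how the AIC comparison is carried out. One plausible approach, sketched here in base R, fits an intercept-only GLM per candidate family and ranks the fits by AIC. The helper compare_family_aic() is hypothetical and is not part of glmm.hp.

```r
# Hypothetical helper, not part of glmm.hp: rank candidate families by the AIC
# of an intercept-only GLM fitted to the response.
compare_family_aic <- function(y, families) {
  aics <- vapply(families, function(fam) {
    # A family that cannot be fitted to y (e.g. Gamma with zeros) yields NA
    fit <- tryCatch(glm(y ~ 1, family = fam), error = function(e) NULL)
    if (is.null(fit)) NA_real_ else AIC(fit)
  }, numeric(1))
  sort(aics, na.last = TRUE)  # smallest AIC first, failed fits last
}

set.seed(123)
y <- rpois(100, lambda = 5)
aic_table <- compare_family_aic(
  y,
  list(gaussian = gaussian(), poisson = poisson())
)
aic_table
```

Quasi-likelihood families such as quasipoisson have no defined likelihood, so AIC() returns NA for them, and AIC values from discrete and continuous families should be compared with caution.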

Details

The function inspects the basic characteristics of the response variable, including its range, mean, variance, and whether it contains only integers or proportions. Based on these diagnostics, it suggests one or more candidate GLM families among:

  • "gaussian" – continuous response, may include negative values

  • "poisson" or "quasipoisson" – integer count data (possibly overdispersed)

  • "Gamma" or "inverse.gaussian" – strictly positive continuous data

  • "binomial" – proportion or binary data (0–1 range)

The suggested link functions are "identity" for gaussian, "log" for poisson and Gamma, and "logit" for binomial.
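
The decision rules described above can be approximated in a few lines of base R. This is a rough sketch, not the glmm.hp source: the function name sketch_family(), the order of the checks, and the overdispersion threshold of 1.5 are all assumptions made for illustration.

```r
# Hypothetical sketch of the described diagnostics; not the package's code.
# overdispersion_ratio (variance / mean) is an assumed threshold.
sketch_family <- function(y, overdispersion_ratio = 1.5) {
  y <- y[!is.na(y)]
  is_whole <- all(y == round(y))

  if (all(y >= 0 & y <= 1)) {
    # Binary or proportion data in the 0-1 range
    list(family = "binomial", link = "logit")
  } else if (is_whole && all(y >= 0)) {
    # Counts: variance well above the mean hints at overdispersion
    fam <- if (var(y) / mean(y) > overdispersion_ratio) "quasipoisson" else "poisson"
    list(family = fam, link = "log")
  } else if (all(y > 0)) {
    # Strictly positive continuous data
    list(family = "Gamma", link = "log")
  } else {
    # Continuous data that may include negative values
    list(family = "gaussian", link = "identity")
  }
}

sketch_family(c(-1.2, 0.3, 2.5))   # negative values present
sketch_family(c(0.1, 0.4, 0.9))    # proportions
sketch_family(c(1.5, 2.7, 10.2))   # positive continuous
sketch_family(c(0, 3, 5, 2))       # non-negative integers
```

The sketch returns a single suggestion, whereas the function documented here may report more than one candidate family.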

Examples

library(glmm.hp)

# Example 1: Continuous response (can be negative)
set.seed(123)
y1 <- rnorm(100)
detect_glm_family(y1)

# Example 2: Count data
y2 <- rpois(100, lambda = 5)
detect_glm_family(y2)

# Example 3: Proportion data (continuous values in the 0-1 range)
y3 <- rbeta(100, 2, 5)
detect_glm_family(y3)
