catalytic (version 0.1.0)

validate_glm_initialization_input: Validate Inputs for Catalytic Generalized Linear Models (GLMs) Initialization

Description

This function validates the input parameters required for initializing a catalytic Generalized Linear Model (GLM). It checks that the formula, family, data, and remaining parameters are well-formed and compatible with one another before any modeling proceeds.

Usage

validate_glm_initialization_input(
  formula,
  family,
  data,
  syn_size,
  custom_variance,
  gaussian_known_variance,
  x_degree
)
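
The call below is a minimal sketch of how the signature above might be used. It assumes the catalytic package is installed and that the function can be retrieved from its namespace (it may not be exported, so `utils::getFromNamespace` is used). The `mtcars` subset and the argument values are illustrative choices, not taken from the package documentation.

```r
# Illustrative data: one response (mpg) and two predictors (wt, hp).
dat <- mtcars[, c("mpg", "wt", "hp")]

if (requireNamespace("catalytic", quietly = TRUE)) {
  # Capture any error message instead of stopping the script.
  result <- tryCatch(
    {
      validate_input <- utils::getFromNamespace(
        "validate_glm_initialization_input", "catalytic"
      )
      validate_input(
        formula = mpg ~ wt + hp,
        family = "gaussian",
        data = dat,
        syn_size = 100,
        custom_variance = 1,
        gaussian_known_variance = TRUE,
        x_degree = c(1, 1)
      )
    },
    error = function(e) conditionMessage(e)
  )
  # A character result here means the validator reported a problem.
  if (is.character(result)) message("Validation failed: ", result)
}
```

If the checks behave as documented, a call with a Gaussian family, a positive `syn_size`, and an `x_degree` of length two is expected to return without an error.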

Value

Returns nothing if all checks pass; otherwise, raises an error or warning.

Arguments

formula

A formula object specifying the stats::glm model to be fitted. It must not contain random effects or survival terms.

family

A character or family object specifying the error distribution and link function. Valid values are "binomial" and "gaussian".

data

A data.frame containing the data to be used in the GLM.

syn_size

A positive integer specifying the sample size used for the synthetic data.

custom_variance

A positive numeric value for the custom variance used in the model (only applicable for Gaussian family).

gaussian_known_variance

A logical indicating whether the variance is known for the Gaussian family.

x_degree

A numeric vector specifying the degree of the predictors. Its length should match the number of predictors (excluding the response variable).
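
As a small illustration of the length requirement on x_degree, the sketch below counts the non-response columns of a data frame and builds a matching degree vector. The helper `count_predictors` is hypothetical and not part of catalytic; it assumes the response is the single variable on the left-hand side of the formula.

```r
# Hypothetical helper: number of columns in `data` other than the response.
count_predictors <- function(formula, data) {
  response <- all.vars(formula[[2]])  # variable(s) on the left-hand side
  length(setdiff(names(data), response))
}

dat <- mtcars[, c("mpg", "wt", "hp")]
p <- count_predictors(mpg ~ wt + hp, dat)

# One degree per predictor; degree 1 for every predictor is an arbitrary choice.
x_degree <- rep(1, p)
```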

Details

This function performs the following checks:

  • Ensures that syn_size, custom_variance, and x_degree are positive values.

  • Verifies that the provided formula is suitable for GLMs, ensuring no random effects or survival terms.

  • Checks that the provided data is a data.frame.

  • Confirms that the formula does not contain too many terms relative to the number of columns in data.

  • Ensures that the family is either "binomial" or "gaussian".

  • Validates that x_degree has the correct length relative to the number of predictors in data.

  • Warns if syn_size is too small relative to the number of columns in data.

  • Issues warnings if custom_variance or gaussian_known_variance are used with incompatible families.

If any of these conditions are not met, the function raises an error or warning to guide the user.
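
The checks listed above can be sketched in base R as follows. This is not the catalytic source code: the function name `sketch_validate_glm_input`, the treatment of NULL as "not supplied", the text-based formula test, and the thresholds (terms compared with `ncol(data) - 1`, `syn_size` compared with `4 * ncol(data)`) are all assumptions made for illustration.

```r
# Hypothetical re-implementation of the documented checks; NOT the package code.
sketch_validate_glm_input <- function(formula, family, data,
                                      syn_size = NULL,
                                      custom_variance = NULL,
                                      gaussian_known_variance = FALSE,
                                      x_degree = NULL) {
  # Positivity checks; NULL means "not supplied" (an assumption).
  is_positive <- function(x) {
    is.null(x) || (is.numeric(x) && !anyNA(x) && all(x > 0))
  }
  if (!is_positive(syn_size)) stop("syn_size must be positive.")
  if (!is_positive(custom_variance)) stop("custom_variance must be positive.")
  if (!is_positive(x_degree)) stop("x_degree must be positive.")

  # Formula checks: reject random-effect bars and Surv() terms (crude text test).
  if (!inherits(formula, "formula")) stop("formula must be a formula object.")
  formula_text <- paste(deparse(formula), collapse = " ")
  if (grepl("|", formula_text, fixed = TRUE) || grepl("Surv\\(", formula_text)) {
    stop("formula must not contain random effects or survival terms.")
  }

  if (!is.data.frame(data)) stop("data must be a data.frame.")

  # Number of formula terms versus available predictor columns (assumed rule).
  n_terms <- length(attr(stats::terms(formula, data = data), "term.labels"))
  if (n_terms > ncol(data) - 1) stop("formula has too many terms for data.")

  # Family may be a string or a family object such as gaussian().
  family_name <- if (inherits(family, "family")) family$family else family
  if (!(is.character(family_name) && length(family_name) == 1 &&
        family_name %in% c("binomial", "gaussian"))) {
    stop("family must be 'binomial' or 'gaussian'.")
  }

  # x_degree needs one entry per predictor column.
  if (!is.null(x_degree) && length(x_degree) != ncol(data) - 1) {
    stop("x_degree must have one entry per predictor.")
  }

  # Warnings: small synthetic sample (assumed threshold) and family mismatch.
  if (!is.null(syn_size) && syn_size < 4 * ncol(data)) {
    warning("syn_size is small relative to the number of columns in data.")
  }
  if (family_name != "gaussian" &&
      (!is.null(custom_variance) || isTRUE(gaussian_known_variance))) {
    warning("Variance settings only apply to the gaussian family.")
  }

  invisible(NULL)
}

dat <- mtcars[, c("mpg", "wt", "hp")]

# Consistent inputs: the sketch returns NULL invisibly.
ok <- sketch_validate_glm_input(mpg ~ wt + hp, "gaussian", dat,
                                syn_size = 100, custom_variance = 1,
                                gaussian_known_variance = TRUE,
                                x_degree = c(1, 1))

# Unsupported family: the error message is captured rather than thrown.
bad <- tryCatch(
  sketch_validate_glm_input(mpg ~ wt + hp, "poisson", dat, syn_size = 100),
  error = function(e) conditionMessage(e)
)
```

Wrapping the call in `tryCatch` is one way a caller could separate hard failures (errors) from advisory conditions (warnings) when validating inputs before model initialization.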