This function validates the inputs needed to initialize a catalytic Linear Mixed Model (LMM) or Generalized Linear Model (GLM): the model formula, the data, and the column specifications.
validate_lmm_initialization_input(
  formula,
  data,
  x_cols,
  y_col,
  z_cols,
  group_col,
  syn_size
)
Returns nothing if all checks pass; otherwise, raises an error or warning.
formula: An object of class formula representing the model formula, typically including fixed and random effects for LMMs, or fixed effects only for GLMs.
data: A data.frame containing the data for model fitting. This should include all columns specified in x_cols, y_col, z_cols, and group_col.
x_cols: A character vector of column names to be used as predictor variables in the model.
y_col: A single character string specifying the name of the response variable column.
z_cols: A character vector of column names to be used as additional predictors or grouping factors, depending on the model structure.
group_col: A single character string specifying the name of the grouping variable for random effects.
syn_size: Optional. A positive integer indicating the synthetic data size, typically for use in data augmentation or model diagnostics.
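As a rough illustration of inputs that match the argument descriptions above, the following builds a small simulated data set and the corresponding column specifications. Everything here is hypothetical: the names toy_data and model_formula, the column names, and the value 200 are invented for the example, and the lme4-style random-intercept term `(1 | group)` is an assumption about the formula syntax.

```r
# Illustrative inputs only; all names and values below are made up.
set.seed(42)

# Toy data: 6 groups with 10 observations each.
toy_data <- data.frame(
  group = rep(paste0("g", 1:6), each = 10),
  x1 = rnorm(60),
  x2 = rnorm(60),
  z1 = rnorm(60)
)
toy_data$y <- 1 + 0.5 * toy_data$x1 - 0.3 * toy_data$x2 + rnorm(60)

# Assumed lme4-style syntax: fixed effects plus a random intercept per group.
model_formula <- y ~ x1 + x2 + (1 | group)

x_cols    <- c("x1", "x2")  # predictor columns
y_col     <- "y"            # single response column
z_cols    <- "z1"           # additional predictor / grouping columns
group_col <- "group"        # single grouping column for random effects
syn_size  <- 200L           # positive integer synthetic data size
```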
This function performs the following checks:
Ensures syn_size is a positive integer.
Verifies that formula is not for survival analysis (e.g., does not contain Surv terms).
Checks that the formula is not overly complex by confirming it has fewer terms than data has columns.
Ensures y_col and group_col each contain only one column name.
Confirms data is a data.frame.
Validates that all specified columns in x_cols, y_col, z_cols, and group_col exist in data without overlap or missing values.
Warns if syn_size is set too small relative to the data dimensions, recommending a larger value.
If any of these conditions are not met, the function raises an error or warning to guide the user.
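The checks listed above could be sketched in base R roughly as follows. This is not the package's source code: the function name sketch_validate_lmm_input is hypothetical, the error messages are invented, and several details are assumptions, namely that "overlap" means one column assigned to more than one role, that "missing values" means NA entries in the listed columns, and that "too small" means fewer than 4 times the number of predictor columns. The data.frame check is also done earlier than in the list so that later checks can rely on it.

```r
# Hypothetical sketch of the documented checks; not the package implementation.
sketch_validate_lmm_input <- function(formula, data, x_cols, y_col, z_cols,
                                      group_col, syn_size = NULL) {
  # syn_size, when supplied, must be a single positive whole number.
  if (!is.null(syn_size)) {
    if (!is.numeric(syn_size) || length(syn_size) != 1 || is.na(syn_size) ||
        syn_size <= 0 || syn_size != round(syn_size)) {
      stop("`syn_size` must be a positive integer.")
    }
  }

  # Reject survival formulas, i.e. any call to Surv() in the formula.
  if ("Surv" %in% all.names(formula)) {
    stop("Survival formulas (containing `Surv`) are not supported.")
  }

  # data must be a data.frame.
  if (!is.data.frame(data)) {
    stop("`data` must be a data.frame.")
  }

  # The formula must have fewer terms than data has columns.
  n_terms <- length(attr(stats::terms(formula, data = data), "term.labels"))
  if (n_terms >= ncol(data)) {
    stop("`formula` has too many terms for the columns in `data`.")
  }

  # y_col and group_col must each name exactly one column.
  if (!is.character(y_col) || length(y_col) != 1) {
    stop("`y_col` must be a single column name.")
  }
  if (!is.character(group_col) || length(group_col) != 1) {
    stop("`group_col` must be a single column name.")
  }

  # Every named column must exist in data.
  all_cols <- c(x_cols, y_col, z_cols, group_col)
  absent <- setdiff(all_cols, names(data))
  if (length(absent) > 0) {
    stop("Columns not found in `data`: ", paste(absent, collapse = ", "))
  }

  # Assumed reading of "overlap": no column may appear in more than one role.
  if (anyDuplicated(all_cols) > 0) {
    stop("Column roles overlap: a column is listed more than once.")
  }

  # Assumed reading of "missing values": no NA entries in the listed columns.
  if (anyNA(data[, all_cols, drop = FALSE])) {
    stop("The listed columns contain missing values.")
  }

  # Assumed threshold: warn when syn_size is below 4 x number of predictors.
  if (!is.null(syn_size) && syn_size < 4 * length(x_cols)) {
    warning("`syn_size` is small relative to the data; consider a larger value.")
  }

  invisible(NULL)
}

# Usage on made-up data: passes silently when every check succeeds.
set.seed(1)
demo_data <- data.frame(
  y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20),
  z1 = rnorm(20), g = rep(1:4, each = 5)
)
sketch_validate_lmm_input(
  y ~ x1 + x2, demo_data,
  x_cols = c("x1", "x2"), y_col = "y", z_cols = "z1", group_col = "g",
  syn_size = 100
)
```

Returning invisible(NULL) mirrors the documented behavior of returning nothing on success, while stop() and warning() cover the error and warning cases.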