Validates the inputs needed to initialize a catalytic Linear Mixed Model (LMM) or Generalized Linear Model (GLM): the model formula, the data, and the column specifications.
validate_lmm_initialization_input(
  formula,
  data,
  x_cols,
  y_col,
  z_cols,
  group_col,
  syn_size
)
Returns nothing if all checks pass; otherwise, raises an error or warning.
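Because the function signals problems through R conditions rather than a return value, a caller would typically wrap it in tryCatch(). The sketch below illustrates that pattern with a stand-in validator; toy_validator and check_status are hypothetical names invented for this example, and the threshold of 10 is arbitrary.

```r
# Stand-in validator (hypothetical) that mimics the documented contract:
# silent on success, an error or a warning otherwise.
toy_validator <- function(syn_size) {
  if (syn_size <= 0) stop("`syn_size` must be a positive integer.")
  if (syn_size < 10) warning("`syn_size` is small; consider a larger value.")
  invisible(NULL)
}

# Turn the outcome of a validation call into a short status string.
# `expr` is evaluated lazily, so the condition is raised inside tryCatch().
check_status <- function(expr) {
  tryCatch(
    {
      expr
      "ok"
    },
    warning = function(w) paste("warning:", conditionMessage(w)),
    error   = function(e) paste("error:", conditionMessage(e))
  )
}

print(check_status(toy_validator(100)))
print(check_status(toy_validator(5)))
print(check_status(toy_validator(-1)))
```

Note that tryCatch() stops evaluation at the first warning. If the call should continue past a warning, withCallingHandlers() is the usual alternative.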
formula: An object of class formula representing the model formula, typically specifying fixed effects and, for LMMs, random effects.
data: A data.frame containing the data for model fitting. It should include all columns named in x_cols, y_col, z_cols, and group_col.
x_cols: A character vector of column names to be used as predictor variables in the model.
y_col: A single character string specifying the name of the response variable column.
z_cols: A character vector of column names to be used as additional predictors or grouping factors, depending on the model structure.
group_col: A single character string specifying the name of the grouping variable for random effects.
syn_size: Optional. A positive integer indicating the synthetic data size, typically for use in data augmentation or model diagnostics.
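As a rough illustration of how these arguments fit together, the following sketch builds a small set of inputs in base R. The data, column names, and the lme4-style random-effect term in the formula are all assumptions made for this example, not values taken from the package.

```r
# Hypothetical example data; the column names are illustrative only.
set.seed(1)
data <- data.frame(
  y     = rnorm(12),
  x1    = rnorm(12),
  x2    = rnorm(12),
  z1    = rnorm(12),
  group = rep(c("a", "b", "c"), each = 4)
)

# Formula in lme4-style syntax (an assumption about the expected form).
formula   <- y ~ x1 + x2 + (1 | group)
x_cols    <- c("x1", "x2")   # predictor columns
y_col     <- "y"             # single response column
z_cols    <- "z1"            # additional predictor column(s)
group_col <- "group"         # single grouping column
syn_size  <- 100L            # positive integer

# Every referenced column should be present in `data`.
all_cols     <- c(x_cols, y_col, z_cols, group_col)
cols_present <- all(all_cols %in% names(data))
print(cols_present)
```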
This function performs the following checks:
Ensures syn_size is a positive integer.
Verifies that formula is not for survival analysis (e.g., does not contain Surv terms).
Checks that the formula is not overly complex by confirming it has fewer terms than the total number of columns in data.
Ensures y_col and group_col each contain only one column name.
Confirms data is a data.frame.
Validates that all columns specified in x_cols, y_col, z_cols, and group_col exist in data, without overlap or missing values.
Warns if syn_size is too small relative to the data dimensions, recommending a larger value.
If any of these conditions are not met, the function raises an error or warning to guide the user.
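The checks listed above could be sketched in base R roughly as follows. This is an illustrative re-implementation under stated assumptions, not the package's source: the function name sketch_validate_lmm_input is hypothetical, "without overlap or missing values" is interpreted as "no column name is repeated and none is absent from data", and the warning threshold of four times the number of predictors is an arbitrary placeholder.

```r
# Hypothetical sketch of the documented checks; not the package's code.
sketch_validate_lmm_input <- function(formula, data, x_cols, y_col,
                                      z_cols, group_col, syn_size = NULL) {
  # syn_size, when supplied, must be a single positive whole number.
  if (!is.null(syn_size)) {
    if (!is.numeric(syn_size) || length(syn_size) != 1 || is.na(syn_size) ||
        syn_size <= 0 || syn_size != round(syn_size)) {
      stop("`syn_size` must be a positive integer.")
    }
  }

  # Reject survival formulas by looking for the symbol Surv.
  if ("Surv" %in% all.names(formula)) {
    stop("Survival formulas (containing `Surv`) are not supported.")
  }

  # data must be a data.frame (checked before ncol() is used below).
  if (!is.data.frame(data)) stop("`data` must be a data.frame.")

  # The formula should have fewer terms than data has columns.
  n_terms <- length(attr(terms(formula), "term.labels"))
  if (n_terms >= ncol(data)) {
    stop("`formula` has too many terms for the columns in `data`.")
  }

  # y_col and group_col must each name exactly one column.
  if (!is.character(y_col) || length(y_col) != 1) {
    stop("`y_col` must be a single column name.")
  }
  if (!is.character(group_col) || length(group_col) != 1) {
    stop("`group_col` must be a single column name.")
  }

  # All referenced columns must exist in data and must not repeat.
  all_cols <- c(x_cols, y_col, z_cols, group_col)
  missing_cols <- setdiff(all_cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Columns not found in `data`: ",
         paste(missing_cols, collapse = ", "))
  }
  if (anyDuplicated(all_cols) > 0) {
    stop("Column specifications must not overlap.")
  }

  # Assumed threshold: warn when syn_size is under 4x the predictor count.
  if (!is.null(syn_size) && syn_size < 4 * length(x_cols)) {
    warning("`syn_size` is small relative to the data; consider a larger value.")
  }

  invisible(NULL)
}

# Example call with hypothetical data; returns invisibly when all checks pass.
df <- data.frame(y = 1:6, x1 = 6:1, z1 = c(2, 4, 1, 3, 6, 5),
                 g = rep(c("a", "b"), 3))
sketch_validate_lmm_input(y ~ x1 + (1 | g), df, "x1", "y", "z1", "g", 50)
```

Returning invisible(NULL) matches the documented behaviour of returning nothing on success, so the function is called purely for its side effects (errors and warnings).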