This function prepares and initializes a catalytic linear mixed model by processing the input data, extracting the necessary variables, generating synthetic datasets, and fitting a model. Only one random-effect variance is considered.
cat_lmm_initialization(
  formula,
  data,
  x_cols,
  y_col,
  z_cols,
  group_col = NULL,
  syn_size = NULL,
  resample_by_group = FALSE,
  resample_only = FALSE,
  na_replace = mean
)
A list containing the values of all the input arguments and the following components:
Function Information:
function_name: A character string representing the name of the function, "cat_lmm_initialization".
simple_model: An object of class lme4::lmer or stats::lm, representing the model fitted to the original data and used to generate the synthetic response.
Observation Data Information:
obs_size: An integer representing the number of observations in the original dataset.
obs_data: The original data used for fitting the model, returned as a data frame.
obs_x: A data frame containing the standardized predictor variables from the original dataset.
obs_y: A numeric vector of the standardized response variable from the original dataset.
obs_z: A data frame containing the standardized random effect variables from the original dataset.
obs_group: A numeric vector representing the grouping variable for the original observations.
Synthetic Data Information:
syn_size: An integer representing the number of synthetic observations generated.
syn_data: A data frame containing the synthetic dataset, combining synthetic predictor and response variables.
syn_x: A data frame containing the synthetic predictor variables.
syn_y: A numeric vector of the synthetic response variable values.
syn_z: A data frame containing the synthetic random effect variables.
syn_group: A numeric vector representing the grouping variable for the synthetic observations.
syn_x_resample_inform: A data frame containing information about the resampling process for synthetic predictors:
  Coordinate: Preserves the original data values as reference coordinates during processing.
  Deskewing: Adjusts the data distribution to reduce skewness and enhance symmetry.
  Smoothing: Reduces noise in the data to stabilize the dataset and prevent overfitting.
  Flattening: Creates a more uniform distribution by modifying low-frequency categories in categorical variables.
  Symmetrizing: Balances the data around its mean to improve statistical properties for model fitting.
syn_z_resample_inform: A data frame containing information about the resampling process for synthetic random effects. The resampling methods are the same as those listed for syn_x_resample_inform.
Whole Data Information:
size: An integer representing the total size of the combined original and synthetic datasets.
data: A combined data frame of the original and synthetic datasets.
x: A combined data frame of the original and synthetic predictor variables.
y: A combined numeric vector of the original and synthetic response variables.
z: A combined data frame of the original and synthetic random effect variables.
group: A combined numeric vector representing the grouping variable for both original and synthetic datasets.
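The relationship between the observed, synthetic, and whole-data components can be sketched in base R. This is an illustrative stand-in only: it does not call the package, it uses a plain stats::lm fit instead of a mixed model, and the resampling and noise steps are assumptions chosen for the sketch rather than the package's actual procedure. The variable names simply mirror the component names listed above.

```r
# Illustrative stand-in only: this does not call cat_lmm_initialization().
set.seed(1)

obs_data <- mtcars[, c("wt", "mpg")]
obs_size <- nrow(obs_data)
syn_size <- 100

# Assumption: synthetic predictors are drawn by resampling observed values.
syn_x <- data.frame(wt = sample(obs_data$wt, syn_size, replace = TRUE))

# Fit a simple model on the observed data, then generate a synthetic
# response from it (fitted mean plus Gaussian noise, an assumption).
simple_model <- lm(mpg ~ wt, data = obs_data)
syn_y <- predict(simple_model, newdata = syn_x) +
  rnorm(syn_size, sd = sigma(simple_model))

# Combine synthetic predictors and response, then stack with the observed data.
syn_data <- cbind(syn_x, mpg = syn_y)
data_all <- rbind(obs_data, syn_data)
size <- nrow(data_all)
```

In this sketch, size equals obs_size + syn_size, which matches the description of size as the total size of the combined datasets.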
formula: A formula specifying the model. It should include the response and predictor variables.
data: A data frame containing the data for modeling.
x_cols: A character vector of column names for fixed effects (predictors).
y_col: A character string giving the name of the response variable.
z_cols: A character vector of column names for random effects.
group_col: A character string giving the name of the grouping variable (optional). If not given (NULL), it is extracted from the formula.
syn_size: An integer specifying the size of the synthetic dataset to be generated. The default is length(x_cols) * 4.
resample_by_group: A logical indicating whether to resample by group. The default is FALSE.
resample_only: A logical indicating whether to perform resampling only. The default is FALSE.
na_replace: A function used to replace NA values in the data. The default is mean.
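Two of the defaults above can be illustrated with a small base R sketch. The helper replace_na_with is hypothetical and not part of the package, and calling the replacement function with na.rm = TRUE is an assumption made here so that mean returns a usable value; the package may handle this differently.

```r
# Hypothetical helper, not part of the package: fill NA entries of a numeric
# column with a summary computed by `na_replace` from the non-missing values.
replace_na_with <- function(column, na_replace = mean) {
  column[is.na(column)] <- na_replace(column, na.rm = TRUE)  # assumption: na.rm is passed
  column
}

x <- c(1, NA, 3)
filled_mean <- replace_na_with(x)                          # uses mean
filled_median <- replace_na_with(x, na_replace = median)   # any summary function works

# Default synthetic sample size as documented: four times the number of
# fixed-effect columns.
x_cols <- c("wt", "hp")
default_syn_size <- length(x_cols) * 4
```

Any function that accepts a numeric vector and an na.rm argument could be supplied in place of mean in this sketch.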
data(mtcars)
cat_init <- cat_lmm_initialization(
  formula = mpg ~ wt + (1 | cyl), # Formula for the simple model
  data = mtcars,
  x_cols = c("wt"), # Fixed effects
  y_col = "mpg", # Response variable
  z_cols = c("disp", "hp", "drat", "qsec", "vs", "am", "gear", "carb"), # Random effects
  group_col = "cyl", # Grouping column
  syn_size = 100, # Synthetic data size
  resample_by_group = FALSE, # Whether to resample by group
  resample_only = FALSE, # Whether to perform resampling only
  na_replace = mean # NA replacement function
)
cat_init