This function cleans and preprocesses the data so that it is suitable for analysis. The steps include:
Missing data handling
Variable type checks
Collinearity and zero-variance feature removal
cleanData(data, y, treatment, x = NULL, binary = FALSE)
A list containing the cleaned dataset and relevant metadata:
N
: The number of observations after cleaning.
K
: The number of covariates after cleaning.
X
: The cleaned covariate matrix.
treat_vec
: Treatment vector as integers (1 for TRUE, 0 for FALSE).
Y
: The dependent variable vector.
data
: A data.frame containing the data to be cleaned.
y
: Name of the dependent variable (character).
treatment
: Name of the treatment variable (character). The column itself should be logical.
x
: Names of the covariates to include in the model (character vector, optional). Default is NULL.
binary
: Should the dependent variable be treated as binary? Default is FALSE.
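
The three cleaning steps listed above can be illustrated with a minimal base-R sketch. This is not the package's implementation of cleanData: the helper name `clean_sketch`, the toy data, the choice of listwise deletion for missing values, the use of all remaining columns when `x` is NULL, and the QR-based collinearity check are all assumptions made for illustration. It only mirrors the documented arguments and the names of the returned list elements.

```r
# Hypothetical sketch only; the real cleanData() may work differently.
clean_sketch <- function(data, y, treatment, x = NULL, binary = FALSE) {
  stopifnot(is.data.frame(data), is.character(y), is.character(treatment))
  # Assumption: with x = NULL, use every column other than y and treatment.
  if (is.null(x)) x <- setdiff(names(data), c(y, treatment))
  stopifnot(length(x) > 0)

  # Missing data handling (assumed here to be listwise deletion).
  vars <- c(y, treatment, x)
  data <- data[stats::complete.cases(data[, vars, drop = FALSE]), , drop = FALSE]

  # Variable type checks.
  if (!is.logical(data[[treatment]])) stop("treatment column must be logical")
  if (binary && !all(data[[y]] %in% c(0, 1))) stop("binary outcome must be 0/1")
  if (!binary && !is.numeric(data[[y]])) stop("outcome must be numeric")

  # Build a numeric covariate matrix and drop the intercept column.
  X <- stats::model.matrix(~ ., data = data[, x, drop = FALSE])[, -1, drop = FALSE]

  # Zero-variance feature removal.
  has_variance <- apply(X, 2, function(col) stats::var(col) > 0)
  X <- X[, has_variance, drop = FALSE]

  # Collinearity removal: keep the columns that the pivoted QR
  # decomposition (with an intercept) counts toward the rank.
  qr_fit <- qr(cbind(1, X))
  kept <- sort(setdiff(qr_fit$pivot[seq_len(qr_fit$rank)], 1)) - 1
  X <- X[, kept, drop = FALSE]

  list(N = nrow(X), K = ncol(X), X = X,
       treat_vec = as.integer(data[[treatment]]), Y = data[[y]])
}

# Toy data: one row with missing values, b = 2 * a, and a constant column.
toy <- data.frame(
  out   = c(2.1, 0.4, 3.3, 1.0, 2.8, 0.9),
  trt   = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  a     = c(1, 2, 3, 4, 5, NA),
  b     = 2 * c(1, 2, 3, 4, 5, NA),
  const = 1,
  d     = c(0.3, -1.2, 0.8, 2.1, -0.5, 1.0)
)
res <- clean_sketch(toy, y = "out", treatment = "trt")
str(res)
```

In this toy case the incomplete row is dropped, `const` is removed for having zero variance, and `b` is removed as collinear with `a`, so only `a` and `d` remain in `X`.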