This function trains a random forest model and performs domain-level estimation **without bias correction**.
projection_rf(
  data_model,
  target_column,
  predictor_cols,
  data_proj,
  domain1,
  domain2,
  psu,
  ssu = NULL,
  strata = NULL,
  weights,
  split_ratio = 0.8,
  feature_selection = TRUE
)
A list containing the following elements:
model
The trained Random Forest model.
importance
Feature importance showing which features contributed most to the model's predictions.
train_accuracy
Accuracy of the model on the training set.
validation_accuracy
Accuracy of the model on the validation set.
validation_performance
Confusion matrix for the validation set, with performance metrics such as accuracy, precision, and recall.
data_proj
The projection data with predicted values.
Domain1
Estimations for Domain 1, including estimated values, variance, and relative standard error.
Domain2
Estimations for Domain 2, including estimated values, variance, and relative standard error.
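As a rough illustration of how an estimate, its variance, and its relative standard error relate, the sketch below computes the RSE as the standard error divided by the estimate, expressed in percent. The data frame, its column names, and its values are made up for illustration; the layout of the Domain1 and Domain2 elements returned by the function may differ.

```r
# Hypothetical domain-level results; column names and values are assumptions.
domain_est <- data.frame(
  domain   = c("A", "B", "C"),
  estimate = c(0.40, 0.25, 0.10),
  variance = c(0.0004, 0.0009, 0.0001)
)

# Relative standard error: standard error divided by the estimate, in percent.
domain_est$rse <- 100 * sqrt(domain_est$variance) / domain_est$estimate

print(domain_est)
```

A smaller RSE indicates a more precise domain estimate relative to its size.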
The function takes the following arguments:
data_model
The training dataset, consisting of auxiliary variables and the target variable.
target_column
The name of the target column in data_model.
predictor_cols
A vector of predictor column names.
data_proj
The data to be projected (predicted) using the trained model. It must contain the same auxiliary variables as data_model.
domain1
Domain variable for survey estimation (e.g., "province").
domain2
Domain variable for survey estimation (e.g., "regency").
psu
Primary sampling units, representing the structure of the sampling frame.
ssu
Secondary sampling units, representing the structure of the sampling frame (default is NULL).
strata
Stratification variable, ensuring that specific subgroups are represented (default is NULL).
weights
Weights used for the direct estimation from data_model and the indirect estimation from data_proj.
split_ratio
Proportion of data used for training (default is 0.8, meaning 80 percent for training and 20 percent for validation).
feature_selection
Whether to perform selection of predictor variables (default is TRUE).
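Two of the requirements above can be checked by hand before calling the function: that the projection data carries every predictor, and what a split_ratio of 0.8 means in row counts. The following is a minimal base R sketch. The data frames survey_model and survey_proj and their columns are hypothetical stand-ins for data_model and data_proj, and the simple random split shown here is an assumption; the function may partition the data differently internally (for example, stratified by the target).

```r
# Hypothetical stand-ins for data_model and data_proj; all names are illustrative.
set.seed(1)
survey_model <- data.frame(
  y  = factor(sample(c("yes", "no"), 100, replace = TRUE)),
  x1 = rnorm(100),
  x2 = rnorm(100)
)
survey_proj <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
predictors  <- c("x1", "x2")

# The projection data must contain every predictor used for training.
missing_cols <- setdiff(predictors, names(survey_proj))
stopifnot(length(missing_cols) == 0)

# A simple random 80/20 split, mirroring split_ratio = 0.8.
split_ratio <- 0.8
n_train     <- floor(split_ratio * nrow(survey_model))
train_idx   <- sample(seq_len(nrow(survey_model)), size = n_train)
train_set   <- survey_model[train_idx, ]
valid_set   <- survey_model[-train_idx, ]
```

With 100 rows, this leaves 80 rows for training and 20 for validation.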