projection_rf: Projection RF Function

Description

This function trains a random forest model on data_model, predicts the target variable for data_proj, and produces domain-level estimates **without bias correction**.

Usage

projection_rf(
  data_model,
  target_column,
  predictor_cols,
  data_proj,
  domain1,
  domain2,
  psu,
  ssu = NULL,
  strata = NULL,
  weights,
  split_ratio = 0.8,
  feature_selection = TRUE
)

Value

A list containing the following elements:

model: The trained random forest model.
importance: Feature importance, showing which features contributed most to the model's predictions.
train_accuracy: Accuracy of the model on the training set.
validation_accuracy: Accuracy of the model on the validation set.
validation_performance: Confusion matrix for the validation set, with performance metrics such as accuracy, precision, and recall.
data_proj: The projection data with predicted values added.
Domain1: Estimates for domain 1, including the estimated values, their variance, and relative standard error.
Domain2: Estimates for domain 2, including the estimated values, their variance, and relative standard error.
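
As a reading aid for the Domain1 and Domain2 outputs, the sketch below shows how a relative standard error is conventionally derived from an estimate and its variance. The data frame, its column names, and the numbers are hypothetical, and expressing the result in percent is an assumption; the function's own output may use different names or scaling.

```r
# Hypothetical domain-level results; names and values are illustrative only.
est <- data.frame(
  domain   = c("A", "B"),
  estimate = c(0.25, 0.40),
  variance = c(0.0004, 0.0009)
)

# Relative standard error in percent (assumed convention):
# the standard error (square root of the variance) divided by the estimate.
est$rse <- sqrt(est$variance) / est$estimate * 100

print(est)
```

A smaller relative standard error indicates a more precise domain estimate.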

Arguments

data_model: The training dataset, consisting of auxiliary variables and the target variable.
target_column: The name of the target column in data_model.
predictor_cols: A vector of predictor column names.
data_proj: The projection dataset, for which the trained model predicts the target variable. It must contain the same auxiliary variables as data_model.
domain1: First domain variable for survey estimation (e.g., "province").
domain2: Second domain variable for survey estimation (e.g., "regency").
psu: Primary sampling units, representing the structure of the sampling frame.
ssu: Secondary sampling units, representing the structure of the sampling frame (default is NULL).
strata: Stratification variable, ensuring that specific subgroups are represented (default is NULL).
weights: Weights used for the direct estimation from data_model and indirect estimation from data_proj.
split_ratio: Proportion of data used for training (default is 0.8, meaning 80 percent for training and 20 percent for validation).
feature_selection: Logical; whether to perform feature selection on the predictor variables (default is TRUE).
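
To illustrate what split_ratio = 0.8 means in practice, the following base R sketch divides a small synthetic dataset into training and validation parts. It is not the function's internal code: the data frame and its columns are made up for the example, and the function may split differently (for instance, stratified by the target variable).

```r
set.seed(123)  # make the random split reproducible

# Synthetic stand-in for data_model (hypothetical columns).
data_model <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  y  = factor(sample(c("yes", "no"), 100, replace = TRUE))
)

split_ratio <- 0.8

# Number of rows assigned to the training set.
n_train <- round(split_ratio * nrow(data_model))

# Randomly choose the training rows; the remaining rows form the validation set.
train_idx      <- sample(seq_len(nrow(data_model)), size = n_train)
train_set      <- data_model[train_idx, ]
validation_set <- data_model[-train_idx, ]

nrow(train_set)       # 80
nrow(validation_set)  # 20
```

Under this reading, train_accuracy would be measured on the 80 percent portion and validation_accuracy on the held-out 20 percent.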