sae.projection (version 0.1.4)

projection_rf: Projection RF Function

Description

This function trains a random forest model on data_model, uses it to predict the target variable in data_proj, and performs domain-level estimation **without bias correction**.

Usage

projection_rf(
  data_model,
  target_column,
  predictor_cols,
  data_proj,
  domain1,
  domain2,
  psu,
  ssu = NULL,
  strata = NULL,
  weights,
  split_ratio = 0.8,
  feature_selection = TRUE
)

Value

A list containing the following elements:

  • model: The trained random forest model.

  • importance: Feature importance, showing which features contributed most to the model's predictions.

  • train_accuracy: Accuracy of the model on the training set.

  • validation_accuracy: Accuracy of the model on the validation set.

  • validation_performance: Confusion matrix for the validation set, with performance metrics such as accuracy, precision, and recall.

  • data_proj: The projection data with predicted values.

  • Domain1: Estimates for domain1, including estimated values, variance, and relative standard error.

  • Domain2: Estimates for domain2, including estimated values, variance, and relative standard error.
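
Two of the returned quantities follow standard formulas: accuracy is the share of correctly classified cases in a confusion matrix, and the relative standard error (RSE) is the standard error divided by the estimate, usually reported in percent. The base R sketch below illustrates both on hypothetical values; the labels, the estimate, the variance, and the helper rse_percent are all made up for illustration, and the package may compute or scale these figures differently (for example, RSE as a ratio rather than a percentage).

```r
# Hypothetical observed and predicted classes for a binary target on a validation set
observed  <- factor(c(1, 0, 1, 1, 0, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(1, 0, 0, 1, 0, 1, 1, 0), levels = c(0, 1))

# Confusion matrix: rows are predicted classes, columns are observed classes
cm <- table(Predicted = predicted, Observed = observed)

# Accuracy = correctly classified cases (the diagonal) / all cases
accuracy <- sum(diag(cm)) / sum(cm)

# Hypothetical helper: RSE in percent = standard error / estimate * 100
rse_percent <- function(estimate, variance) {
  sqrt(variance) / estimate * 100
}

# Hypothetical domain estimate of 0.25 with variance 0.0004 (standard error 0.02)
rse <- rse_percent(estimate = 0.25, variance = 0.0004)

print(cm)
print(accuracy)
print(rse)
```

Here 6 of the 8 cases lie on the diagonal, so the accuracy is 0.75, and the RSE is 0.02 / 0.25 * 100 = 8 percent.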

Arguments

data_model

The training dataset, consisting of auxiliary variables and the target variable.

target_column

The name of the target column in data_model.

predictor_cols

A vector of predictor column names.

data_proj

The data used for projection (prediction); the trained model is applied to it to predict the target variable. It must contain the same auxiliary variables as data_model.
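
Because the model is trained on data_model and then applied to data_proj, every predictor must exist in both data frames, while the target column is needed only in data_model. A minimal pre-flight check in base R might look like the sketch below; check_aux_columns and the example columns are hypothetical and are not part of the package.

```r
# Hypothetical helper: confirm both data sets carry every predictor column
check_aux_columns <- function(data_model, data_proj, predictor_cols) {
  missing_model <- setdiff(predictor_cols, names(data_model))
  missing_proj  <- setdiff(predictor_cols, names(data_proj))
  if (length(missing_model) > 0) {
    stop("data_model is missing: ", paste(missing_model, collapse = ", "))
  }
  if (length(missing_proj) > 0) {
    stop("data_proj is missing: ", paste(missing_proj, collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical data: data_proj has the predictors but not the target "employed"
data_model <- data.frame(age = c(25, 40), educ = c(2, 3), employed = c(1, 0))
data_proj  <- data.frame(age = c(30, 55), educ = c(1, 4))

ok <- check_aux_columns(data_model, data_proj, predictor_cols = c("age", "educ"))

# Dropping "educ" from data_proj triggers an informative error
bad <- tryCatch(
  check_aux_columns(data_model, data_proj[, "age", drop = FALSE], c("age", "educ")),
  error = function(e) conditionMessage(e)
)
print(bad)
```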

domain1

Domain variables for survey estimation (e.g., "province").

domain2

Domain variables for survey estimation (e.g., "regency").

psu

Primary sampling units, representing the structure of the sampling frame.

ssu

Secondary sampling units, representing the structure of the sampling frame (default is NULL).

strata

Stratification variable, ensuring that specific subgroups are represented (default is NULL).

weights

Weights used for the direct estimation from data_model and indirect estimation from data_proj.
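
To illustrate how weights enter a domain-level estimate, the sketch below computes a weighted (Hájek-type ratio) domain mean, sum(w * y) / sum(w), for each domain. It is a simplified illustration under stated assumptions, not the package's estimator: the data, the column names, and the helper weighted_domain_mean are hypothetical, and it ignores psu, ssu, and strata, which a full design-based variance estimate would need.

```r
# Hypothetical survey records: domain, binary target (observed or predicted), weight
svy <- data.frame(
  province = c("A", "A", "A", "B", "B"),
  y        = c(1, 0, 1, 0, 1),
  w        = c(10, 20, 10, 30, 10)
)

# Hypothetical helper: weighted domain mean sum(w * y) / sum(w) within each domain
weighted_domain_mean <- function(y, w, domain) {
  groups <- split(seq_along(y), domain)  # row indices per domain
  vapply(groups, function(i) sum(w[i] * y[i]) / sum(w[i]), numeric(1))
}

est <- weighted_domain_mean(svy$y, svy$w, svy$province)
print(est)
```

Domain A gives (10 + 10) / 40 = 0.5 and domain B gives 10 / 40 = 0.25.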

split_ratio

Proportion of data used for training (default is 0.8, meaning 80 percent for training and 20 percent for validation).
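
A simple random train/validation split with split_ratio = 0.8 can be sketched in base R as follows. The helper split_data and the toy data are hypothetical; the package may split differently (for example, stratified by the target class), so this only shows what the ratio means.

```r
set.seed(123)  # make the random split reproducible

# Hypothetical helper: simple random split into training and validation rows
split_data <- function(data, split_ratio = 0.8) {
  n <- nrow(data)
  n_train <- floor(split_ratio * n)
  train_idx <- sample(seq_len(n), size = n_train)
  validation_idx <- setdiff(seq_len(n), train_idx)  # remaining rows
  list(
    train      = data[train_idx, , drop = FALSE],
    validation = data[validation_idx, , drop = FALSE]
  )
}

# Hypothetical data with 10 rows: 8 go to training, 2 to validation
data_model <- data.frame(x = 1:10, y = rep(c(0, 1), 5))
parts <- split_data(data_model, split_ratio = 0.8)

print(nrow(parts$train))
print(nrow(parts$validation))
```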

feature_selection

Logical; whether to perform selection of predictor variables before training (default is TRUE).