SCE (version 1.0.0)

RFE_SCE: Recursive Feature Elimination for SCE Models

Description

This function implements Recursive Feature Elimination (RFE) to identify the most important predictors for SCE models. It iteratively removes the least important predictors based on Wilks' feature importance scores and evaluates model performance. The function supports both single and multiple predictants, with comprehensive input validation and performance tracking across iterations.

Usage

RFE_SCE(
  Training_data,
  Testing_data,
  Predictors,
  Predictant,
  Nmin,
  Ntree,
  alpha = 0.05,
  resolution = 1000,
  step = 1,
  verbose = TRUE,
  parallel = TRUE
)

Value

A list containing:

  • summary: A data.frame with one row per iteration and the columns:

    • n_predictors: Number of predictors at each iteration

    • predictors: Comma-separated list of predictors used

  • performances: List of performance evaluations for each iteration

    • For single predictant: Direct performance data.frame

    • For multiple predictants: Named list of performance data.frames

  • importance_scores: List of Wilks' importance scores for each iteration
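To make the shape of the return value concrete, here is a small sketch that builds a hypothetical mock of the list and reads it back. The object `mock_result` and all values in it are placeholders invented for illustration, not output from RFE_SCE. Whether the real `predictors` strings put a space after each comma is not stated above, so the sketch trims whitespace to handle both cases.

```r
# Hypothetical mock of the returned list; values are placeholders, not real output.
mock_result <- list(
  summary = data.frame(
    n_predictors = c(4L, 3L),
    predictors = c("Prcp,SRad,Tmax,Tmin", "Prcp,SRad,Tmax"),
    stringsAsFactors = FALSE
  ),
  performances = list("iteration 1 placeholder", "iteration 2 placeholder"),
  importance_scores = list("iteration 1 placeholder", "iteration 2 placeholder")
)

# Recover the predictor names used in the last iteration.
last_row <- mock_result$summary[nrow(mock_result$summary), ]
last_predictors <- trimws(strsplit(last_row$predictors, ",", fixed = TRUE)[[1]])
print(last_predictors)
```

The same indexing should apply to a real result, provided the list elements are named as documented above.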

Arguments

Training_data

A data.frame containing the training data. Must include all specified predictors and predictants.

Testing_data

A data.frame containing the testing data. Must include all specified predictors and predictants.

Predictors

A character vector specifying the names of independent variables to be evaluated (e.g., c("Prcp","SRad","Tmax")). Must contain at least 2 elements.

Predictant

A character vector specifying the name(s) of dependent variable(s) (e.g., c("swvl3","swvl4")). Must be non-empty.

Nmin

Integer specifying the minimum number of samples a leaf node must contain to be considered for cutting.

Ntree

Integer specifying the number of trees in the ensemble.

alpha

Numeric significance level for clustering, between 0 and 1. Default value is 0.05.

resolution

Numeric value specifying the resolution for splitting. Default value is 1000.

step

Integer specifying the number of predictors to remove at each iteration. Must be between 1 and (number of predictors - number of predictants). Default value is 1.

verbose

A logical value indicating whether to print progress information during RFE iterations. Default value is TRUE.

parallel

A logical value indicating whether to use parallel processing for SCE model construction. When TRUE, uses multiple CPU cores for faster computation. When FALSE, processes trees sequentially. Default value is TRUE.
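The constraints listed for these arguments can be checked before a long run. The following is a minimal sketch of such a check, not the package's own validation code: `check_rfe_inputs` is a hypothetical helper, and treating the bounds on alpha as exclusive is an assumption.

```r
# Hypothetical pre-flight check mirroring the documented argument constraints.
check_rfe_inputs <- function(Training_data, Testing_data, Predictors, Predictant,
                             alpha = 0.05, step = 1) {
  stopifnot(
    is.data.frame(Training_data),
    is.data.frame(Testing_data),
    is.character(Predictors), length(Predictors) >= 2,
    is.character(Predictant), length(Predictant) >= 1
  )
  # Both data sets must contain every predictor and predictant column.
  needed <- c(Predictors, Predictant)
  for (d in list(Training_data, Testing_data)) {
    missing_cols <- setdiff(needed, names(d))
    if (length(missing_cols) > 0) {
      stop("Missing columns: ", paste(missing_cols, collapse = ", "))
    }
  }
  # Assumption: alpha must lie strictly between 0 and 1.
  if (!(is.numeric(alpha) && length(alpha) == 1 && alpha > 0 && alpha < 1)) {
    stop("alpha must be a single number between 0 and 1")
  }
  # step must be between 1 and (number of predictors - number of predictants).
  max_step <- length(Predictors) - length(Predictant)
  if (!(is.numeric(step) && length(step) == 1 && step >= 1 && step <= max_step)) {
    stop("step must be between 1 and ", max_step)
  }
  invisible(TRUE)
}

# Toy data with placeholder values, for illustration only.
toy <- data.frame(Prcp = 1:5, SRad = 5:1, Tmax = c(2, 4, 6, 8, 10), swvl3 = 1:5)
ok <- check_rfe_inputs(toy, toy, c("Prcp", "SRad", "Tmax"), "swvl3", step = 2)
```

With three predictors and one predictant, the largest step this check accepts is 2.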

Author

Kailong Li <lkl98509509@gmail.com>

Details

The RFE process involves the following steps:

  1. Input validation:

    • Data frame structure validation

    • Predictor and predictant validation

    • Step size validation

  2. Initialization:

    • Set up history tracking structures

    • Initialize current predictor set

  3. Main RFE loop (continues while predictors > predictants + 2):

    • Train SCE model with current predictors

    • Generate predictions using Model_simulation

    • Evaluate model using SCE_Model_evaluation

    • Store performance metrics and importance scores

    • Remove least important predictors based on Wilks' scores
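The loop in step 3 can be sketched with base R alone. This is an illustration of the general RFE pattern under stated assumptions, not the SCE implementation: a linear model (`lm`) stands in for the SCE model, the absolute t-statistic of each coefficient stands in for Wilks' importance scores, RMSE on the test set stands in for SCE_Model_evaluation, and the toy data are simulated.

```r
# Simulated toy data: only x1 and x2 actually drive y.
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                  x4 = rnorm(n), x5 = rnorm(n), x6 = rnorm(n))
toy$y <- 3 * toy$x1 - 2 * toy$x2 + rnorm(n, sd = 0.5)
train <- toy[1:150, ]
test <- toy[151:200, ]

predictant <- "y"
current <- paste0("x", 1:6)   # current predictor set
step <- 1                     # predictors removed per iteration
history <- list()             # one entry per iteration

# Stopping rule as documented: continue while predictors > predictants + 2.
while (length(current) > length(predictant) + 2) {
  # Stand-in model: lm instead of an SCE ensemble.
  fit <- lm(reformulate(current, response = predictant), data = train)
  pred <- predict(fit, newdata = test)
  rmse <- sqrt(mean((test[[predictant]] - pred)^2))

  # Stand-in importance: absolute t-statistic, not Wilks' score.
  importance <- abs(summary(fit)$coefficients[current, "t value"])

  history[[length(history) + 1]] <- list(
    n_predictors = length(current),
    predictors = paste(current, collapse = ","),
    rmse = rmse,
    importance = importance
  )

  # Remove the `step` least important predictors.
  drop <- names(sort(importance))[seq_len(min(step, length(current)))]
  current <- setdiff(current, drop)
}

# Collect the per-iteration bookkeeping into a summary table.
summary_df <- data.frame(
  n_predictors = sapply(history, `[[`, "n_predictors"),
  predictors = sapply(history, `[[`, "predictors"),
  rmse = sapply(history, `[[`, "rmse"),
  stringsAsFactors = FALSE
)
print(summary_df)
```

Because the importance scores are recomputed after every removal, a predictor that looks weak only because of a correlated companion can gain importance once that companion is dropped. A one-shot ranking cannot capture this.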

The function handles:

  • Single and multiple predictants

  • Performance tracking across iterations

  • Importance score calculation

  • Step-wise predictor removal

Examples

# \donttest{
#   # This example is computationally intensive and may take a long time to run.
#   # It is recommended to run this example on a machine with a high-performance CPU.
# 
#   ## Load SCE package and the supporting packages
#   library(SCE)
#   library(parallel)
# 
#   data(Streamflow_training_22var)
#   data(Streamflow_testing_22var)
# 
#   # Define predictors and predictants
#   Predictors <- c(
#     "Precipitation", "Radiation", "Tmax", "Tmin", "VP",
#     "Precipitation_2Mon", "Radiation_2Mon", "Tmax_2Mon", "Tmin_2Mon", "VP_2Mon",
#     "PNA", "Nino3.4", "IPO", "PDO",
#     "PNA_lag1", "Nino3.4_lag1", "IPO_lag1", "PDO_lag1",
#     "PNA_lag2", "Nino3.4_lag2", "IPO_lag2", "PDO_lag2"
#   )
#   Predictants <- c("Flow")
# 
#   # Perform RFE
#   set.seed(123)
#   result <- RFE_SCE(
#     Training_data = Streamflow_training_22var,
#     Testing_data = Streamflow_testing_22var,
#     Predictors = Predictors,
#     Predictant = Predictants,
#     Nmin = 5,
#     Ntree = 48,
#     alpha = 0.05,
#     resolution = 1000,
#     step = 3,  # Number of predictors to remove at each iteration
#     verbose = TRUE,
#     parallel = TRUE
#   )
# 
#   ## Access results
#   summary <- result$summary
#   performances <- result$performances
#   importance_scores <- result$importance_scores
# 
# }
