SCE (version 1.1.0)

RFE_SCE: Recursive Feature Elimination for SCE Models

Description

This function implements Recursive Feature Elimination (RFE) to identify the most important predictors for SCE models. It iteratively removes the least important predictors based on Wilks' feature importance scores and evaluates model performance. The function supports both single and multiple predictants, with comprehensive input validation and performance tracking across iterations.

The package also provides a Plot_RFE function for visualizing RFE results, showing validation and testing R2 values as a function of the number of predictors.

Usage

RFE_SCE(
  Training_data,
  Testing_data,
  Predictors,
  Predictant,
  Nmin,
  Ntree,
  alpha = 0.05,
  resolution = 1000,
  step = 1,
  verbose = TRUE,
  parallel = TRUE
)

Plot_RFE(
  rfe_result,
  main = "Validation and Testing R2 vs Number of Predictors",
  col_validation = "blue",
  col_testing = "red",
  pch = 16,
  lwd = 2,
  cex = 1.2,
  legend_pos = "bottomleft",
  ...
)

Value

RFE_SCE: A list containing:

  • summary: Data.frame with columns:

    • n_predictors: Number of predictors at each iteration

    • predictors: Comma-separated list of predictors used

  • performances: List of performance evaluations for each iteration

    • For single predictant: Direct performance data.frame

    • For multiple predictants: Named list of performance data.frames

  • importance_scores: List of Wilks' importance scores for each iteration

Plot_RFE: Invisibly returns a list containing:

  • n_predictors: Vector of predictor counts

  • validation_r2: Vector of validation R2 values

  • testing_r2: Vector of testing R2 values
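
As a rough illustration of working with the summary component, the sketch below builds a mock list with the documented field names and splits the comma-separated predictors column back into character vectors. The object mock_result and every value in it are invented for illustration and are not SCE output; the exact separator formatting used by RFE_SCE is also an assumption, so whitespace is trimmed defensively.

```r
# Mock object mimicking the documented structure of an RFE_SCE result.
# All values are invented for illustration; they are NOT produced by SCE.
mock_result <- list(
  summary = data.frame(
    n_predictors = c(6, 4),
    predictors = c("Prcp,SRad,Tmax,Tmin,VP,PDO", "Prcp,SRad,Tmax,VP"),
    stringsAsFactors = FALSE
  ),
  performances = list(NULL, NULL),      # placeholders, one entry per iteration
  importance_scores = list(NULL, NULL)  # placeholders, one entry per iteration
)

# Split the comma-separated predictor strings into character vectors.
# trimws() guards against a possible ", " separator (an assumption).
predictor_sets <- lapply(
  strsplit(mock_result$summary$predictors, ",", fixed = TRUE),
  trimws
)
names(predictor_sets) <- mock_result$summary$n_predictors

# Predictors removed between the first and the second iteration
dropped <- setdiff(predictor_sets[[1]], predictor_sets[[2]])
print(dropped)
```

The same pattern can be applied to result$summary from a real run to see which predictors were eliminated at each step.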

Arguments

Training_data

A data.frame containing the training data. Must include all specified predictors and predictants.

Testing_data

A data.frame containing the testing data. Must include all specified predictors and predictants.

Predictors

A character vector specifying the names of independent variables to be evaluated (e.g., c("Prcp","SRad","Tmax")). Must contain at least 2 elements.

Predictant

A character vector specifying the name(s) of dependent variable(s) (e.g., c("swvl3","swvl4")). Must be non-empty.

Nmin

Integer specifying the minimum number of samples a leaf node must contain to be considered for cutting (splitting).

Ntree

Integer specifying the number of trees in the ensemble.

alpha

Numeric significance level for clustering, between 0 and 1. Default value is 0.05.

resolution

Numeric value specifying the resolution for splitting. Default value is 1000.

step

Integer specifying the number of predictors to remove at each iteration. Must be between 1 and (number of predictors - number of predictants). Default value is 1.
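
To get a feel for how step affects the number of iterations, the following sketch lists the predictor counts that would be visited. It assumes the loop condition given in the Details section (predictors > predictants + 2) and that exactly step predictors are removed per iteration; the helper rfe_schedule is hypothetical and not part of the SCE package, and the actual function may handle the final iterations differently.

```r
# Hypothetical helper (not part of SCE): predictor counts visited by an RFE
# loop, assuming it runs while n_predictors > n_predictants + 2 and removes
# exactly `step` predictors per iteration.
rfe_schedule <- function(n_predictors, n_predictants, step = 1) {
  max_step <- n_predictors - n_predictants
  if (step < 1 || step > max_step) {
    stop("step must be between 1 and ", max_step)
  }
  counts <- numeric(0)
  n <- n_predictors
  while (n > n_predictants + 2) {
    counts <- c(counts, n)
    n <- n - step
  }
  counts
}

# 22 predictors, 1 predictant, 3 predictors removed per iteration
schedule <- rfe_schedule(22, 1, step = 3)
print(schedule)
```

A larger step shortens the run at the cost of a coarser view of how performance changes with the number of predictors.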

verbose

A logical value indicating whether to print progress information during RFE iterations. Default value is TRUE.

parallel

A logical value indicating whether to use parallel processing for SCE model construction. When TRUE, uses multiple CPU cores for faster computation. When FALSE, processes trees sequentially. Default value is TRUE.

Plot_RFE Arguments:

rfe_result

The result object from RFE_SCE function containing summary and performances components.

main

Title for the plot. Default is "Validation and Testing R2 vs Number of Predictors".

col_validation

Color for validation line. Default is "blue".

col_testing

Color for testing line. Default is "red".

pch

Point character for markers. Default is 16 (filled circle).

lwd

Line width. Default is 2.

cex

Point size. Default is 1.2.

legend_pos

Position of legend. Default is "bottomleft".

...

Additional arguments passed to plot function.

Author

Kailong Li <lkl98509509@gmail.com>

Details

RFE_SCE Process: The RFE process involves the following steps:

  1. Input validation:

    • Data frame structure validation

    • Predictor and predictant validation

    • Step size validation

  2. Initialization:

    • Set up history tracking structures

    • Initialize current predictor set

  3. Main RFE loop (continues while predictors > predictants + 2):

    • Train SCE model with current predictors

    • Generate predictions using Model_simulation

    • Evaluate model using SCE_Model_evaluation

    • Store performance metrics and importance scores

    • Remove least important predictors based on Wilks' scores
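
The loop above can be sketched generically in base R. This is only an illustration of the RFE pattern, not the package's implementation: lm() stands in for the SCE model, the absolute t-statistic stands in for Wilks' importance, and the function rfe_sketch together with the mtcars split are assumptions made for the example.

```r
# Generic RFE sketch on a built-in dataset. lm() replaces the SCE model and
# |t-statistic| replaces Wilks' importance; neither is what SCE actually uses.
rfe_sketch <- function(train, test, predictors, predictant, step = 1) {
  history <- list()
  current <- predictors
  # Loop condition as stated in the Details section (single predictant here)
  while (length(current) > length(predictant) + 2) {
    fit <- lm(reformulate(current, response = predictant), data = train)
    pred <- predict(fit, newdata = test)
    observed <- test[[predictant]]
    r2 <- 1 - sum((observed - pred)^2) / sum((observed - mean(observed))^2)
    importance <- abs(summary(fit)$coefficients[current, "t value"])
    # Store performance and importance for this iteration
    history[[length(history) + 1]] <- list(
      n_predictors = length(current),
      predictors = paste(current, collapse = ","),
      testing_r2 = r2,
      importance = importance
    )
    # Remove the `step` least important predictors
    to_drop <- names(sort(importance))[seq_len(min(step, length(current)))]
    current <- setdiff(current, to_drop)
  }
  history
}

set.seed(1)
idx <- sample(nrow(mtcars), 22)
rfe_history <- rfe_sketch(
  train = mtcars[idx, ], test = mtcars[-idx, ],
  predictors = c("cyl", "disp", "hp", "drat", "wt", "qsec"),
  predictant = "mpg", step = 1
)
counts <- sapply(rfe_history, function(h) h$n_predictors)
print(counts)
```

Keeping the per-iteration records in a list mirrors the history tracking described in step 2, so performance and importance can be inspected after the loop finishes.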

The function handles:

  • Single and multiple predictants

  • Performance tracking across iterations

  • Importance score calculation

  • Step-wise predictor removal

Plot_RFE Function: Creates a base R plot showing validation and testing R2 values as a function of the number of predictors during the RFE process. The function:

  • Extracts R2 values from RFE results

  • Converts formatted strings to numeric values

  • Creates a line plot with points and lines

  • Includes a legend distinguishing validation and testing performance

  • Supports customization of colors, line styles, and plot appearance

  • Uses only base R graphics (no external dependencies)
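
A minimal base R sketch of this kind of plot is shown below. The R2 strings are mock values invented for illustration, and the code is an approximation of the described behaviour using the documented default colors, marker, and legend position, not the source of Plot_RFE itself.

```r
# Mock data: R2 values stored as formatted strings (invented for illustration)
n_predictors <- c(22, 19, 16, 13, 10, 7, 4)
validation_r2_chr <- c("0.81", "0.82", "0.82", "0.80", "0.78", "0.74", "0.65")
testing_r2_chr <- c("0.72", "0.74", "0.75", "0.73", "0.70", "0.66", "0.58")

# Convert formatted strings to numeric values
validation_r2 <- as.numeric(validation_r2_chr)
testing_r2 <- as.numeric(testing_r2_chr)

# Line plot with points for validation, then overlay testing
y_range <- range(c(validation_r2, testing_r2))
plot(n_predictors, validation_r2,
     type = "b", col = "blue", pch = 16, lwd = 2, cex = 1.2,
     ylim = y_range,
     xlab = "Number of Predictors", ylab = "R2",
     main = "Validation and Testing R2 vs Number of Predictors")
lines(n_predictors, testing_r2,
      type = "b", col = "red", pch = 16, lwd = 2, cex = 1.2)
legend("bottomleft", legend = c("Validation", "Testing"),
       col = c("blue", "red"), pch = 16, lwd = 2)
```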

See Also

See the generic functions importance and evaluate for SCE objects. For visualization of RFE results, see Plot_RFE.

Examples

# This example is computationally intensive and may take a long time to run.
# It is recommended to run this example on a machine with a high-performance CPU.

## Load SCE package and the supporting packages
library(SCE)
library(parallel)

data(Streamflow_training_22var)
data(Streamflow_testing_22var)

# Define predictors and predictants
Predictors <- c(
  "Precipitation", "Radiation", "Tmax", "Tmin", "VP",
  "Precipitation_2Mon", "Radiation_2Mon", "Tmax_2Mon", "Tmin_2Mon", "VP_2Mon",
  "PNA", "Nino3.4", "IPO", "PDO",
  "PNA_lag1", "Nino3.4_lag1", "IPO_lag1", "PDO_lag1",
  "PNA_lag2", "Nino3.4_lag2", "IPO_lag2", "PDO_lag2"
)
Predictants <- c("Flow")

# Perform RFE
set.seed(123)
result <- RFE_SCE(
  Training_data = Streamflow_training_22var,
  Testing_data = Streamflow_testing_22var,
  Predictors = Predictors,
  Predictant = Predictants,
  Nmin = 5,
  Ntree = 48,
  alpha = 0.05,
  resolution = 1000,
  step = 3,  # Number of predictors to remove at each iteration
  verbose = TRUE,
  parallel = TRUE
)

## Access results (names chosen to avoid masking base::summary)
rfe_summary <- result$summary
performances <- result$performances
importance_scores <- result$importance_scores

## Plot RFE results
Plot_RFE(result)

## Customized plot
Plot_RFE(result,
         main = "My RFE Results",
         col_validation = "darkblue",
         col_testing = "darkred",
         lwd = 3,
         cex = 1.5)

## Note: The RFE_SCE function internally uses S3 methods for SCE models
## including importance() and evaluate() for model analysis