SCE (version 1.0.0)

Wilks_importance: Calculate Variable Importance using Wilks' Lambda

Description

This function calculates how much each independent variable (predictor) contributes to explaining the variability of the dependent variables, using the Wilks' Lambda statistic. Importance is derived from each variable's contribution to the reduction in Wilks' Lambda at each split of the SCA trees in an SCE ensemble. Both unweighted and out-of-bag (OOB) weighted importance calculations are supported.

For calculating importance scores for a single SCA tree, use SCA_importance instead.

Usage

Wilks_importance(model, OOB_weight = TRUE)

Value

A data.frame containing:

  • Predictor: Names of the predictors

  • Relative_Importance: Normalized importance scores (sum to 1)
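
As a rough illustration of working with the returned data.frame, the sketch below ranks predictors by their score. The object `imp` is a hand-built stand-in with the two documented columns; its predictor names and values are made up for illustration and do not come from a real model.

```r
# Hypothetical stand-in for the value returned by Wilks_importance();
# the predictor names and scores are invented for illustration only.
imp <- data.frame(
  Predictor = c("Temp", "Prcp", "Wind"),
  Relative_Importance = c(0.30, 0.65, 0.05)
)

# Rank predictors from most to least important
ranked <- imp[order(imp$Relative_Importance, decreasing = TRUE), ]

# Scores are normalized, so they should sum to 1 up to floating-point error
stopifnot(isTRUE(all.equal(sum(imp$Relative_Importance), 1)))
```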

Arguments

model

A trained SCE model object containing a list of SCA trees. Each tree should contain:

  • Tree: Tree structure with Wilks' Lambda values and split information

  • XName: Names of predictors used

  • weight: Tree weight (required only when OOB_weight = TRUE)

OOB_weight

A logical value indicating whether to weight the importance scores by each tree's out-of-bag (OOB) performance.

  • If TRUE (default): Importance scores are weighted by each tree's OOB performance

  • If FALSE: Importance scores are calculated using the median across trees

Author

Kailong Li <lkl98509509@gmail.com>

Details

The importance calculation process involves the following steps:

  1. Extract Wilks' Lambda values and split information from each tree

  2. Replace negative Wilks' Lambda values with zero

  3. Calculate raw importance for each split:

    • Importance = (left_samples + right_samples) / total_samples * (1 - Wilks' Lambda)

  4. Aggregate importance scores by predictor:

    • If OOB_weight = TRUE: Weight by tree's OOB performance and sum

    • If OOB_weight = FALSE: Take median across trees

  5. Normalize importance scores to sum to 1
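
The steps above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the `trees` list, its field names (`splits`, `total_samples`, `weight`) and all numbers are hypothetical, and the real SCE tree layout may differ. It also assumes that split scores are summed per predictor within a tree, that a predictor absent from a tree contributes zero to the weighted sum, and that it is ignored when taking the median.

```r
# Hypothetical split tables for two trees (not the real SCE tree structure)
trees <- list(
  list(
    total_samples = 100,
    weight = 0.6,  # assumed OOB-based tree weight
    splits = data.frame(
      predictor     = c("Prcp", "Temp", "Prcp"),
      left_samples  = c(60, 30, 20),
      right_samples = c(40, 30, 20),
      wilks_lambda  = c(0.40, 0.70, -0.05)
    )
  ),
  list(
    total_samples = 80,
    weight = 0.4,
    splits = data.frame(
      predictor     = c("Temp", "Wind"),
      left_samples  = c(50, 25),
      right_samples = c(30, 25),
      wilks_lambda  = c(0.50, 0.90)
    )
  )
)

# Steps 2-3: set negative lambdas to zero, score each split,
# then sum the split scores per predictor within the tree (assumption)
tree_importance <- function(tree) {
  s <- tree$splits
  lambda <- pmax(s$wilks_lambda, 0)
  raw <- (s$left_samples + s$right_samples) / tree$total_samples * (1 - lambda)
  tapply(raw, s$predictor, sum)
}

# Steps 4-5: aggregate across trees and normalize to sum to 1
aggregate_importance <- function(trees, OOB_weight = TRUE) {
  per_tree <- lapply(trees, tree_importance)
  predictors <- sort(unique(unlist(lapply(per_tree, names))))

  # trees x predictors matrix; NA marks a predictor a tree did not use
  mat <- matrix(NA_real_, nrow = length(trees), ncol = length(predictors),
                dimnames = list(NULL, predictors))
  for (i in seq_along(per_tree)) {
    mat[i, names(per_tree[[i]])] <- per_tree[[i]]
  }

  if (OOB_weight) {
    w <- vapply(trees, function(tr) tr$weight, numeric(1))
    mat[is.na(mat)] <- 0        # assumption: unused predictor contributes zero
    score <- colSums(mat * w)   # w[i] scales row i (tree i)
  } else {
    score <- apply(mat, 2, median, na.rm = TRUE)  # assumption: NAs are ignored
  }

  data.frame(Predictor = names(score),
             Relative_Importance = as.numeric(score / sum(score)),
             row.names = NULL)
}

weighted   <- aggregate_importance(trees, OOB_weight = TRUE)
unweighted <- aggregate_importance(trees, OOB_weight = FALSE)
```

In this toy data the third split of the first tree has a negative Wilks' Lambda, so it is treated as zero and that split receives its full sample fraction as importance.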

The function handles:

  • Multiple trees in the ensemble

  • Different sets of predictors in each tree

  • Missing or invalid splits

  • Both single and multiple predictants

  • Trees with no splits (NULL is returned for those trees)

Relationship with SCA_importance:

  • Wilks_importance calculates importance scores across all trees in an SCE ensemble

  • SCA_importance calculates importance scores for a single SCA tree

  • Both functions use the same underlying importance calculation method

  • Wilks_importance with OOB_weight = FALSE is equivalent to taking the median of SCA_importance scores across all trees

References

Li, Kailong, Guohe Huang, and Brian Baetz. "Development of a Wilks feature importance method with improved variable rankings for supporting hydrological inference and modelling." Hydrology and Earth System Sciences 25.9 (2021): 4947-4966.

See Also

SCE