This function calculates the importance of independent variables in explaining the variability of dependent variables using the Wilks' Lambda statistic. The importance is calculated based on the contribution of each variable to the reduction in Wilks' Lambda at each split in the SCA trees. The function supports both unweighted and out-of-bag (OOB) weighted importance calculations.
For calculating importance scores for a single SCA tree, use SCA_importance instead.
Wilks_importance(model, OOB_weight = TRUE)
A data.frame containing:
Predictor: Names of the predictors
Relative_Importance: Normalized importance scores (sum to 1)
model: A trained SCE model object containing a list of SCA trees. Each tree should contain:
Tree: Tree structure with Wilks' Lambda values and split information
XName: Names of predictors used
weight: Tree weight (if OOB_weight = TRUE)
OOB_weight: A logical value indicating whether to weight the importance scores by each tree's OOB performance.
If TRUE (default): Importance scores are weighted by each tree's OOB performance
If FALSE: Importance scores are calculated using the median across trees
Kailong Li <lkl98509509@gmail.com>
The importance calculation process involves the following steps:
Extract Wilks' Lambda values and split information from each tree
Replace negative Wilks' Lambda values with zero
Calculate raw importance for each split:
Importance = (left_samples + right_samples) / total_samples * (1 - Wilks' Lambda)
Aggregate importance scores by predictor:
If OOB_weight = TRUE: Weight by tree's OOB performance and sum
If OOB_weight = FALSE: Take median across trees
Normalize importance scores to sum to 1
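The steps above can be sketched for a single tree as follows. This is a minimal illustration, not the package's implementation: the split table, its column names (predictor, wilks_lambda, left_samples, right_samples), and the choice to sum split contributions per predictor within one tree are all assumptions made for the example.

```r
# Hypothetical split table for one tree; column names are illustrative only
# and do not reflect the internal layout of an SCA tree object.
splits <- data.frame(
  predictor     = c("X1", "X2", "X1"),
  wilks_lambda  = c(0.40, -0.05, 0.70),
  left_samples  = c(60, 30, 20),
  right_samples = c(40, 30, 20)
)
total_samples <- 100

# Replace negative Wilks' Lambda values with zero
splits$wilks_lambda <- pmax(splits$wilks_lambda, 0)

# Raw importance per split:
# (left_samples + right_samples) / total_samples * (1 - Wilks' Lambda)
splits$raw <- (splits$left_samples + splits$right_samples) / total_samples *
  (1 - splits$wilks_lambda)

# Assumption: within one tree, split contributions are summed per predictor
raw_by_predictor <- tapply(splits$raw, splits$predictor, sum)

# Normalize so the scores sum to 1
relative_importance <- raw_by_predictor / sum(raw_by_predictor)
print(relative_importance)
```

In this toy table the split with a negative Wilks' Lambda is clamped to zero before its contribution is computed, so it contributes its full sample fraction.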
The function handles:
Multiple trees in the ensemble
Different sets of predictors in each tree
Missing or invalid splits
Both single and multiple predictants
Trees with no splits (returns NULL for those trees)
Relationship with SCA_importance:
Wilks_importance calculates importance scores across all trees in an SCE ensemble
SCA_importance calculates importance scores for a single SCA tree
Both functions use the same underlying importance calculation method
Wilks_importance with OOB_weight = FALSE is equivalent to taking the median of SCA_importance scores across all trees
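The two aggregation modes can be illustrated with a small sketch. It assumes per-tree scores have already been arranged in a matrix with one row per tree and one column per predictor. The helper aggregate_importance, the example weights, and the treatment of predictors a tree did not use (marked NA and skipped) are hypothetical and are not taken from the package.

```r
# Hypothetical per-tree importance scores: rows are trees, columns are predictors.
# NA marks a predictor that a given tree did not use.
per_tree <- rbind(
  tree1 = c(X1 = 0.6, X2 = 0.3, X3 = 0.1),
  tree2 = c(X1 = 0.5, X2 = 0.5, X3 = NA),
  tree3 = c(X1 = 0.2, X2 = 0.4, X3 = 0.4)
)

# Assumed OOB-based tree weights, one per tree (illustrative values)
tree_weight <- c(tree1 = 0.5, tree2 = 0.3, tree3 = 0.2)

# Illustrative helper, not part of the SCE package
aggregate_importance <- function(per_tree, tree_weight = NULL) {
  if (is.null(tree_weight)) {
    # Unweighted case: median across trees for each predictor
    agg <- apply(per_tree, 2, median, na.rm = TRUE)
  } else {
    # Weighted case: each row (tree) is scaled by its weight, then summed
    agg <- colSums(per_tree * tree_weight, na.rm = TRUE)
  }
  # Normalize so the scores sum to 1
  data.frame(Predictor = names(agg),
             Relative_Importance = unname(agg / sum(agg)))
}

unweighted <- aggregate_importance(per_tree)
weighted   <- aggregate_importance(per_tree, tree_weight)
print(unweighted)
print(weighted)
```

The median is less sensitive to a single tree with unusual scores, while the weighted sum lets trees with better OOB performance contribute more.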
Li, Kailong, Guohe Huang, and Brian Baetz. "Development of a Wilks feature importance method with improved variable rankings for supporting hydrological inference and modelling." Hydrology and Earth System Sciences 25.9 (2021): 4947-4966.
SCE