rfPermute: Estimate Permutation p-values for Random Forest Importance Metrics

Description

Estimate the significance of importance metrics for a Random Forest model by permuting the response variable. Produces a null distribution of importance metrics for each predictor variable and a p-value for each observed value.

Usage

rfPermute(x, ...)
# S3 method for default
rfPermute(x, y, ..., nrep = 100, num.cores = 1)
# S3 method for formula
rfPermute(formula, data = NULL, ..., subset, na.action = na.fail, nrep = 100)

Arguments

x, y, formula, data, subset, na.action, ...

See randomForest for definitions.

nrep

Number of permutation replicates to run to construct the null distribution and calculate p-values (default = 100).

num.cores

Number of CPUs to distribute permutation replicates over. The usage above lists a default of 1; if NULL is supplied, one fewer than the number of cores reported by detectCores is used.

Value

An rfPermute object which contains all of the components of a randomForest object plus:

null.dist

A list containing two three-dimensional arrays of null distributions for unscaled and scaled importance measures.

pval

A three-dimensional array containing permutation p-values for unscaled and scaled importance measures.

Details

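All other parameters are as defined in randomForest.formula. A Random Forest model is first fitted as normal to calculate the observed values of variable importance. The response variable is then permuted nrep times, and a new Random Forest model is built for each permutation. The importance values from these permuted models form the null distribution against which each observed value is compared to obtain its p-value.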
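
The procedure described above can be sketched by hand with randomForest alone. This is a minimal illustration under stated assumptions, not the package's internal code: the names aq, obs.imp, null.imp and pval are hypothetical, only the unscaled "%IncMSE" measure is used, and the p-value formula (the proportion of null values at least as large as the observed value, with one added to numerator and denominator) is an assumed convention that may differ from what rfPermute computes.

```r
library(randomForest)

set.seed(1)
aq <- na.omit(airquality)   # drop rows with missing values
nrep <- 20                  # kept small for illustration only

# Observed importance from a forest fitted to the real response
obs.rf <- randomForest(Ozone ~ ., data = aq, ntree = 100, importance = TRUE)
obs.imp <- importance(obs.rf, scale = FALSE)[, "%IncMSE"]

# Null distribution: refit after shuffling the response each time
# (result is a predictors-by-nrep matrix)
null.imp <- replicate(nrep, {
  perm <- aq
  perm$Ozone <- sample(perm$Ozone)
  rf <- randomForest(Ozone ~ ., data = perm, ntree = 100, importance = TRUE)
  importance(rf, scale = FALSE)[, "%IncMSE"]
})

# Assumed p-value convention: share of null values >= observed, with +1 correction
pval <- (rowSums(null.imp >= obs.imp) + 1) / (nrep + 1)
print(pval)
```

With this convention the smallest attainable p-value is 1 / (nrep + 1), so a larger nrep is needed to resolve small p-values.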

Examples


# NOT RUN {
# A regression model using the ozone example
data(airquality)
ozone.rfP <- rfPermute(
  Ozone ~ ., data = airquality, ntree = 100, 
  na.action = na.omit, nrep = 50, num.cores = 1
)
  
# Plot the null distributions and observed values.
plotNull(ozone.rfP) 
  
# Plot the unscaled importance distributions and highlight significant predictors
plot(rp.importance(ozone.rfP, scale = FALSE))
  
# ... and the scaled measures
plot(rp.importance(ozone.rfP, scale = TRUE))
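
# Inspect the extra components described under Value.
# Assumption: they are reachable with `$`, as the components of a
# randomForest object are.
str(ozone.rfP$null.dist)
dim(ozone.rfP$pval)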

# }
