- object
A fitted model object (e.g., a "randomForest" object).
- ...
Additional optional arguments. (Currently ignored.)
- feature_names
Character vector giving the names of the predictor variables (i.e., features)
of interest. If `NULL` (the default), the internal `get_feature_names()`
function is called to try to extract them automatically. It is good practice
to always specify this argument.
- train
A matrix-like R object (e.g., a data frame or matrix) containing the training
data. If `NULL` (the default), the internal `get_training_data()` function is
called to try to extract it automatically. It is good practice to always
specify this argument.
- target
Either a character string giving the name (or position) of the target column
in `train` or, if `train` only contains feature columns, a vector containing
the target values used to train `object`.
- metric
Either a function or character string specifying the performance metric to
use in computing model performance (e.g., RMSE for regression or accuracy for
binary classification). If `metric` is a function, it must accept two
arguments, `actual` and `predicted`, and return a single numeric value.
Ideally, this should be the same metric that was used to train `object`. See
`list_metrics` for a list of built-in metrics.
- smaller_is_better
Logical indicating whether or not a smaller value of `metric` is better.
Default is `NULL`. Must be supplied if `metric` is a user-supplied function.
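
As a minimal sketch of a user-supplied metric, the function below computes the
mean absolute error. The name `mae` is hypothetical and chosen only for
illustration; the argument names `actual` and `predicted` follow the
description above. Because a smaller error is better, such a function would be
paired with `smaller_is_better = TRUE`.

```r
# Hypothetical user-supplied metric: mean absolute error (MAE).
# It accepts `actual` and `predicted` and returns a single numeric value.
mae <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

# Quick check on toy values
mae(actual = c(1, 2, 3), predicted = c(1, 2, 5))
```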
- type
Character string specifying how to compare the baseline and permuted
performance metrics. Current options are `"difference"` (the default) and
`"ratio"`.
- nsim
Integer specifying the number of Monte Carlo replications to perform. Default
is 1. If `nsim > 1`, the results from the replications are averaged, and their
standard deviation is also returned.
- keep
Logical indicating whether or not to keep the individual permutation scores
for all `nsim` repetitions. If `TRUE` (the default), the individual variable
importance scores are stored in an attribute called `"raw_scores"`. (Only
used when `nsim > 1`.)
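
The way `metric`, `type`, and `nsim` interact can be sketched in base R. This
is an illustration under stated assumptions, not the package's implementation:
`permute_score()` and `rmse()` are hypothetical helpers, a smaller metric value
is assumed to be better, and the direction of the comparison (permuted minus
baseline, or permuted divided by baseline) is an assumption as well.

```r
set.seed(101)  # make the random permutations reproducible

# Simple regression model on a built-in data set
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Metric with the (actual, predicted) signature; smaller is better
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Hypothetical helper: permutation score for a single feature
permute_score <- function(object, train, target, feature, metric,
                          type = c("difference", "ratio"), nsim = 1) {
  type <- match.arg(type)
  # Baseline performance on the unpermuted data
  baseline <- metric(train[[target]], predict(object, newdata = train))
  scores <- replicate(nsim, {
    permuted <- train
    # Shuffle one column to break its link with the target
    permuted[[feature]] <- sample(permuted[[feature]])
    perm_metric <- metric(train[[target]], predict(object, newdata = permuted))
    if (type == "difference") perm_metric - baseline else perm_metric / baseline
  })
  # Average over replications; a standard deviation needs nsim > 1
  c(importance = mean(scores),
    stdev = if (nsim > 1) sd(scores) else NA_real_)
}

permute_score(fit, mtcars, target = "mpg", feature = "wt",
              metric = rmse, nsim = 10)
```

With `nsim = 1` there is only one score per feature, so no standard deviation
can be computed, which is why the sketch returns `NA` in that case.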
- sample_size
Integer specifying the size of the random sample to use for each Monte Carlo
repetition. Default is `NULL` (i.e., use all of the available training data).
Cannot be specified with `sample_frac`. Can be used to reduce computation
time with large data sets.
- sample_frac
Proportion specifying the size of the random sample to use for each Monte
Carlo repetition. Default is `NULL` (i.e., use all of the available training
data). Cannot be specified with `sample_size`. Can be used to reduce
computation time with large data sets.
- reference_class
Character string specifying which response category represents the
"reference" class (i.e., the class to which the predicted class probabilities
correspond). Only needed for binary classification problems.
- pred_fun
Deprecated. Use `pred_wrapper` instead.
- pred_wrapper
Prediction function that requires two arguments, `object` and `newdata`. The
output of this function should be determined by the `metric` being used:
- Regression
A numeric vector of predicted outcomes.
- Binary classification
A vector of predicted class labels (e.g., if
using misclassification error) or a vector of predicted class probabilities
for the reference class (e.g., if using log loss or AUC).
- Multiclass classification
A vector of predicted class labels (e.g., if using misclassification error)
or a matrix/data frame of predicted class probabilities for each class (e.g.,
if using log loss or AUC).
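
The prediction wrappers described above might look like the following sketch,
which uses only base R models. The names `pfun_reg`, `pfun_prob`, and
`pfun_class` are hypothetical, and the 0.5 cutoff for class labels is an
assumption made for illustration; the appropriate wrapper depends on the model
type and on the chosen `metric`.

```r
# Regression: return a numeric vector of predicted outcomes
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
pfun_reg <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata))
}

# Binary classification: logistic regression for `am` (0/1)
fit_glm <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Probabilities for the class coded as 1 (e.g., for log loss or AUC)
pfun_prob <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata, type = "response"))
}

# Class labels (e.g., for misclassification error); 0.5 cutoff is assumed
pfun_class <- function(object, newdata) {
  prob <- predict(object, newdata = newdata, type = "response")
  ifelse(prob > 0.5, 1, 0)
}

head(pfun_reg(fit_lm, mtcars))
head(pfun_prob(fit_glm, mtcars))
head(pfun_class(fit_glm, mtcars))
```

Each wrapper takes exactly the two arguments `object` and `newdata`, so the
same function can be applied unchanged to both the original and the permuted
data.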