postResample(pred, obs)
defaultSummary(data, lev = NULL, model = NULL)
twoClassSummary(data, lev = NULL, model = NULL)
mnLogLoss(data, lev = NULL, model = NULL)
multiClassSummary(data, lev = NULL, model = NULL)
R2(pred, obs, formula = "corr", na.rm = FALSE)
RMSE(pred, obs, na.rm = FALSE)
getTrainPerf(x)
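data: a data frame with columns obs and pred for the observed and predicted outcomes. For twoClassSummary, columns should also include predicted probabilities for each class. See the classProbs argument of trainControl.
lev: a character vector of factor levels for the response. In regression cases, this would be NULL.
model: a character string for the model name (as taken from the method argument of train).
na.rm: a logical value indicating whether NA values should be stripped before the computation proceeds.
x: an object of class train.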
postResample is meant to be used with apply across a matrix. For numeric data, the code checks whether the standard deviation of either vector is zero. If so, the correlation between those samples is assigned a value of zero. NA values are ignored everywhere.
Note that many models have more predictors (or parameters) than data points, so the typical mean squared error denominator (n - p) does not apply. Root mean squared error is calculated using sqrt(mean((pred - obs)^2)).
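The RMSE expression and the zero-variance guard described above can be sketched in base R. This is a minimal illustration rather than caret's source code; the helper names rmse_sketch and safe_cor are hypothetical, and NA handling is left out for brevity.

```r
# Hypothetical helpers; illustrative only, not caret's implementation.
rmse_sketch <- function(pred, obs) {
  # No (n - p) degrees-of-freedom correction: a plain mean of squared errors.
  sqrt(mean((pred - obs)^2))
}

safe_cor <- function(pred, obs) {
  # If either vector is constant, report a correlation of zero
  # instead of the NA that cor() would otherwise produce.
  if (sd(pred) == 0 || sd(obs) == 0) return(0)
  cor(pred, obs)
}

pred <- c(1, 2, 3, 4)
obs  <- c(1, 2, 3, 6)
rmse_sketch(pred, obs)     # sqrt(mean(c(0, 0, 0, 4))) = 1
safe_cor(rep(2, 4), obs)   # 0, because the predictions are constant
```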
Also, $R^2$ is calculated either as the square of the correlation between the observed and predicted outcomes when formula = "corr" or, when formula = "traditional", as
$$R^2 = 1-\frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$$
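The two definitions of $R^2$ can disagree, which a small base R sketch makes concrete. The helper name r2_sketch is hypothetical and this is not caret's implementation; it only mirrors the two formulas given here.

```r
# Hypothetical helper; illustrative only, not caret's implementation.
r2_sketch <- function(pred, obs, formula = c("corr", "traditional")) {
  formula <- match.arg(formula)
  if (formula == "corr") {
    # Squared correlation between observed and predicted outcomes.
    cor(obs, pred)^2
  } else {
    # One minus the ratio of residual to total sum of squares.
    1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
  }
}

obs  <- c(1, 2, 3, 4, 5)
pred <- c(2, 3, 4, 5, 6)   # perfectly correlated with obs, but biased by +1
r2_sketch(pred, obs, "corr")          # 1
r2_sketch(pred, obs, "traditional")   # 1 - 5/10 = 0.5
```

The correlation form ignores systematic bias in the predictions, while the traditional form penalizes it.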
defaultSummary is the default function to compute performance metrics in train. It is a wrapper around postResample.
twoClassSummary computes sensitivity, specificity and the area under the ROC curve. mnLogLoss computes the minus log-likelihood of the multinomial distribution (without the constant term):
$$\text{logLoss} = \frac{-1}{n}\sum_{i=1}^n \sum_{j=1}^C y_{ij} \log(p_{ij})$$
where the $y_{ij}$ values are binary indicators for the classes and $p_{ij}$ are the predicted class probabilities.
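The log loss formula can be worked through in base R as a rough sketch. The helper name log_loss_sketch is hypothetical and this is not caret's implementation; in particular it does no clipping, so a predicted probability of exactly zero for the observed class yields Inf.

```r
# Hypothetical helper; illustrative only, not caret's implementation.
log_loss_sketch <- function(obs, probs) {
  # obs: factor of observed classes.
  # probs: numeric matrix with one column per class, named after the levels.
  idx <- cbind(seq_along(obs), match(as.character(obs), colnames(probs)))
  # The indicator y_ij is 1 only for the observed class, so only the
  # probability assigned to that class contributes to the sum.
  -mean(log(probs[idx]))
}

obs <- factor(c("a", "b", "a"), levels = c("a", "b"))
probs <- matrix(0.5, nrow = 3, ncol = 2,
                dimnames = list(NULL, c("a", "b")))
log_loss_sketch(obs, probs)   # log(2), about 0.693
```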
multiClassSummary computes some overall measures of performance (e.g. overall accuracy and the Kappa statistic) and several averages of statistics calculated from "one-versus-all" configurations. For example, if there are three classes, three sets of sensitivity values are determined and the average is reported with the name "Mean_Sensitivity". The same is true for a number of statistics generated by confusionMatrix. With two classes, the basic sensitivity is reported with the name "Sensitivity".
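The "one-versus-all" averaging can be sketched for sensitivity in base R. The helper name mean_sensitivity_sketch is hypothetical; it illustrates the idea only and is not how caret computes the result.

```r
# Hypothetical helper; illustrative only, not caret's implementation.
mean_sensitivity_sketch <- function(pred, obs) {
  lev <- levels(obs)
  per_class <- vapply(lev, function(k) {
    # Treat class k as "positive" and every other class as "negative".
    sum(pred == k & obs == k) / sum(obs == k)
  }, numeric(1))
  c(per_class, Mean_Sensitivity = mean(per_class))
}

lev  <- c("a", "b", "c")
obs  <- factor(c("a", "a", "b", "b", "c", "c"), levels = lev)
pred <- factor(c("a", "b", "b", "b", "c", "a"), levels = lev)
mean_sensitivity_sketch(pred, obs)
# a = 0.5, b = 1.0, c = 0.5, Mean_Sensitivity = 2/3
```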
To use twoClassSummary and/or mnLogLoss, the classProbs argument of trainControl should be TRUE. multiClassSummary can be used without class probabilities but some statistics (e.g. overall log loss and the average of per-class area under the ROC curves) will not be in the result set.
Other functions can be used via the summaryFunction argument of trainControl. Custom functions must have the same arguments as defaultSummary.
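A custom summary function might look like the following sketch. The name maeSummary and the data set name training_data are hypothetical; the only requirement taken from the text is the (data, lev, model) argument list, and the function is assumed to return a named numeric vector of metrics.

```r
# Hypothetical custom summary function with the same arguments
# as defaultSummary; it reports mean absolute error.
maeSummary <- function(data, lev = NULL, model = NULL) {
  # data is expected to carry numeric columns named obs and pred.
  c(MAE = mean(abs(data$pred - data$obs)))
}

dat <- data.frame(obs = c(1, 2, 3), pred = c(1.5, 2, 2))
maeSummary(dat)   # MAE = (0.5 + 0 + 1) / 3 = 0.5

# Assumed usage with caret (not run here; training_data is a placeholder):
# ctrl <- trainControl(summaryFunction = maeSummary)
# fit  <- train(y ~ ., data = training_data, method = "lm",
#               metric = "MAE", maximize = FALSE, trControl = ctrl)
```

Because lower MAE is better, the commented call sets maximize = FALSE so that the smallest value is selected.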
The function getTrainPerf returns a one row data frame with the resampling results for the chosen model. The statistics will have the prefix "Train" (e.g. "TrainROC"). There is also a column called "method" that echoes the argument of the call to trainControl of the same name.
See also: trainControl
library(caret)
# Apply postResample to each column of a matrix of predictions
predicted <- matrix(rnorm(50), ncol = 5)
observed <- rnorm(10)
apply(predicted, 2, postResample, obs = observed)
# Two-class example with a probability column for each class
classes <- c("class1", "class2")
set.seed(1)
dat <- data.frame(obs = factor(sample(classes, 50, replace = TRUE)),
                  pred = factor(sample(classes, 50, replace = TRUE)),
                  class1 = runif(50), class2 = runif(50))
defaultSummary(dat, lev = classes)
twoClassSummary(dat, lev = classes)
mnLogLoss(dat, lev = classes)