postResample is meant to be used with apply across a matrix. For numeric data, the code checks whether the standard deviation of either vector is zero. If so, the correlation between those samples is assigned a value of zero. NA values are ignored everywhere.

Note that many models have more predictors (or parameters) than data points, so the typical mean squared error denominator (n - p) does not apply. Root mean squared error is calculated using sqrt(mean((pred - obs)^2)).
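
The calculation described above can be sketched in base R. This is a minimal illustration and not caret's source code; the helper name resample_metrics and the example data are hypothetical.

```r
# Sketch of the regression metrics described above (not caret's own code).
resample_metrics <- function(pred, obs) {
  # Ignore pairs where either value is NA.
  keep <- complete.cases(pred, obs)
  pred <- pred[keep]
  obs <- obs[keep]

  # A constant vector has zero standard deviation, so the correlation is
  # undefined; assign zero in that case.
  if (length(obs) < 2 || sd(pred) == 0 || sd(obs) == 0) {
    r <- 0
  } else {
    r <- cor(pred, obs)
  }

  # RMSE uses the plain mean of squared errors, not an (n - p) denominator.
  c(RMSE = sqrt(mean((pred - obs)^2)), Rsquared = r^2)
}

# One column of predictions per (hypothetical) model, scored with apply.
obs <- c(1, 2, 3, 4, NA)
preds <- cbind(model_a = c(1.1, 1.9, 3.2, 3.8, 5.0),
               model_b = c(2, 2, 2, 2, 2))
results <- apply(preds, 2, resample_metrics, obs = obs)
```

Here results is a matrix with one row per metric and one column per model. The constant predictions in model_b receive an Rsquared of zero instead of NA.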
Also, $R^2$ is calculated either as the square of the correlation between the observed and predicted outcomes when form = "corr", or, when form = "traditional", as
$$R^2 = 1-\frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$$
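
The two forms of $R^2$ can disagree, which a short base R sketch makes concrete. The function r_squared below is a hypothetical helper written for illustration and is not the package's implementation.

```r
# Sketch of the two R^2 definitions (hypothetical helper, not caret's code).
r_squared <- function(pred, obs, form = c("corr", "traditional")) {
  form <- match.arg(form)

  # Ignore pairs where either value is NA.
  keep <- complete.cases(pred, obs)
  pred <- pred[keep]
  obs <- obs[keep]

  if (form == "corr") {
    # Squared correlation between observed and predicted values.
    cor(obs, pred)^2
  } else {
    # One minus the ratio of residual to total sum of squares.
    1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
  }
}

# Predictions that are perfectly correlated with the outcome but biased by +1.
obs <- c(1, 2, 3, 4)
pred <- c(2, 3, 4, 5)
r2_corr <- r_squared(pred, obs, form = "corr")
r2_trad <- r_squared(pred, obs, form = "traditional")
```

The correlation form is insensitive to a constant bias in the predictions, so r2_corr is 1 here, while the traditional form penalizes the bias and r2_trad is 0.2.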
defaultSummary is the default function to compute performance metrics in train. It is a wrapper around postResample.
twoClassSummary computes sensitivity, specificity and the area under the ROC curve. mnLogLoss computes the minus log-likelihood of the multinomial distribution (without the constant term):
$$logLoss = \frac{-1}{n}\sum_{i=1}^n \sum_{j=1}^C y_{ij} \log(p_{ij})$$
where the $y_{ij}$ values are binary indicators for the classes and the $p_{ij}$ are the predicted class probabilities.
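
The formula above can be evaluated directly in base R. The sketch below is illustrative only: mn_log_loss is a hypothetical helper, and clipping the probabilities by eps to avoid log(0) is an assumption of this sketch, not something stated in this documentation.

```r
# Sketch of the multinomial log loss (hypothetical helper, not caret's code).
# obs:   factor of observed classes
# probs: matrix or data frame of class probabilities, one column per level,
#        with column names equal to the factor levels
mn_log_loss <- function(obs, probs, eps = 1e-15) {
  lev <- levels(obs)
  p <- as.matrix(probs[, lev, drop = FALSE])

  # Assumption of this sketch: keep probabilities away from 0 and 1.
  p <- pmin(pmax(p, eps), 1 - eps)

  # y[i, j] is 1 when sample i belongs to class j and 0 otherwise.
  y <- outer(as.character(obs), lev, "==") * 1

  -sum(y * log(p)) / nrow(p)
}

obs <- factor(c("a", "b"), levels = c("a", "b"))
probs <- matrix(c(0.8, 0.2,
                  0.4, 0.6),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("a", "b")))
loss <- mn_log_loss(obs, probs)
```

Only the probability assigned to the observed class contributes to each term, so loss equals -(log(0.8) + log(0.6)) / 2 for this example.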
To use twoClassSummary and/or mnLogLoss, the classProbs argument of trainControl should be TRUE.
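
The area under the ROC curve is computed from predicted class probabilities, not from hard class predictions. The base R sketch below illustrates the three two-class statistics under stated assumptions: two_class_metrics is a hypothetical helper, the first factor level is treated as the event of interest, and a 0.5 probability cutoff is used to form class predictions. None of these choices is confirmed by this documentation.

```r
# Sketch of sensitivity, specificity and ROC AUC (not caret's own code).
two_class_metrics <- function(obs, prob_event,
                              event = levels(obs)[1], cutoff = 0.5) {
  is_event <- obs == event
  pred_event <- prob_event >= cutoff   # assumed cutoff for class predictions

  # Sensitivity: fraction of events predicted as events.
  sens <- sum(pred_event & is_event) / sum(is_event)
  # Specificity: fraction of non-events predicted as non-events.
  spec <- sum(!pred_event & !is_event) / sum(!is_event)

  # AUC via the rank-sum (Mann-Whitney) identity; ties get average ranks.
  r <- rank(prob_event)
  n1 <- sum(is_event)
  n0 <- sum(!is_event)
  auc <- (sum(r[is_event]) - n1 * (n1 + 1) / 2) / (n1 * n0)

  c(ROC = auc, Sens = sens, Spec = spec)
}

obs <- factor(c("yes", "yes", "yes", "no", "no", "no"),
              levels = c("yes", "no"))
prob_yes <- c(0.9, 0.8, 0.4, 0.6, 0.3, 0.1)
metrics <- two_class_metrics(obs, prob_yes)
```

In this example eight of the nine event/non-event pairs are ranked correctly, so the AUC is 8/9, while sensitivity and specificity are both 2/3 at the assumed cutoff.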
Other functions can be used via the summaryFunction argument of trainControl. Custom functions must have the same arguments as defaultSummary.
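
A custom summary function might look like the sketch below. It takes the arguments data, lev and model, matching defaultSummary, and assumes that data carries the observed and predicted values in columns named obs and pred. The name mae_summary and the example data are hypothetical.

```r
# Sketch of a custom summary function (hypothetical example).
# Arguments mirror defaultSummary: data, lev, model.
mae_summary <- function(data, lev = NULL, model = NULL) {
  # Assumes `data` has numeric columns `obs` and `pred`.
  # Return a named numeric vector; the name identifies the metric.
  c(MAE = mean(abs(data$pred - data$obs), na.rm = TRUE))
}

# Calling the function directly on a small held-out set.
held_out <- data.frame(obs = c(1, 2, 3, 4),
                       pred = c(1.5, 2, 2.5, 5))
mae <- mae_summary(held_out)
```

Such a function would then be supplied as summaryFunction = mae_summary in the call to trainControl, with the metric argument of train set to the name of the returned statistic ("MAE" in this sketch).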
The function getTrainPerf returns a one-row data frame with the resampling results for the chosen model. The statistics will have the prefix "Train" (e.g. "TrainROC"). There is also a column called "method" that echoes the argument of the call to trainControl of the same name.