MachineShop (version 3.7.0)

t.test: Paired t-Tests for Model Comparisons

Description

Paired t-test comparisons of resampled performance metrics from different models.

Usage

# S3 method for PerformanceDiff
t.test(x, adjust = "holm", ...)

Value

PerformanceDiffTest class object that inherits from array. p-values and mean differences are contained in the lower and upper triangular portions, respectively, of the first two dimensions. Model pairs are contained in the third dimension.

Arguments

x

performance difference result.

adjust

p-value adjustment for multiple statistical comparisons as implemented by p.adjust.

...

arguments passed to other methods.
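
To illustrate what the adjust argument controls, the following minimal sketch calls p.adjust from base R directly on three made-up p-values. The values are hypothetical and are not MachineShop output; the sketch only shows how the default "holm" method changes a set of raw p-values.

```r
# Hypothetical raw p-values from three pairwise model comparisons
p_raw <- c(0.01, 0.04, 0.03)

# Holm adjustment, the default for the adjust argument
p_holm <- p.adjust(p_raw, method = "holm")

# No adjustment, for comparison
p_none <- p.adjust(p_raw, method = "none")

print(p_holm)
```

Any method name accepted by p.adjust (see p.adjust.methods) should be usable in the same way.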

Details

The t-test statistic for pairwise model differences of \(R\) resampled performance metric values is calculated as $$ t = \frac{\bar{x}_R}{\sqrt{F s^2_R / R}}, $$ where \(\bar{x}_R\) and \(s^2_R\) are the sample mean and variance of the differences. Statistical testing for a mean difference is then performed by comparing \(t\) to a \(t_{R-1}\) null distribution.

The sample variance in the t statistic is known to underestimate the true variance of cross-validation mean estimators, which increases the probability of false-positive statistical conclusions. An additional factor \(F\) is therefore included in the t statistic to allow for variance corrections.

A correction of \(F = 1 + K / (K - 1)\) was found by Nadeau and Bengio (2003) to be a good choice for cross-validation with \(K\) folds and is thus used for that resampling method. The extension of this correction by Bouckaert and Frank (2004) to \(F = 1 + T K / (K - 1)\) is used for cross-validation with \(K\) folds repeated \(T\) times. For other resampling methods, \(F = 1\).
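
The calculation described above can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: the helper name corrected_t_test, its arguments folds and repeats, and the vector of differences d are all hypothetical.

```r
# Hypothetical helper (not part of MachineShop): paired t-test on resampled
# differences d with the variance correction factor described above.
corrected_t_test <- function(d, folds = NULL, repeats = 1) {
  R <- length(d)
  # Correction factor: 1 + T K / (K - 1) for (repeated) K-fold
  # cross-validation, and 1 when no fold structure is supplied
  correction <- if (is.null(folds)) 1 else 1 + repeats * folds / (folds - 1)
  t_stat <- mean(d) / sqrt(correction * var(d) / R)
  # Two-sided p-value from a t distribution with R - 1 degrees of freedom
  p_value <- 2 * pt(-abs(t_stat), df = R - 1)
  list(statistic = t_stat, df = R - 1, p.value = p_value,
       correction = correction)
}

# Made-up metric differences from a single 10-fold cross-validation
d <- c(0.8, 1.1, 0.3, 0.9, 1.4, 0.2, 0.7, 1.0, 0.5, 1.1)

uncorrected <- corrected_t_test(d)              # F = 1
corrected <- corrected_t_test(d, folds = 10)    # F = 1 + 10 / 9
```

With the correction factor set to 1 the statistic matches an ordinary one-sample t-test on the differences. With the correction the statistic shrinks by a factor of \(\sqrt{F}\), so the p-value grows.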

References

Nadeau, C., & Bengio, Y. (2003). Inference for the generalization error. Machine Learning, 52, 239–81.

Bouckaert, R. R., & Frank, E. (2004). Evaluating the replicability of significance tests for comparing learning algorithms. In H. Dai, R. Srikant, & C. Zhang (Eds.), Advances in knowledge discovery and data mining (pp. 3–12). Springer.

Examples

## Requires prior installation of suggested package gbm to run
library(MachineShop)

## Numeric response example
fo <- sale_amount ~ .
control <- CVControl()

gbm_res1 <- resample(fo, ICHomes, GBMModel(n.trees = 25), control)
gbm_res2 <- resample(fo, ICHomes, GBMModel(n.trees = 50), control)
gbm_res3 <- resample(fo, ICHomes, GBMModel(n.trees = 100), control)

## Combine the resample results, compute pairwise differences, and test them
res <- c(GBM1 = gbm_res1, GBM2 = gbm_res2, GBM3 = gbm_res3)
res_diff <- diff(res)
t.test(res_diff)
