
When you use the performanceEstimation() function to compare a set of workflows over a set of problems, you obtain estimates of their performance across these problems. This function implements several statistical tests that can be used to test hypotheses concerning the observed differences in performance between the workflows on the tasks.
pairedComparisons(obj, baseline, maxs = rep(FALSE, length(metricNames(obj))), p.value = 0.05)
obj
An object of class ComparisonResults that contains the results of a performance estimation experiment.
baseline
A string with the name of the workflow against which the other workflows are to be compared (the baseline).
maxs
A vector of booleans, one for each metric estimated in the experiment. A TRUE value means the respective metric is to be maximized, while a FALSE means minimization. Defaults to all FALSE values, i.e. all metrics are to be minimized.
p.value
The p value (significance level) used in the statistical tests. Defaults to 0.05.
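For example, if the experiment estimated both "err" (to be minimized) and "acc" (to be maximized), the maxs vector should flag the second metric. The following is a minimal sketch; res stands for an assumed ComparisonResults object from a previous experiment and "svm.v2" for an assumed workflow name in that experiment:

library(performanceEstimation)

## sketch only: "res" and "svm.v2" are assumed to come from a previous
## performanceEstimation() experiment estimating c("err","acc")
metricNames(res)                        # check the order of the estimated metrics
pres <- pairedComparisons(res, baseline = "svm.v2",
                          maxs = c(FALSE, TRUE),   # err: minimize, acc: maximize
                          p.value = 0.05)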
The performanceEstimation function allows you to obtain estimates of the expected value of a series of performance metrics for a set of alternative workflows on a set of predictive tasks. After running this type of experiment we frequently want to check whether there are statistically significant differences between the estimated performances of the different workflows. The current function allows you to carry out this type of check.

The function will only run on experiments containing more than one workflow, as paired comparisons do not make sense with a single alternative workflow. Having more than one workflow, we can distinguish two situations: i) comparing the performance of two workflows; or ii) comparisons among multiple workflows. The recommendations for checking the statistical significance of the differences between the performances of the alternative workflows vary between these two setups (see Demsar (2006) for recommendations).
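As a quick illustration (a sketch, assuming res is a ComparisonResults object from a previous experiment and that the workflowNames() accessor is available for it), you can check which of the two situations applies before calling the function:

## sketch: "res" is an assumed ComparisonResults object
wfs <- workflowNames(res)    # names of the compared workflows
length(wfs)                  # pairedComparisons() requires more than one workflow
## exactly 2 workflows   -> the paired two-sample tests are the relevant output
## more than 2 workflows -> the multiple-comparison procedures described next apply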
The current function implements several statistical tests that can be used for these hypothesis tests. Namely, it obtains the results of paired t tests and paired Wilcoxon Signed Rank tests for situations where you are comparing the performance of two workflows, with the latter being recommended given the typical overlap among the training sets, which does not ensure independence among the scores of the different iterations. For the setup of multiple workflows on multiple tasks, the function also calculates the Friedman test and the post-hoc Nemenyi and Bonferroni-Dunn tests, according to the procedures described in Demsar (2006). The combination of the Friedman test followed by the post-hoc Nemenyi test is recommended when you want to carry out paired comparisons between all alternative workflows on the set of tasks, to check which differences are significant. The combination of the Friedman test followed by the post-hoc Bonferroni-Dunn test is recommended when you want to compare a set of alternative workflows against a baseline workflow. For both of these paths we provide an implementation of the critical difference (CD) diagrams described in Demsar (2006) through the functions CDdiagram.BD and CDdiagram.Nemenyi.
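The statistics of these tests are collected in the object returned by pairedComparisons(), organised by metric. The sketch below assumes res comes from an experiment with several workflows and that "err" is one of the estimated metrics; the per-metric component names shown in the comments are assumptions about the return structure, so inspect str(pres) to confirm them in your version of the package:

## sketch: "res" and "svm.v2" are assumed from a previous experiment
pres <- pairedComparisons(res, "svm.v2")

names(pres)                    # one component per estimated metric, e.g. "err"
str(pres$err, max.level = 1)   # the tests computed for that metric
## assumed component names (confirm with str()):
##   pres$err$F.test               Friedman test over all workflows and tasks
##   pres$err$Nemenyi.test         post-hoc all-vs-all comparisons
##   pres$err$BonferroniDunn.test  post-hoc comparisons against the baseline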
The performanceEstimation
function ensures that all
compared workflows are run on exactly the same train+test partitions
on all repetitions and for all predictive tasks. In this context, we
can use pairwise statistical significance tests.
See Also: CDdiagram.Nemenyi, CDdiagram.BD, signifDiffs, performanceEstimation, topPerformers, topPerformer, rankWorkflows, metricsSummary, ComparisonResults
## Not run:
# ## Estimating the error rate and accuracy of 8 variants of an SVM
# ## on three classification tasks, using one repetition of 10-fold CV
# library(performanceEstimation)
# library(e1071)
# data(iris)
# data(Satellite,package="mlbench")
# data(LetterRecognition,package="mlbench")
#
#
# ## running the estimation experiment
# res <- performanceEstimation(
#   c(PredTask(Species ~ .,iris),
#     PredTask(classes ~ .,Satellite,"sat"),
#     PredTask(lettr ~ .,LetterRecognition,"letter")),
#   workflowVariants(learner="svm",
#                    learner.pars=list(cost=1:4,gamma=c(0.1,0.01))),
#   EstimationTask(metrics=c("err","acc"),method=CV()))
#
#
# ## checking the top performers
# topPerformers(res)
#
# ## now let us assume that we will choose "svm.v2" as our baseline
# ## carry out the paired comparisons
# pres <- pairedComparisons(res,"svm.v2")
#
# ## obtaining a CD diagram comparing all others against svm.v2 in terms
# ## of error rate
# CDdiagram.BD(pres,metric="err")
#
# ## End(Not run)
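If you are instead interested in all-vs-all comparisons rather than comparisons against a baseline, a Nemenyi CD diagram can be obtained from the same object. This is a sketch continuing the example above, and the call simply mirrors the CDdiagram.BD usage shown there:

## sketch continuing the example above (assumes "pres" was created as shown)
CDdiagram.Nemenyi(pres, metric = "err")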