"Bonferroni-Dunn" test or the "Nemenyi" test.
"Bonferroni-Dunn" usually yields higher power as it does not
compare all algorithms to each other, but all algorithms to a
baseline instead.
Learners are drawn on the y-axis according to their average rank.
For test = "nemenyi" a bar is drawn, connecting all groups of not
significantly different learners.
For test = "bd" an interval is drawn arround the algorithm selected
as baseline. All learners within this interval are not signifcantly different
from the baseline.
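A minimal sketch of generating this data from a benchmark result; the learners, tasks, and resampling setup below are assumptions for illustration and are not part of this help page.

library(mlr)

## Assumed benchmark setup: three learners on two of the example tasks.
lrns = list(
  makeLearner("classif.rpart"),
  makeLearner("classif.lda"),
  makeLearner("classif.naiveBayes")
)
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 10)
bmr = benchmark(lrns, tasks, rdesc)

## test = "bd": compare all learners against the chosen baseline.
g.bd = generateCritDifferencesData(bmr, p.value = 0.05,
  baseline = "classif.rpart", test = "bd")

## test = "nemenyi": connect groups of learners whose average ranks
## do not differ significantly.
g.nemenyi = generateCritDifferencesData(bmr, p.value = 0.05, test = "nemenyi")

plotCritDifferences(g.bd)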
Calculation:
$$CD = q_\alpha \sqrt{\frac{k(k+1)}{6N}}$$
Where $q_\alpha$ is based on the studentized range statistic.
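As an arithmetic illustration with assumed numbers (not taken from this page): for k learners compared over N tasks, $q_\alpha$ can be obtained from the studentized range distribution, divided by $\sqrt{2}$ as in the Nemenyi critical values.

## Assumed example: k = 5 learners, N = 20 tasks (datasets), alpha = 0.05.
k = 5
N = 20
alpha = 0.05

## q_alpha based on the studentized range statistic
## (divided by sqrt(2) for the Nemenyi critical values).
q.alpha = qtukey(1 - alpha, nmeans = k, df = Inf) / sqrt(2)

## Two learners differ significantly if their average ranks
## differ by at least CD.
CD = q.alpha * sqrt(k * (k + 1) / (6 * N))
CD
## approximately 1.36 for this example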
See references for details.

Usage:

generateCritDifferencesData(bmr, measure = NULL, p.value = 0.05,
  baseline = NULL, test = "bd")

Value: [critDifferencesData]. List containing:
- [data.frame] containing the info for the descriptive part of the plot
- [list] of class pairwise.htest; contains the calculated posthoc.friedman.nemenyi.test
- [list] containing info on the critical difference and its positioning
- the baseline chosen for plotting

A short sketch of accessing these elements is given after the see-also lists below.

Other benchmark: BenchmarkResult,
benchmark,
convertBMRToRankMatrix,
friedmanPostHocTestBMR,
friedmanTestBMR,
generateBenchmarkSummaryData,
generateRankMatrixAsBarData,
getBMRAggrPerformances,
getBMRFeatSelResults,
getBMRFilteredFeatures,
getBMRLearnerIds,
getBMRLearners,
getBMRMeasureIds,
getBMRMeasures,
getBMRPerformances,
getBMRPredictions,
getBMRTaskIds,
getBMRTuneResults,
plotBenchmarkResult,
plotBenchmarkSummary,
plotCritDifferences,
plotRankMatrixAsBar

Other generate_plot_data: generateBenchmarkSummaryData,
generateCalibrationData,
generateFilterValuesData,
generateLearningCurveData,
generatePartialPredictionData,
generateROCRCurvesData,
generateRankMatrixAsBarData,
generateThreshVsPerfData,
getFilterValues
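A sketch of inspecting the returned object, assuming a BenchmarkResult bmr as in the earlier sketch; the element names used here are assumptions inferred from the list description in the Value section above.

g = generateCritDifferencesData(bmr, p.value = 0.05, test = "nemenyi")

## Assumed element names, matching the order of the Value description.
head(g$data)                 # descriptive data for the plot
g$friedman.nemenyi.test      # pairwise.htest with the post-hoc test results
g$cd.info                    # critical difference and its positioning
g$baseline                   # baseline chosen for plotting

plotCritDifferences(g)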