ScottKnottESD (v2.0.3)

The Scott-Knott Effect Size Difference (ESD) test is a mean comparison approach that uses hierarchical clustering to partition a set of treatment means (e.g., means of variable importance scores, means of model performance) into statistically distinct groups with non-negligible differences [Tantithamthavorn et al. (2018), http://dx.doi.org/10.1109/TSE.2018.2794977]. It is a variant of the Scott-Knott test that considers the magnitude of the difference (i.e., the effect size) of treatment means within a group and between groups. The Scott-Knott ESD test (v2.x) therefore produces a ranking of treatment means while ensuring that (1) the magnitude of the difference among all treatments within each group is negligible; and (2) the magnitude of the difference between treatments in different groups is non-negligible.

The Scott-Knott ESD test (v2.0.3) works in two steps:

  • (Step 1) Find a partition that maximizes the difference in treatment means between groups. We begin by sorting the treatment means. Then, following the original Scott-Knott test, we compute the sum of squares between groups (i.e., a dispersion measure of data points) to identify the partition that maximizes the difference in treatment means between groups.
  • (Step 2) Split into two groups or merge into one group. Instead of using a likelihood ratio test and a Chi-square distribution as the splitting and merging criterion (i.e., a hypothesis test of the equality of all treatment means), we analyze the magnitude of the difference for each pair of treatment means across the two groups. If the difference for at least one pair of treatment means across the two groups is non-negligible, we split into two groups. Otherwise, we merge into one group. We use Cohen's d, an effect size estimate based on the difference between the two means divided by the pooled standard deviation of the two treatments (d = (mean(x_1) - mean(x_2))/s.d.).
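
The two steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names cohen_d, best_split, and should_split are hypothetical, the pooled standard deviation is one common way to compute Cohen's d, and the 0.2 cut-off for a "negligible" effect is a conventional choice assumed here.

```r
# Hypothetical helper: Cohen's d using a pooled standard deviation.
cohen_d <- function(x1, x2) {
  n1 <- length(x1)
  n2 <- length(x2)
  pooled_sd <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
  (mean(x1) - mean(x2)) / pooled_sd
}

# Step 1 (sketch): sort the treatment means and pick the cut point that
# maximizes the between-group sum of squares. Expects a named numeric
# vector with at least two treatments.
best_split <- function(means) {
  m <- sort(means, decreasing = TRUE)
  k <- length(m)
  grand <- mean(m)
  ssb <- sapply(seq_len(k - 1), function(i) {
    left <- m[1:i]
    right <- m[(i + 1):k]
    i * (mean(left) - grand)^2 + (k - i) * (mean(right) - grand)^2
  })
  cut <- which.max(ssb)
  list(left = names(m)[1:cut], right = names(m)[(cut + 1):k])
}

# Step 2 (sketch): split if any pair of treatments across the two groups
# has a non-negligible effect size; otherwise merge.
# The 0.2 threshold is an assumption, not taken from the package.
should_split <- function(data, left, right, threshold = 0.2) {
  for (a in left) {
    for (b in right) {
      if (abs(cohen_d(data[[a]], data[[b]])) >= threshold) return(TRUE)
    }
  }
  FALSE
}

# Synthetic scores for three hypothetical treatments (columns).
set.seed(1)
scores <- data.frame(
  A = rnorm(100, mean = 10.00, sd = 1),
  B = rnorm(100, mean = 10.05, sd = 1),
  C = rnorm(100, mean = 5.00, sd = 1)
)

parts <- best_split(colMeans(scores))
split_needed <- should_split(scores, parts$left, parts$right)
```

In the full procedure, the same two steps would be applied recursively to each resulting group until no further split is warranted; the sketch shows a single pass only.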

Unlike the earlier version of the Scott-Knott ESD test (v1.x) that post-processes the groups that are produced by the Scott-Knott test, the Scott-Knott ESD test (v2.x) pre-processes the groups by merging pairs of statistically distinct groups that have a negligible difference.

Example usage scenarios in the software engineering domain:

(1) Ranking and identifying the most influential variables that are produced by random forest models or regression models.

(2) Ranking and identifying the top-performing feature selection, classification, and model validation techniques for defect prediction models.

(3) Ranking and identifying the most frequent developer search tasks.

Installation

Install the current release from CRAN:
install.packages("ScottKnottESD")
Install the development version from GitHub:
install.packages("devtools")
devtools::install_github("klainfo/ScottKnottESD", ref="development")

Example Usage

library(ScottKnottESD)

# An example dataset: 1,000 variable importance scores of 9 software metrics.
# The scores are generated by the Random Forests technique using 1,000 repetitions of the out-of-sample bootstrap.
example

sk <- sk_esd(example)
plot(sk)

sk <- sk_esd(maven)
plot(sk)
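
The calls above use the bundled datasets. As a hedged sketch of how your own data might be passed in, the snippet below assumes the input has the same wide layout as the example dataset (one column per treatment, one row per repetition). The column names and values are synthetic and purely illustrative, and the package calls run only if ScottKnottESD is installed.

```r
# Synthetic data for illustration only (not shipped with the package):
# 100 repetitions (rows) of importance scores for three hypothetical
# software metrics (columns).
set.seed(42)
scores <- data.frame(
  loc        = rnorm(100, mean = 0.80, sd = 0.05),
  complexity = rnorm(100, mean = 0.79, sd = 0.05),
  churn      = rnorm(100, mean = 0.40, sd = 0.05)
)

# Run the test only when the package is available.
if (requireNamespace("ScottKnottESD", quietly = TRUE)) {
  library(ScottKnottESD)
  sk <- sk_esd(scores)
  plot(sk)
}
```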

Referencing ScottKnottESD

ScottKnottESD can be referenced as:

@article{tantithamthavorn2017mvt,
    Author={Tantithamthavorn, Chakkrit and McIntosh, Shane and Hassan, Ahmed E. and Matsumoto, Kenichi},
    Title = {An Empirical Comparison of Model Validation Techniques for Defect Prediction Models},
    Journal = {IEEE Transactions on Software Engineering (TSE)},
    Volume = {43},
    Number = {1},
    Pages = {1-18},
    Year = {2017}
}
@article{tantithamthavorn2018optimization,
    Author={Tantithamthavorn, Chakkrit and McIntosh, Shane and Hassan, Ahmed E. and Matsumoto, Kenichi},
    Title = {The Impact of Automated Parameter Optimization for Defect Prediction Models},
    Journal = {IEEE Transactions on Software Engineering (TSE)},
    Pages = {Early Access},
    Year = {2018}
}

Monthly Downloads: 289
Version: 2.0.3
License: GPL (>= 2)

Maintainer: Chakkrit Tantithamthavorn
Last Published: May 8th, 2018

Functions in ScottKnottESD (2.0.3)

  • maven: An example dataset of Breiman's variable importance scores
  • example: An example dataset of Breiman's variable importance scores
  • normalize: Normalize non-normal distributions using the Box-Cox Power Transformation
  • long2wide: Convert data from long format to wide format
  • ScottKnottESD-package: The Scott-Knott Effect Size Difference (ESD) Test
  • print.sk_esd: Print sk_esd objects
  • sk_esd: A function to check the magnitude of the difference for all pairs of treatments
  • check.ANOVA.assumptions: Check basic ANOVA assumptions