Empirical Distribution Ordering Inference Framework (EDOIF)

Given a dataset of careers and incomes, how large is the difference in income between any pair of careers? Given a dataset of travel time records, how much longer does a trip take when choosing public transportation mode A instead of B? We developed a framework named "EDOIF" to answer such questions.

EDOIF is a nonparametric framework based on the "Estimation Statistics" principle. Its main purpose is to infer the order of empirical distributions from different categories, based on the probability of finding a value in one distribution that is greater than the expectation of another distribution. Given a set of ordered pairs, each consisting of a category and a real value, the framework is capable of

  1. inferring the domination orders of categories and representing them as a graph;
  2. estimating the magnitude of difference between each pair of categories as mean-difference confidence intervals; and
  3. visualizing the domination orders and the magnitudes of difference between categories.
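
The two quantities described above, a domination check between a pair of categories and a mean-difference confidence interval, can be approximated in base R. The following is a minimal sketch under stated assumptions, not the package's implementation: the helper name `boot_mean_diff_ci` is hypothetical, the interval is a plain percentile bootstrap, and domination is checked with a one-sided Mann-Whitney test via `stats::wilcox.test`.

```r
# Hypothetical helper (not part of EDOIF): percentile-bootstrap confidence
# interval for mean(y) - mean(x).
boot_mean_diff_ci <- function(x, y, n_boot = 1000, alpha = 0.05) {
  diffs <- replicate(n_boot, {
    mean(sample(y, replace = TRUE)) - mean(sample(x, replace = TRUE))
  })
  ci <- unname(quantile(diffs, probs = c(alpha / 2, 1 - alpha / 2)))
  list(mean_diff = mean(y) - mean(x), lower = ci[1], upper = ci[2])
}

set.seed(1)
a <- rnorm(150, mean = 10, sd = 8)  # toy category A
b <- rnorm(150, mean = 30, sd = 8)  # toy category B, shifted upward

ci_ab <- boot_mean_diff_ci(a, b)

# One-sided Mann-Whitney test: is A shifted to the left of B?
p_ab <- wilcox.test(a, b, alternative = "less")$p.value
a_dominated_by_b <- p_ab < 0.05
```

If the interval in `ci_ab` excludes zero and `a_dominated_by_b` is `TRUE`, the toy data would support an edge from A to B in a domination graph.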

Installation

You can install the package from CRAN:

install.packages("EDOIF")

For the newest version on GitHub, run the following command in the R console:

remotes::install_github("DarkEyes/EDOIF")

This requires the "remotes" package to be installed before installing EDOIF.

Example: Inferring orders of categories based on their empirical distributions

library(EDOIF)

#== Simulation: generating distributions of five categories:
# Category5 dominates Category4
# Category4 dominates Category3
# Category3 dominates Category2
# Category1 and Category2 share the same mean, so neither dominates the other

nInv=150 # number of samples per category
initMean=10
stepMean=20
std=8

simData1<-c()
simData1$Values<-rnorm(nInv,mean=initMean,sd=std)
simData1$Group<-rep(c("Category1"),times=nInv)
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("Category2"),times=nInv))
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+2*stepMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("Category3"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+3*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("Category4"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+4*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("Category5"),times=nInv) )

#== parameter setting
bootT=1000 # number of bootstrap resamples (sampling with replacement)
alpha=0.05 # significance level

#== Calling the class constructor
A1<-EDOIF(simData1$Values,simData1$Group, bootT=bootT, alpha=alpha, methodType ="perc") 

#== Visualizing results
print(A1) # print the results in text mode
plot(A1, fontSize=10) # plot the results in graphic mode

Graphic mode results

  1. A plot of the confidence intervals of the means of the five categories at significance level alpha. The horizontal axis represents categories and the vertical axis represents values within the distributions of the categories.
  2. A plot of the confidence intervals of the mean differences between pairs of the five categories at significance level alpha.
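
The numbers behind a mean confidence-interval plot of this kind can be sketched in base R. This is an illustrative approximation, not the code behind `plot.EDOIF`: the helper `boot_mean_ci` and the toy data are assumptions made for the example, and the interval is a simple percentile bootstrap.

```r
# Toy data in the same shape as the example: numeric values plus a group label.
set.seed(42)
values <- c(rnorm(100, mean = 10, sd = 8), rnorm(100, mean = 50, sd = 8))
group  <- rep(c("Category1", "Category3"), each = 100)

# Hypothetical helper (not part of EDOIF): percentile-bootstrap CI of one mean.
boot_mean_ci <- function(x, n_boot = 1000, alpha = 0.05) {
  means <- replicate(n_boot, mean(sample(x, replace = TRUE)))
  unname(quantile(means, probs = c(alpha / 2, 1 - alpha / 2)))
}

# One row per category: lower bound, sample mean, upper bound.
ci_table <- t(sapply(split(values, group), function(x) {
  ci <- boot_mean_ci(x)
  c(lower = ci[1], mean = mean(x), upper = ci[2])
}))
```

Each row of `ci_table` corresponds to one category, so non-overlapping intervals between rows indicate clearly separated means.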

Text mode results

EDOIF (Empirical Distribution Ordering Inference Framework)
=======================================================
Alpha = 0.050000, Number of bootstrap resamples = 1000, CI type = perc
Using Mann-Whitney test to report whether A ≺ B
A dominant-distribution network density:0.900000
Distribution: Category1
Mean:10.840671 95CI:[ 9.706981,12.014179]
Distribution: Category2
Mean:11.044785 95CI:[ 9.806991,12.446037]
Distribution: Category3
Mean:50.462935 95CI:[ 49.208005,51.757706]
Distribution: Category4
Mean:70.299726 95CI:[ 69.103924,71.502505]
Distribution: Category5
Mean:91.190505 95CI:[ 89.895480,92.518455]
=======================================================
Mean difference of Category2 (n=150) minus Category1 (n=150): Category1 ⊀ Category2
 :p-val 0.4463
Mean Diff:0.204114 95CI:[ -1.545130,1.930609]

Mean difference of Category3 (n=150) minus Category1 (n=150): Category1 ≺ Category3
 :p-val 0.0000
Mean Diff:39.622264 95CI:[ 37.984831,41.378232]

Mean difference of Category4 (n=150) minus Category1 (n=150): Category1 ≺ Category4
 :p-val 0.0000
Mean Diff:59.459055 95CI:[ 57.921328,61.127817]

Mean difference of Category5 (n=150) minus Category1 (n=150): Category1 ≺ Category5
 :p-val 0.0000
Mean Diff:80.349835 95CI:[ 78.620391,82.133270]

Mean difference of Category3 (n=150) minus Category2 (n=150): Category2 ≺ Category3
 :p-val 0.0000
Mean Diff:39.418150 95CI:[ 37.543210,41.241722]

Mean difference of Category4 (n=150) minus Category2 (n=150): Category2 ≺ Category4
 :p-val 0.0000
Mean Diff:59.254941 95CI:[ 57.304359,61.098774]

Mean difference of Category5 (n=150) minus Category2 (n=150): Category2 ≺ Category5
 :p-val 0.0000
Mean Diff:80.145720 95CI:[ 78.313321,82.040234]

Mean difference of Category4 (n=150) minus Category3 (n=150): Category3 ≺ Category4
 :p-val 0.0000
Mean Diff:19.836791 95CI:[ 18.047421,21.762239]

Mean difference of Category5 (n=150) minus Category3 (n=150): Category3 ≺ Category5
 :p-val 0.0000
Mean Diff:40.727570 95CI:[ 39.004372,42.627946]

Mean difference of Category5 (n=150) minus Category4 (n=150): Category4 ≺ Category5
 :p-val 0.0000
Mean Diff:20.890780 95CI:[ 19.079287,22.625807]
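
The reported network density of 0.9 is consistent with dividing the number of dominated pairs by the number of possible category pairs: nine of the ten pairs in the text output above are marked as dominated. The sketch below reproduces that arithmetic, assuming density is defined as edges over unordered pairs; this definition is an assumption for illustration and is not taken from the package source.

```r
categories <- paste0("Category", 1:5)

# Adjacency matrix: adj[i, j] is TRUE when category i is dominated by
# category j according to the text output above.
adj <- matrix(FALSE, 5, 5, dimnames = list(categories, categories))
adj[upper.tri(adj)] <- TRUE             # each lower-numbered category is dominated by higher ones
adj["Category1", "Category2"] <- FALSE  # except the Category1/Category2 pair

n_edges <- sum(adj)                        # dominated pairs
n_pairs <- choose(length(categories), 2)   # all unordered pairs
density <- n_edges / n_pairs
```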

For more examples, please see the package vignettes.

Citation

Amornbunchornvej, Chainarong, Navaporn Surasvadi, Anon Plangprasopchok, Suttipong Thajchayapong. "A nonparametric framework for inferring orders of categorical data from category-real pairs." Heliyon 6, no. 11 (2020): e05435, ISSN 2405-8440, https://doi.org/10.1016/j.heliyon.2020.e05435.

Version: 0.1.3
License: BSD_3_clause + file LICENSE
Maintainer: Chainarong Amornbunchornvej
Last Published: March 28th, 2021
Monthly Downloads: 586

Functions in EDOIF (0.1.3)

getMegDiffConfInv: getMegDiffConfInv function
EDOIF: Empirical Distribution Ordering Inference Framework (EDOIF)
SimMixDist: SimMixDist function
getDominantRADJ: getDominantRADJ function
getConfInv: getConfInv function
SimNonNormalDist: SimNonNormalDist function
getOrder: getOrder function
checkSim3Res: checkSim3Res function
bootDiffmeanFunc: bootDiffmeanFunc function
plotMeanDiffCIs: plotMeanDiffCIs function
getiGraphOBJ: getiGraphOBJ function
plotMeanCIs: plotMeanCIs function
getWilcoxDominantRADJ: getWilcoxDominantRADJ function
plotGraph: plotGraph function
getiGraphNetDen: getiGraphNetDen function
print.EDOIF: print.EDOIF function
getADJNetDen: getADJNetDen function
getttestDominantRADJ: getttestDominantRADJ function
meanBoot: meanBoot function
plot.EDOIF: plot.EDOIF function