KODAMA (version 1.5)

frequency_matching: Frequency Matching

Description

A method to select unbalanced groups in a cohort.

Usage

frequency_matching(data, label, times = 5, seed = 1234)

Arguments

data

a data.frame containing the variables used for the matching.

label

a classification of the groups.

times

the ratio between the two groups.

seed

a single number used as the seed for random number generation.
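
To make the roles of times and seed concrete, here is a minimal base R sketch of generic frequency matching on a single categorical variable. It is not KODAMA's implementation: the function name frequency_match_sketch, its case_level argument, and the toy vectors are hypothetical and introduced only for illustration. The sketch assumes there are no missing values, keeps every row of one group, and draws up to times rows of the other group per stratum for each kept row.

```r
# Generic frequency matching on one categorical variable (base R only).
# NOTE: illustrative sketch under stated assumptions, not KODAMA's algorithm.
frequency_match_sketch <- function(stratum, group, case_level,
                                   times = 1, seed = 1234) {
  set.seed(seed)                       # make the random draw reproducible
  is_case  <- group == case_level
  selected <- which(is_case)           # keep every row of the reference group
  for (s in unique(stratum[is_case])) {
    n_cases  <- sum(is_case & stratum == s)
    controls <- which(!is_case & stratum == s)
    # take at most `times` controls per case, limited by what is available
    n_take   <- min(length(controls), times * n_cases)
    if (n_take > 0) {
      # sample positions with sample.int(): sample(x, n) treats a
      # length-one x as 1:x, which would pick the wrong rows
      selected <- c(selected, controls[sample.int(length(controls), n_take)])
    }
  }
  sort(selected)                       # row indices of the matched subset
}

# Hypothetical toy cohort: 3 cases and 7 controls across two strata
stratum <- c("a", "a", "a", "a", "b", "b", "b", "b", "b", "b")
group   <- c("case", "ctrl", "ctrl", "ctrl", "case", "case",
             "ctrl", "ctrl", "ctrl", "ctrl")

sel <- frequency_match_sketch(stratum, group, case_level = "case", times = 1)
table(group[sel], stratum[sel])        # group counts per stratum after matching
```

With times = 1 the two groups end up with the same number of rows in each stratum; a larger value of times allows proportionally more rows from the larger group, and changing seed changes which of those rows are drawn.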

Value

The function returns a list with the following items:

data

the data after the frequency matching.

label

the label after the frequency matching.

selection

the rows selected for the frequency matching.

References

Cacciatore S, Luchinat C, Tenori L. Knowledge discovery by accuracy maximization. Proc Natl Acad Sci U S A 2014;111(14):5117-22. doi: 10.1073/pnas.1220873111.

Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA. KODAMA: an updated R package for knowledge discovery and data mining. Bioinformatics 2017;33(4):621-623. doi: 10.1093/bioinformatics/btw705.

Examples

# NOT RUN {
library(KODAMA)
data(clinical)

A=categorical_table("Gender",clinical[,"Gender"],clinical[,"Hospital"])
B=categorical_table("Gleason score",clinical[,"Gleason score"],clinical[,"Hospital"])
C=categorical_table("Ethnicity",clinical[,"Ethnicity"],clinical[,"Hospital"])

D=continuous_table("BMI",clinical[,"BMI"],clinical[,"Hospital"],digits=2)
E=continuous_table("Age",clinical[,"Age"],clinical[,"Hospital"],digits=1)

# Analysis without matching
rbind(A,B,C,D,E)

# The order of the variables matters: those further right in the vector
# take priority, so Ethnicity will be more important than Age
var=c("Age","BMI","Gleason score" ,"Ethnicity")
t=frequency_matching(clinical[,var],clinical[,"Hospital"],times=1)

newdata=clinical[t$selection,]

A=categorical_table("Gender",newdata[,"Gender"],newdata[,"Hospital"])
B=categorical_table("Gleason score",newdata[,"Gleason score"],newdata[,"Hospital"])
C=categorical_table("Ethnicity",newdata[,"Ethnicity"],newdata[,"Hospital"])

D=continuous_table("BMI",newdata[,"BMI"],newdata[,"Hospital"],digits=2)
E=continuous_table("Age",newdata[,"Age"],newdata[,"Hospital"],digits=1)

# Analysis with matching
rbind(A,B,C,D,E)

# }
