model.lrt: likelihood ratio test for two models

Description

Conducts a likelihood ratio test to compare two different models.

Usage

model.lrt(d1, d2, parallel = FALSE)

Arguments

d1, d2

Objects of class 'dis.kstest'.

parallel

Whether to use multiple threads to parallelize computation. Default is FALSE. Be aware that the program may take longer to execute with parallel=FALSE.

Value

The p-value of the likelihood ratio test.

Reference

H. Aldirawi, J. Yang, A. A. Metwally (2019). Identifying Appropriate Probabilistic Models for Sparse Discrete Omics Data, accepted for publication in 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI).
T. Wolodzko (2019). extraDistr: Additional Univariate and Multivariate Distributions, R package version 1.8.11, https://CRAN.R-project.org/package=extraDistr.
R. Calaway, Microsoft Corporation, S. Weston, D. Tenenbaum (2017). doParallel: Foreach Parallel Adaptor for the 'parallel' Package, R package version 1.0.11, https://CRAN.R-project.org/package=doParallel.
R. Calaway, Microsoft, S. Weston (2017). foreach: Provides Foreach Looping Construct for R, R package version 1.4.4, https://CRAN.R-project.org/package=foreach.

Details

If the p-values of d1 and d2 are both greater than the user-specified significance level, the original data x may come from either of the two distributions in d1 and d2. A likelihood ratio test is then needed to choose the more plausible distribution given the current data. Note that the x in d1 and d2 must be identical, and the distri in d1 and d2 must be different.

The distri inherited from d1 is the null distribution and that from d2 is the alternative distribution. Following Aldirawi et al. (2019), nsim bootstrapped or simulated samples are generated according to the bootstrap setting of d1, and the maximum likelihood estimates of the null distribution's parameters are computed for each sample, giving nsim estimates. These estimates were already obtained when dis.kstest was called, so the algorithm reuses mle_new from d1 rather than repeating the work. Each of the nsim estimates is then used to generate a new sample, and a likelihood ratio test statistic is calculated for each new sample. The output p-value is the proportion of new samples whose statistic is greater than the test statistic of the original data x.
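The resampling scheme described above can be sketched in base R. This is an illustration under simplifying assumptions, not the package's implementation: a plain Poisson stands in for the null distribution and a negative binomial for the alternative, and the helper names (loglik_pois, loglik_nb, lrt_stat, boot_lrt) are hypothetical.

```r
set.seed(1)

# Maximised log-likelihood under the null (Poisson); the MLE is the sample mean.
loglik_pois <- function(x) sum(dpois(x, lambda = mean(x), log = TRUE))

# Maximised log-likelihood under the alternative (negative binomial), found
# numerically with optim(); parameters are optimised on the log scale.
loglik_nb <- function(x) {
  negll <- function(par) {
    -sum(dnbinom(x, size = exp(par[1]), mu = exp(par[2]), log = TRUE))
  }
  fit <- optim(c(0, log(mean(x) + 0.1)), negll)
  -fit$value
}

# Likelihood ratio test statistic for one sample.
lrt_stat <- function(x) 2 * (loglik_nb(x) - loglik_pois(x))

boot_lrt <- function(x, nsim = 100) {
  t_obs <- lrt_stat(x)
  # Step 1: nsim null MLEs, one per bootstrap resample of x
  # (the analogue of mle_new stored in d1).
  lambda_new <- replicate(nsim, mean(sample(x, replace = TRUE)))
  # Step 2: one new sample per MLE, drawn from the null, and its statistic.
  t_new <- vapply(lambda_new,
                  function(l) lrt_stat(rpois(length(x), l)),
                  numeric(1))
  # Step 3: proportion of new statistics exceeding the observed one.
  mean(t_new > t_obs)
}

# Overdispersed counts, so the alternative should fit better than the null.
x <- rnbinom(200, size = 1, mu = 4)
p <- boot_lrt(x, nsim = 50)
```

A small p here would favour the alternative. With zero-inflated models the same three steps apply, but the fitting functions would be replaced by the corresponding zero-inflated maximum likelihood routines.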

As in dis.kstest, the computation is parallelized with the help of packages foreach and doParallel.
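As a rough sketch of how foreach and doParallel can distribute per-sample work of this kind, the following runs one simulation per parameter value across two workers. The cluster size, the parameter values in lambdas, and the toy statistic are assumptions chosen for illustration and are not taken from the package source.

```r
library(foreach)
library(doParallel)

# Start a small cluster; two workers is an arbitrary choice for illustration.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

n <- 100                          # size of each simulated sample (assumed)
lambdas <- c(3.8, 4.0, 4.1, 4.3)  # stand-ins for the null MLEs

# Each iteration simulates one sample under the null and returns a statistic;
# here the statistic is simply the variance-to-mean ratio of the sample.
sim_stats <- foreach(l = lambdas, .combine = c) %dopar% {
  y <- rpois(n, l)
  var(y) / mean(y)
}

# Always release the workers when finished.
parallel::stopCluster(cl)
```

Replacing %dopar% with %do% runs the same loop sequentially, which is a convenient way to switch between the two modes.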

If the output p-value is smaller than the user-specified significance level, the distri of d2 is more appropriate for modelling x. Otherwise, there is no significant difference between the distri of d1 and the distri of d2, given the current data.

Examples

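# NOT RUN {
library(iZID)
set.seed(2001)
# Simulate 300 observations from a zero-inflated Poisson distribution
temp1 <- sample.zi(N = 300, phi = 0.3, distri = 'poisson', lambda = 5)
# Fit the null (zip) and alternative (zinb) models to the same data
d1 <- dis.kstest(temp1, nsim = 100, bootstrap = TRUE, distri = 'zip')
d2 <- dis.kstest(temp1, nsim = 100, bootstrap = TRUE, distri = 'zinb')
# p-value of the likelihood ratio test of d1 (null) against d2 (alternative)
model.lrt(d1, d2)
# }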
