
BayesianFROC (version 0.3.0)

chi_square_goodness_of_fit_from_input_all_param: Not vector: The Goodness of Fit (Chi Square) Calculator

Description

Chi square goodness of fit statistics for each MCMC sample with a fixed dataset.

Our data consist of 2C categories, that is,

the number of hits: h[1], h[2], h[3], ..., h[C] and

the number of false alarms: f[1], f[2], f[3], ..., f[C].

Our model has C+2 parameters, that is,

the thresholds of the bi-normal assumption z[1], z[2], z[3], ..., z[C] and

the mean and standard deviation of the signal distribution.

So, the degrees of freedom of this statistic are calculated as

2C - (C+2) - 1 = C - 3.

This differs from Chakraborty's result, C-2. Why?
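
As a quick check of this arithmetic, a two-line R computation (with a hypothetical number of confidence levels C) gives:

  C <- 5                # hypothetical number of confidence levels
  2 * C - (C + 2) - 1   # degrees of freedom; equals C - 3, here 2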

Usage

chi_square_goodness_of_fit_from_input_all_param(
  h,
  f,
  p,
  lambda,
  NL,
  NI,
  ModifiedPoisson = FALSE,
  dig = 3
)

Arguments

h

A vector of non-negative integers, indicating the number of hits. The reason the author includes this variable is so that hits drawn from the posterior predictive distribution can be substituted for it. In Gelman's well-known book, he explains how to construct test statistics in the Bayesian context, which requires samples from the posterior predictive distribution. So, replicated data from the posterior predictive distribution can be substituted into this variable.

f

A vector of non-negative integers, indicating the number of false alarms. The reason the author includes this variable is so that false alarms drawn from the posterior predictive distribution can be substituted for it. In Gelman's well-known book, he explains how to construct test statistics in the Bayesian context, which requires samples from the posterior predictive distribution. So, replicated data from the posterior predictive distribution can be substituted into this variable.

p

A vector of non-negative numbers, indicating the hit rates; its length is the number of confidence levels.

lambda

A vector of non-negative numbers, indicating the false alarm rates; its length is the number of confidence levels.

NL

An integer, representing the number of lesions.

NI

An integer, representing the number of images.

ModifiedPoisson

Logical, that is TRUE or FALSE.

If ModifiedPoisson = TRUE, then the Poisson rate of false alarms is calculated per lesion, and the model is fitted so that the FROC curve is the expected curve of the points consisting of the pairs of TPF per lesion and FPF per lesion.

Similarly, if ModifiedPoisson = FALSE, then the Poisson rate of false alarms is calculated per image, and the model is fitted so that the FROC curve is the expected curve of the points consisting of the pairs of TPF per lesion and FPF per image.

For more details, see the author's paper, in which the per-image and per-lesion formulations are explained. (For details of the models, see the vignettes; they are currently omitted from this package because of their size.)

If ModifiedPoisson = TRUE, then the False Positive Fraction (FPF) is defined as follows (\(F_c\) denotes the number of false alarms with confidence level \(c\) )

$$ \frac{F_1+F_2+F_3+F_4+F_5}{N_L}, $$

$$ \frac{F_2+F_3+F_4+F_5}{N_L}, $$

$$ \frac{F_3+F_4+F_5}{N_L}, $$

$$ \frac{F_4+F_5}{N_L}, $$

$$ \frac{F_5}{N_L}, $$

where \(N_L\) is the number of lesions (signals). To emphasize its denominator \(N_L\), we also call it the False Positive Fraction (FPF) per lesion.

On the other hand,

if ModifiedPoisson = FALSE (Default), then False Positive Fraction (FPF) is given by

$$ \frac{F_1+F_2+F_3+F_4+F_5}{N_I}, $$

$$ \frac{F_2+F_3+F_4+F_5}{N_I}, $$

$$ \frac{F_3+F_4+F_5}{N_I}, $$

$$ \frac{F_4+F_5}{N_I}, $$

$$ \frac{F_5}{N_I}, $$

where \(N_I\) is the number of images (trials). To emphasize its denominator \(N_I\), we also call it the False Positive Fraction (FPF) per image.
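
As an illustration of these formulas, here is a minimal R sketch; the counts f, h and the values NI, NL below are hypothetical numbers chosen only for this example, and the variable names are not those used inside the package.

  f  <- c(11, 8, 5, 3, 1)    # hypothetical false alarm counts F_1, ..., F_5
  h  <- c(20, 15, 10, 5, 2)  # hypothetical hit counts H_1, ..., H_5
  NI <- 57                   # hypothetical number of images
  NL <- 142                  # hypothetical number of lesions

  cumulative.f <- rev(cumsum(rev(f)))  # F_c + F_{c+1} + ... + F_5 for c = 1, ..., 5

  FPF.per.image  <- cumulative.f / NI  # ModifiedPoisson = FALSE
  FPF.per.lesion <- cumulative.f / NL  # ModifiedPoisson = TRUE

  TPF.per.lesion <- rev(cumsum(rev(h))) / NL  # cumulative hits per lesion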

The model is fitted so that the estimated FROC curve can be regarded as the expected pairs of FPF per image and TPF per lesion (ModifiedPoisson = FALSE)

or as the expected pairs of FPF per lesion and TPF per lesion (ModifiedPoisson = TRUE).

That is, if ModifiedPoisson = TRUE, then the FROC curve means the expected pair of FPF per lesion and TPF.

On the other hand, if ModifiedPoisson = FALSE, then the FROC curve means the expected pair of FPF per image and TPF.

So the data for FPF and TPF change, and thus the fitted model also changes, according to whether ModifiedPoisson = TRUE or FALSE. Traditional FROC analysis uses only the per-image (trial) formulation. Since one image can be divided into two or more images, the number of trials is not essential; what matters more is the per-signal formulation. Hence the author also developed the FROC theory to consider FROC analysis per signal. One can see that the FROC curve is rigid with respect to changes in the number of images, so in that sense it does not matter whether ModifiedPoisson = TRUE or FALSE. This rigidity of the curves means that the number of images is a redundant parameter for the FROC trial, and thus the author tries to exclude it.

Revised: 2019 Dec 8; 2019 Nov 25; 2019 August 28.

dig

A variable to be passed to the function rstan::sampling() of rstan, in which it is named ...??. A positive integer representing the number of significant digits, used in Stan cancellation. (The Usage above shows a default of dig = 3.)

Value

A single number (not a list, data frame, or vector), representing the chi square value for your input data.

Details

To calculate the chi square test statistic, two quantities are needed: data and a parameter. In the classical (frequentist) chi square, an estimate of the parameter is used, for example the MLE (maximum likelihood estimator). In the Bayesian sense, the parameter can be taken from every MCMC iteration; that is, the parameter is not deterministic, and we regard it as a random variable, or as samples from the posterior distribution. Such samples are obtained by the Hamiltonian Monte Carlo simulation with the author's Bayesian model. Thus we can calculate a chi square value for each MCMC sample.
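
For example, a minimal sketch of this idea is given below. It assumes that fit is a stanfit object obtained by fit_Bayesian_FROC() as in the Examples, and that the hit rates and false alarm rates are named "p" and "l" in the Stan model; it is an illustration of the Details above, not the package's internal implementation.

  mcmc    <- rstan::extract(fit)        # posterior draws of all parameters
  n.draws <- length(mcmc$lp__)          # number of MCMC iterations

  chi.squares <- vapply(seq_len(n.draws), function(i) {
    chi_square_goodness_of_fit_from_input_all_param(
      h      = fit@dataList$h,          # fixed dataset
      f      = fit@dataList$f,
      p      = mcmc$p[i, ],             # parameters from the i-th MCMC draw
      lambda = mcmc$l[i, ],
      NL     = fit@dataList$NL,
      NI     = fit@dataList$NI
    )
  }, numeric(1))

  mean(chi.squares)                     # posterior mean of the chi square discrepancy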

Examples

Run this code
# NOT RUN {
#  Make a stanfit object (more precisely its inherited S4 class object)

       fit <- fit_Bayesian_FROC(BayesianFROC::dataList.Chakra.1,
                           ite = 1111,
                           summary =FALSE,
                           cha = 2)

#   The chi square discrepancies (goodness of fit) are calculated
#   by the following code, with the posterior mean as the parameter.


  NI          <-  fit@dataList$NI
  NL          <-  fit@dataList$NL
  f.observed  <-  fit@dataList$f
  h.observed  <-  fit@dataList$h
  C           <-  fit@dataList$C

 # p      <- rstan::get_posterior_mean(fit, par = c("p"))
 # lambda <- rstan::get_posterior_mean(fit, par = c("l"))
 # Note that get_posterior_mean() returns a matrix, not a single number,
 # when the number of chains is not 1.
 # So, instead of it, we use extract_EAP_CI():

  e     <- extract_EAP_CI(fit,"l",fit@dataList$C )
 lambda <- e$l.EAP

  e <- extract_EAP_CI(fit,"p",fit@dataList$C )
  p <- e$p.EAP

         Chi.Square <-   chi_square_goodness_of_fit_from_input_all_param(

                          h   =   h.observed,
                          f   =   f.observed,
                          p   =   p,
                      lambda  =   lambda,
                          NL  =   NL,
                          NI  =   NI
                               )

#  Get posterior mean of the chi square discrepancy.

                    Chi.Square

# Calculate a p-value for the posterior mean of the chi square discrepancy
# (upper tail probability of the chi square distribution).

                     stats::pchisq(Chi.Square, df = 1, lower.tail = FALSE)







# }
