BayesianFROC (version 0.5.0)

ppp_srsc: Calculates PPP for models of a single reader and a single modality

Description

Calculates the posterior predictive p value (PPP) for the chi-square goodness-of-fit statistic.

Appendix: p value

In order to evaluate the goodness of fit of our model to the data, we use the so-called posterior predictive p value.

In the following, we use conventional notation. Let $y_{obs}$ be an observed dataset and $f(y|\theta)$ be a model (likelihood) for a future dataset $y$. We denote the prior and the posterior distribution by $\pi(\theta)$ and $\pi(\theta|y) \propto f(y|\theta)\pi(\theta)$, respectively.

In our case, the data $y$ is a pair of hits and false alarms, that is, $y = (H_1, H_2, \dots, H_C; F_1, F_2, \dots, F_C)$, and $\theta = (z_1, dz_1, dz_2, \dots, dz_{C-1}, \mu, \sigma)$. We define the $\chi^2$ discrepancy (goodness-of-fit statistic) to validate that our model fits the data:

$$T(y,\theta) := \sum_{c=1}^{C} \left( \frac{(H_c - N_L \times p_c(\theta))^2}{N_L \times p_c(\theta)} + \frac{(F_c - q_c(\theta) \times N_X)^2}{q_c(\theta) \times N_X} \right)$$

for a single reader and a single modality, and

$$T(y,\theta) := \sum_{r=1}^{R} \sum_{m=1}^{M} \sum_{c=1}^{C} \left( \frac{(H_{c,m,r} - N_L \times p_{c,m,r}(\theta))^2}{N_L \times p_{c,m,r}(\theta)} + \frac{(F_c - q_c(\theta) \times N_X)^2}{q_c(\theta) \times N_X} \right)$$

for multiple readers and multiple modalities.

Note that $p_c$ and $\lambda_c$ depend on $\theta$.
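As a concrete illustration, here is a minimal R sketch of the single-reader, single-modality discrepancy $T(y,\theta)$; the function and argument names (chisq_discrepancy, hits, falses, p, q, NL, NX) are hypothetical and not part of the package, which computes this internally from a fitted object.

# Minimal sketch of the srsc chi-square discrepancy T(y, theta).
# All names here are hypothetical stand-ins for the quantities above.
chisq_discrepancy <- function(hits, falses, p, q, NL, NX) {
  sum((hits - NL * p)^2 / (NL * p) +
      (falses - q * NX)^2 / (q * NX))
}

# Toy data with C = 3 confidence levels:
chisq_discrepancy(hits   = c(30, 20, 10),
                  falses = c(5, 10, 20),
                  p      = c(0.35, 0.22, 0.12),  # stand-ins for p_c(theta)
                  q      = c(0.05, 0.11, 0.22),  # stand-ins for q_c(theta)
                  NL     = 100,
                  NX     = 100)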

In classical frequentist methods, the parameter $\theta$ is a fixed estimate, e.g., the maximum likelihood estimator. In a Bayesian context, however, the parameter is not deterministic. In the following, we define the p value in the Bayesian sense.

Let $y_{obs}$ be an observed dataset (in the FROC context, hits and false alarms). Then the so-called posterior predictive p value is defined by

$$p\text{-value} = \iint I\bigl(T(y,\theta) > T(y_{obs},\theta)\bigr)\, f(y|\theta)\,\pi(\theta|y_{obs})\,dy\,d\theta.$$

In order to calculate the above integral, let $\theta_1, \theta_2, \dots, \theta_i, \dots, \theta_I$ be samples from the posterior distribution for $y_{obs}$, namely,

$$\theta_1 \sim \pi(\cdot|y_{obs}),\ \dots,\ \theta_i \sim \pi(\cdot|y_{obs}),\ \dots,\ \theta_I \sim \pi(\cdot|y_{obs}).$$

We thus obtain a sequence of models (likelihoods), i.e., $f(\cdot|\theta_1), f(\cdot|\theta_2), \dots, f(\cdot|\theta_I)$. We then draw samples $y_1^1, \dots, y_j^i, \dots, y_J^I$ such that each $y_j^i$ is a sample from the distribution whose density function is $f(\cdot|\theta_i)$, namely,

$$y_1^1, \dots, y_j^1, \dots, y_J^1 \sim f(\cdot|\theta_1),\ \dots,\ y_1^i, \dots, y_j^i, \dots, y_J^i \sim f(\cdot|\theta_i),\ \dots,\ y_1^I, \dots, y_j^I, \dots, y_J^I \sim f(\cdot|\theta_I).$$

Using the Monte Carlo integral twice, we can approximate the integral of any function $\phi(y,\theta)$:

$$\iint \phi(y,\theta)\, f(y|\theta)\,\pi(\theta|y_{obs})\,dy\,d\theta \approx \frac{1}{I}\sum_{i=1}^{I} \int \phi(y,\theta_i)\, f(y|\theta_i)\,dy \approx \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J} \phi(y_j^i,\theta_i).$$

In particular, substituting $\phi(y,\theta) := I\bigl(T(y,\theta) > T(y_{obs},\theta)\bigr)$ into the above equation, we can approximate the posterior predictive p value:

$$p\text{-value} \approx \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J} I\bigl(T(y_j^i,\theta_i) > T(y_{obs},\theta_i)\bigr).$$
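This double Monte Carlo sum is straightforward to code once the discrepancies are in hand. The following is a minimal sketch, assuming the discrepancy values are already computed; the names T_rep, T_obs, and ppp_from_discrepancies are hypothetical and not part of the package.

# Hedged sketch of the double Monte Carlo approximation of the PPP.
# T_rep[j, i] = T(y_j^i, theta_i) for replicated data (J x I matrix);
# T_obs[i]    = T(y_obs, theta_i) at the observed data (length-I vector).
ppp_from_discrepancies <- function(T_rep, T_obs) {
  mean(sweep(T_rep, 2, T_obs, FUN = ">"))  # proportion of T_rep > T_obs
}

# Toy example with I = 4 posterior draws and J = 3 replications each:
set.seed(1)
T_rep <- matrix(rchisq(12, df = 5), nrow = 3, ncol = 4)
T_obs <- rchisq(4, df = 5)
ppp_from_discrepancies(T_rep, T_obs)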

Usage

ppp_srsc(
  StanS4class,
  Colour = TRUE,
  dark_theme = TRUE,
  plot = TRUE,
  summary = FALSE,
  plot_data = TRUE,
  replicate.number.from.model.for.each.MCMC.sample = 100
)

Arguments

StanS4class

An S4 object of class stanfitExtended, which is a class inherited from the S4 class stanfit. This R object is the fitted model returned by the function fit_Bayesian_FROC().

It is also passed to DrawCurves(), ppp(), etc.

Colour

Logical: TRUE or FALSE. Whether the curves are drawn in colour.

dark_theme

Logical: TRUE or FALSE. Whether a dark theme is used for the plot.

plot

Logical: whether the replicated datasets are drawn. In the notation below, the replicated data are denoted by $y_1, y_2, \dots, y_N$.

summary

Logical: TRUE or FALSE. Whether to print a verbose summary. If TRUE, a verbose summary is printed in the R console; if FALSE, the output is minimal. (In retrospect, this argument would have been better named verbose.)

plot_data

A logical, indicating whether the observed data are also drawn in the plot of data synthesized from the posterior predictive distribution.

Suppose that $\theta_1, \theta_2, \theta_3, \dots, \theta_N$ are $N$ samples drawn from the posterior $\pi(\theta|D)$ of given data $D$. These $\theta_i$, $i = 1, 2, \dots$, are contained in the stanfit object specified as the variable StanS4class.

Let $y_1, y_2, \dots, y_N$ be samples drawn in the manner

$$y_1 \sim \text{likelihood}(\cdot|\theta_1),\ y_2 \sim \text{likelihood}(\cdot|\theta_2),\ y_3 \sim \text{likelihood}(\cdot|\theta_3),\ \dots,\ y_N \sim \text{likelihood}(\cdot|\theta_N).$$

We repeat this $J$ times; namely, we draw the samples $y_{n,j}$, $n = 1, \dots, N$; $j = 1, \dots, J$, so that for each $j$

$$y_{1,j} \sim \text{likelihood}(\cdot|\theta_1),\ y_{2,j} \sim \text{likelihood}(\cdot|\theta_2),\ \dots,\ y_{n,j} \sim \text{likelihood}(\cdot|\theta_n),\ \dots,\ y_{N,j} \sim \text{likelihood}(\cdot|\theta_N).$$

The variable replicate.number.from.model.for.each.MCMC.sample is this $J$. Written out explicitly without abbreviation:

$$y_{1,1}, y_{1,2}, \dots, y_{1,j}, \dots, y_{1,J} \sim \text{likelihood}(\cdot|\theta_1),$$
$$y_{2,1}, y_{2,2}, \dots, y_{2,j}, \dots, y_{2,J} \sim \text{likelihood}(\cdot|\theta_2),$$
$$\dots$$
$$y_{n,1}, y_{n,2}, \dots, y_{n,j}, \dots, y_{n,J} \sim \text{likelihood}(\cdot|\theta_n),$$
$$\dots$$
$$y_{N,1}, y_{N,2}, \dots, y_{N,j}, \dots, y_{N,J} \sim \text{likelihood}(\cdot|\theta_N).$$

A sketch of this replication scheme is given below.
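The following minimal R sketch illustrates the scheme; purely for illustration, each replicated "dataset" here is a single binomial hit count with a success probability standing in for $p(\theta_n)$ (the actual function draws from the FROC likelihood internally).

# Hedged sketch of drawing J replicates per posterior draw theta_n.
# Stand-in likelihood: a single binomial count per dataset.
set.seed(123)
N  <- 5     # number of posterior draws theta_1, ..., theta_N
J  <- 4     # replicate.number.from.model.for.each.MCMC.sample
NL <- 100   # hypothetical number of lesions
p  <- runif(N, 0.1, 0.4)  # stand-in for p(theta_n)

y <- matrix(NA_integer_, nrow = N, ncol = J)
for (n in 1:N) {
  for (j in 1:J) {
    y[n, j] <- rbinom(1, size = NL, prob = p[n])  # y_{n,j} ~ likelihood(.|theta_n)
  }
}
y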


replicate.number.from.model.for.each.MCMC.sample

A positive integer, representing $J$ in the notation above.

Value

A list, including the p value and the materials used to calculate it.

The components of the returned list are the following:

FPF,TPF,..etc

Data $y_{n,j} \sim \text{likelihood}(\cdot|\theta_n)$ synthesized from the posterior predictive distribution.

chisq_at_observed_data

$\chi(D|\theta_1), \chi(D|\theta_2), \chi(D|\theta_3), \dots, \chi(D|\theta_n), \dots, \chi(D|\theta_N).$

chisq_not_at_observed_data

$\chi(y_1|\theta_1), \chi(y_2|\theta_2), \chi(y_3|\theta_3), \dots, \chi(y_n|\theta_n), \dots, \chi(y_N|\theta_N).$

Logical

A logical vector whose $i$-th component indicates whether $\chi(y_i|\theta_i) > \chi(D|\theta_i)$ is satisfied. If TRUE, then the inequality holds.

p.value

From the component Logical, we calculate the so-called posterior predictive p value as the proportion of TRUE entries.
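If this reading is right, the returned p value should equal the proportion of TRUE entries in Logical; a hedged consistency check, assuming Logical unlists to a logical vector, is:

# Hedged consistency check, assuming `fit` is a stanfitExtended object
# and that the Logical component unlists to a logical vector.
ppp <- ppp_srsc(fit)
all.equal(ppp$p.value, mean(unlist(ppp$Logical)))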

Details

In addition, this function plots the replicated datasets drawn from the model at each MCMC sample generated by Hamiltonian Monte Carlo (HMC) sampling. Using HMC, we can draw MCMC samples of size $N$, say $\theta_1, \theta_2, \theta_3, \dots, \theta_N$; namely, $\theta_1 \sim \pi(\cdot|D),\ \theta_2 \sim \pi(\cdot|D),\ \theta_3 \sim \pi(\cdot|D),\ \dots,\ \theta_N \sim \pi(\cdot|D)$, where $\pi(\theta|D)$ is the posterior for given data $D$.

We draw samples as follows:

$$y_{1,1}, y_{1,2}, \dots, y_{1,j}, \dots, y_{1,J} \sim \text{likelihood}(\cdot|\theta_1),$$
$$y_{2,1}, y_{2,2}, \dots, y_{2,j}, \dots, y_{2,J} \sim \text{likelihood}(\cdot|\theta_2),$$
$$\dots$$
$$y_{n,1}, y_{n,2}, \dots, y_{n,j}, \dots, y_{n,J} \sim \text{likelihood}(\cdot|\theta_n),$$
$$\dots$$
$$y_{N,1}, y_{N,2}, \dots, y_{N,j}, \dots, y_{N,J} \sim \text{likelihood}(\cdot|\theta_N).$$

Then we calculate the chi-square statistic for each replicated dataset:

$$\chi(y_{1,1}|\theta_1), \chi(y_{1,2}|\theta_1), \dots, \chi(y_{1,j}|\theta_1), \dots, \chi(y_{1,J}|\theta_1),$$
$$\chi(y_{2,1}|\theta_2), \chi(y_{2,2}|\theta_2), \dots, \chi(y_{2,j}|\theta_2), \dots, \chi(y_{2,J}|\theta_2),$$
$$\dots$$
$$\chi(y_{n,1}|\theta_n), \chi(y_{n,2}|\theta_n), \dots, \chi(y_{n,j}|\theta_n), \dots, \chi(y_{n,J}|\theta_n),$$
$$\dots$$
$$\chi(y_{N,1}|\theta_N), \chi(y_{N,2}|\theta_N), \dots, \chi(y_{N,j}|\theta_N), \dots, \chi(y_{N,J}|\theta_N),$$

where $\text{likelihood}(\cdot|\theta_n)$ denotes the likelihood at the parameter $\theta_n$.

Let $\chi(y|\theta)$ be the chi-square goodness-of-fit statistic of our hierarchical Bayesian model,

$$\chi(y|\theta) := \sum_{r=1}^{R}\sum_{m=1}^{M}\sum_{c=1}^{C} \left( \frac{(H_{c,m,r} - N_L \times p_{c,m,r})^2}{N_L \times p_{c,m,r}} + \frac{(F_{c,m,r} - (\lambda_c - \lambda_{c+1}) \times N_L)^2}{(\lambda_c - \lambda_{c+1}) \times N_L} \right),$$

and of our non-hierarchical Bayesian model,

$$\chi(y|\theta) := \sum_{c=1}^{C} \left( \frac{(H_c - N_L \times p_c)^2}{N_L \times p_c} + \frac{(F_c - (\lambda_c - \lambda_{c+1}) \times N_L)^2}{(\lambda_c - \lambda_{c+1}) \times N_L} \right),$$

where a dataset $y$ denotes $(F_{c,m,r}, H_{c,m,r})$ in the MRMC case and $(F_c, H_c)$ in the single-reader, single-modality case, and $\theta$ is the model parameter.

Then we can calculate the posterior predictive p value for a given dataset $y_0$:

$$\iint I\bigl(\chi(y|\theta) > \chi(y_0|\theta)\bigr)\, f(y|\theta)\,\pi(\theta|y_0)\,d\theta\,dy \approx \frac{1}{N}\sum_{n=1}^{N} \int I\bigl(\chi(y|\theta_n) > \chi(y_0|\theta_n)\bigr)\, f(y|\theta_n)\,dy \approx \frac{1}{NJ}\sum_{n=1}^{N}\sum_{j=1}^{J} I\bigl(\chi(y_{n,j}|\theta_n) > \chi(y_0|\theta_n)\bigr).$$

When we plot these synthesized datasets $y_{n,j}$, we use jitter(), which adds a small amount of noise to avoid overlapping points. For example, jitter(c(1,1,1,1)) returns values such as 1.0161940 1.0175678 0.9862400 0.9986126; that is, the values 1, 1, 1, 1 are perturbed by tiny errors so that they are no longer exactly 1 and do not overlap. This function was made to fix the calculation of a previous release; it now calculates the correct p value.
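Because the added noise is random, the jittered values differ between calls; fixing a seed makes them reproducible:

# jitter() adds small random noise; a seed makes the output reproducible
# (the exact values depend on the seed and R version).
set.seed(1)
jitter(c(1, 1, 1, 1))   # four values near, but not exactly, 1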

A remaining task is to calculate the PPP for MRMC models and to build a Shiny-based graphical user interface for MRMC, which should handle variable numbers of readers and modalities and generate the required ID vectors automatically.


The reason the author plots the data drawn from the posterior predictive likelihood at each MCMC parameter is to check that the program is correct, that is, that the replicated draws are well mixed.

Using this function, the user obtains reliable posterior predictive p values.

We note that the calculation of the posterior predictive p value (PPP) relies on the law of large numbers. Thus, to obtain a reliable PPP, we need sufficiently many MCMC samples to approximate the double integral. If the number of MCMC samples is small, R hat is typically far from 1; moreover, too few MCMC samples lead to an incorrect p value, which may suggest that the model is correct even when the R hat criterion rejects the MCMC results.
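Since a trustworthy PPP presupposes converged MCMC sampling, it is sensible to inspect R hat before interpreting the p value. A minimal sketch follows, assuming fit is the stanfitExtended object; since stanfitExtended inherits from stanfit, rstan's summary method applies.

# Hedged sketch: check R hat before trusting the PPP.
rhat <- rstan::summary(fit)$summary[, "Rhat"]
if (any(rhat > 1.1, na.rm = TRUE)) {
  warning("Some R hat values exceed 1.1; increase ite before interpreting the PPP.")
}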

Examples

# NOT RUN {
#========================================================================================
#            1) Create a fitted model object with data named  "d"
#========================================================================================



fit <- fit_Bayesian_FROC(dataList = d,
                         ite      = 222) # to restrict running time; too small for real use



#========================================================================================
#            2) Calculate p value and meta data
#========================================================================================



            ppp <- ppp_srsc(fit)



#========================================================================================
#            3) Extract a p value
#========================================================================================




              ppp$p.value


# Revised 2019 August 19
# Revised 2019 Nov 27

     Close_all_graphic_devices() # 2020 August
# }
