
BayesianFROC (version 0.2.1)

chi_square_goodness_of_fit: Chi square goodness of fit statistics at each MCMC sample w.r.t. a given dataset.

Description

Calculates a vector of chi square goodness of fit statistics for a given dataset \(D\), one value for each MCMC sample: $$ \chi^2 (D|\theta_i), \; i=1,2,\dots,N.$$

Usage

chi_square_goodness_of_fit(StanS4class, dig = 3,
  h = StanS4class@dataList$h, f = StanS4class@dataList$f)

Arguments

StanS4class

An S4 object of class stanfitExtended, which inherits from the S4 class stanfit. This object can also be passed to DrawCurves(), ppp(), etc.

dig

A positive integer representing the number of significant digits, to be passed to rstan::sampling(). The default is 3, as in the usage above.

h

A vector of positive integers representing the numbers of hits. This argument exists so that hit counts drawn from the posterior predictive distribution can be substituted for the observed data. Gelman's well-known book explains the use of test statistics in the Bayesian context; there, replicated data drawn from the posterior predictive distribution are substituted into the statistic.

f

A vector of positive integers representing the numbers of false alarms. Like h, this argument exists so that false alarm counts drawn from the posterior predictive distribution can be substituted for the observed data.
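
A minimal sketch of how these arguments support a posterior predictive check (here `fit` is a fitted object as in the Examples below, and `h.rep`, `f.rep` are hypothetical counts drawn from the posterior predictive distribution; a full check pairs each posterior draw with its own replicate, which ppp() automates):

       chi2.obs <- chi_square_goodness_of_fit(fit)                        # observed data
       chi2.rep <- chi_square_goodness_of_fit(fit, h = h.rep, f = f.rep)  # replicated data
       mean(chi2.rep >= chi2.obs)   # a posterior predictive p-value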

Value

A chi square value for each MCMC sample: $$\chi^2 = \chi^2 (D|\theta_i), \; i=1,2,\dots,N.$$ The return value is therefore a vector of length \(N\), where \(N\) is the number of MCMC iterations retained after the warm-up period. If more than one chain is run, the samples from all chains are used to calculate the chi squares.
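
For example, one can inspect the return value as follows (a minimal sketch, assuming `fit` is a fitted object created as in the Examples below):

       chi2.vec <- chi_square_goodness_of_fit(fit)
       length(chi2.vec)   # N: the number of retained draws across all chains
       head(chi2.vec)     # chi^2(D|theta_i) for the first few draws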

In the sequel, we use the following notation:

a prior \(\pi(\theta)\),

a posterior \(\pi(\theta|D)\),

a likelihood \(f(D|\theta)\),

a parameter \(\theta\), and

a dataset \(D\), related as follows:

$$ \pi(\theta|D) \propto f(D|\theta) \pi(\theta).$$

Let us denote the posterior MCMC samples of size \(N\) by

$$\theta_1, \theta_2, \theta_3,...,\theta_N$$

which are drawn from the posterior \(\pi(\theta|D)\) for the given dataset \(D\).

Recall that the chi square goodness of fit statistic \(\chi^2\) depends on both the model parameter \(\theta\) and the data \(D\), namely,

$$\chi^2 = \chi^2 (D|\theta).$$

The return value is then a vector of length \(N\) whose components are given by:

$$\chi^2 (D|\theta_1), \chi^2 (D|\theta_2), \chi^2 (D|\theta_3),...,\chi^2 (D|\theta_N),$$

which is the vector returned by this function.

As an application of this return value, we can calculate the posterior mean of \(\chi^2 = \chi^2 (D|\theta)\), namely, we obtain

$$ \chi^2 (D) =\int \chi^2 (D|\theta) \pi(\theta|D) d\theta.$$

In the author's models, the calculated results in almost all examples show that

$$ \int \chi^2 (D|\theta) \pi(\theta|D) d\theta > \chi^2 (D| \int \theta \pi(\theta|D) d\theta). $$

Whether this inequality holds for all \(D\) is an open question; the author conjectures that it does. (By Jensen's inequality, it would hold whenever \(\chi^2 (D|\theta)\) is convex as a function of \(\theta\).)
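
This comparison can be checked numerically for a single fitted object (a minimal sketch, assuming `fit` is created as in the Examples below; the slot fit@chisquare, used in the Examples, stores the chi square at the posterior mean estimate):

       chi2.vec <- chi_square_goodness_of_fit(fit)
       lhs <- mean(chi2.vec)   # posterior mean of the chi square discrepancy
       rhs <- fit@chisquare    # chi square at the posterior mean (EAP) estimate
       lhs > rhs               # the conjectured inequality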

Revised: 2019 Aug 18; 2019 Sept 1; 2019 Nov 28.

Our data comprise \(2C\) categories, that is,

the numbers of hits: h[1], h[2], h[3], ..., h[C], and

the numbers of false alarms: f[1], f[2], f[3], ..., f[C].

Our model has \(C+2\) parameters, that is,

the thresholds of the bi-normal assumption z[1], z[2], z[3], ..., z[C], and

the mean and standard deviation of the signal distribution.

So, the degrees of freedom of this statistic are

(number of categories) - (number of parameters) - 1 = 2C - (C+2) - 1 = C - 3.

This differs from Chakraborty's result of C - 2; why remains an open question.
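
For instance, with \(C = 5\) confidence levels, the arithmetic above gives:

       C <- 5
       2 * C - (C + 2) - 1   # = C - 3 = 2 degrees of freedom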

Details

To calculate the chi square test statistic \(\chi^2 (y|\theta)\), two inputs are required: an observed dataset \(y\) and an estimate of the parameter \(\theta\). In the classical setting, the MLE (maximum likelihood estimator) is used for \(\theta\) in \(\chi^2 (y|\theta)\). In the Bayesian context, however, the parameter is not deterministic; it is treated as a random variable whose samples are drawn from the posterior distribution. Such samples are obtained here by Hamiltonian Monte Carlo simulation, so a chi square value can be calculated for each MCMC sample.
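
The following is a minimal sketch of this idea, not the package's actual internals; `expected.h` and `expected.f` stand for hypothetical model-based expected counts computed from a single posterior draw:

       # Chi square discrepancy for one posterior draw theta_i
       chi2.one.draw <- function(h, f, expected.h, expected.f) {
         sum((h - expected.h)^2 / expected.h) +
           sum((f - expected.f)^2 / expected.f)
       }
       # Evaluating this for each draw theta_1, ..., theta_N yields the
       # vector chi^2(D|theta_i), i = 1, ..., N, returned by this function.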

Examples

# NOT RUN {
#  Get the MCMC samples from a dataset.

       fit <- fit_Bayesian_FROC(BayesianFROC::dataList.Chakra.1,
                                ite     = 1111,   # number of MCMC iterations
                                summary = FALSE,
                                cha     = 2)      # number of MCMC chains

#   The chi square discrepancies are calculated by the following code

         Chi.Square.for.each.MCMC.samples   <-   chi_square_goodness_of_fit(fit)





         # With a warning
         chi_square_goodness_of_fit(fit)

         # Without a warning (h and f supplied explicitly)
         chi_square_goodness_of_fit(fit,
                                    h = fit@dataList$h,
                                    f = fit@dataList$f)







#  Get the posterior mean of the chi square discrepancy.

                    m <- mean(Chi.Square.for.each.MCMC.samples)








# Calculate a p-value for the posterior mean of the chi square discrepancy.
# pchisq() gives the lower-tail probability, so the upper tail is taken here;
# the degrees of freedom should match the discussion in the Value section.

                     stats::pchisq(m, df = 1, lower.tail = FALSE)


# Difference between chi sq. at EAP and EAP of chi sq.

   mean( fit@chisquare - chi_square_goodness_of_fit(fit))


# }
