rvgt.ftable: Create RVG Frequency Table for Random Variate Generator

Description

Creates frequency tables for random variate generators. A histogram is computed and the bin counts are stored in an array that can be used to visualize possible defects of the pseudo-random variate generator and to run goodness-of-fit tests.

The function only works for generators for univariate distributions.

Usage

rvgt.ftable(n, rep=1, rdist, qdist, pdist, ...,
            breaks = 101, trunc=NULL, exactu=FALSE, plot=FALSE)

Arguments

n

sample size for one repetition ($\ge 100$).

rep

number of repetitions.

rdist

random variate generator for a univariate distribution.

qdist

quantile function for the distribution.

pdist

cumulative distribution function for the distribution.

...

parameters to be passed to rdist, qdist and pdist.

breaks

one of:

a single number giving the number of break points of the histogram (the histogram then has breaks$-1$ cells); or
a vector giving the breakpoints between histogram cells (in uniform scale). Notice that in the latter case the break points are automatically sorted and th

trunc

boundaries of truncated domain. (optional)

exactu

logical. If TRUE, the exact locations of the given break points are used. Otherwise, these points are slightly shifted in order to reduce execution time; see Details below.

plot

logical. If TRUE, a histogram is plotted.

Value

An object of class "rvgt.ftable" which is a list with components:
n

sample size.

rep

number of repetitions.

ubreaks

an array of break points in $u$-scale.

xbreaks

an array of break points in $x$-scale.

count

a matrix of rep rows and (breaks$-1$) columns that contains the bin counts. The results for each repetition are stored row-wise.

dtype

a string that contains the type of the distribution: "cont" or "discr".

Details

rvgt.ftable returns tables of bin counts similar to the hist function. Bins can be specified either by the number of break points of the histogram, or by a list of break points in the $u$-scale. In the former case the break points are constructed such that all bins of the histogram have equal probability for the distribution under the null hypothesis, i.e., the break points are equidistant in the $u$-scale, using the formula $u_i=i/(breaks-1)$ where $i=0,\dots,breaks-1$.
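The formula above can be written out in a few lines of base R. This is only an illustrative sketch of the construction, not the package's code; the value of `breaks` and the name `bin.prob` are chosen here for the example.

```r
# Number of break points (illustrative value); this gives breaks - 1 bins.
breaks <- 11

# Equidistant break points in the u-scale:
# u_i = i / (breaks - 1) for i = 0, ..., breaks - 1.
ubreaks <- (0:(breaks - 1)) / (breaks - 1)

# Under the null hypothesis every bin has the same probability,
# namely the difference of consecutive break points.
bin.prob <- diff(ubreaks)

print(ubreaks)
print(bin.prob)
```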

When the quantile function qdist is given, then these points are transformed into breaking points in the $x$-scale using qdist$(u_i)$. Thus the histogram can be computed directly for random points $X$ that are generated by means of rdist.
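A minimal base R sketch of this step, assuming a normal distribution with mean 1 and standard deviation 2 and using `findInterval` and `tabulate` for the counting. How rvgt.ftable counts internally is not stated here, so the counting method is an assumption made for illustration.

```r
set.seed(1)

breaks  <- 11
ubreaks <- (0:(breaks - 1)) / (breaks - 1)

# Transform the u-scale break points into the x-scale with the quantile
# function. For the normal distribution the outer points are -Inf and Inf.
xbreaks <- qnorm(ubreaks, mean = 1, sd = 2)

# Generate the random points X directly and count them per bin.
x     <- rnorm(1000, mean = 1, sd = 2)
bin   <- findInterval(x, xbreaks)          # bin index in 1, ..., breaks - 1
count <- tabulate(bin, nbins = breaks - 1)

print(count)
```

No evaluation of the distribution function is needed for the generated points, which is why this route is the fast one.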

Otherwise the cumulative distribution function pdist must be given. If exactu is TRUE, then all non-uniform random points $X$ are first transformed into uniformly distributed random numbers $U=$pdist$(X)$ for which the histogram is created. This is slower than directly using $X$ but it is numerically more robust, as round-off errors in qdist have much more influence than those in pdist.
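The transformation to the $u$-scale can be sketched in base R as follows. The sketch assumes a normal distribution and again uses `findInterval` and `tabulate` for counting, which is an illustrative choice and not necessarily what the package does.

```r
set.seed(1)

breaks  <- 11
ubreaks <- (0:(breaks - 1)) / (breaks - 1)

# Generate non-uniform points X and map each one to the u-scale, U = pdist(X).
x <- rnorm(1000, mean = 1, sd = 2)
u <- pnorm(x, mean = 1, sd = 2)

# Count the uniform numbers in the exact u-scale bins.
# rightmost.closed = TRUE puts a value U == 1 into the last bin.
bin   <- findInterval(u, ubreaks, rightmost.closed = TRUE)
count <- tabulate(bin, nbins = breaks - 1)

print(count)
```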

If trunc is given, then the functions qdist and pdist are rescaled to this given domain. It is recommended to provide pdist even when qdist is given.

If exactu is FALSE and the quantile function qdist is missing, then the first sample of size n is used to estimate the quantiles for the given break points using function quantile. The break points in $u$-scale are then recomputed from these quantiles by means of the given cumulative distribution function pdist. This is usually (much) faster than calling pdist on each generated point. However, the break points are slightly perturbed (but this does not affect the correctness of the frequency table).

The argument rep allows creating multiple such arrays of bin counts and storing them in one table. This has two advantages:

It allows for huge total sample sizes that would otherwise exceed the available memory, and
it can be used to visualize test results for increasing sample sizes, or allows for a two-level test.
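The following base R sketch combines the two ideas above: break points estimated from the first sample when only pdist is available and exactu is FALSE, and one row of counts per repetition. It is a sketch under assumptions: the names `nrep`, `ubreaks.new`, `expected` and `cum.count` are hypothetical, and whether the package also counts the first sample in the first row is assumed here, not confirmed.

```r
set.seed(1)

n       <- 1000   # sample size per repetition
nrep    <- 3      # number of repetitions (the 'rep' argument)
breaks  <- 11
ubreaks <- (0:(breaks - 1)) / (breaks - 1)

# Estimate x-scale break points from the first sample with quantile().
x1      <- rnorm(n)
inner   <- quantile(x1, probs = ubreaks[-c(1, breaks)], names = FALSE)
xbreaks <- c(-Inf, inner, Inf)

# Recompute the u-scale break points with the distribution function.
# They are slightly perturbed but describe the bins exactly.
ubreaks.new <- pnorm(xbreaks)

# One row of bin counts per repetition (assumption: row 1 reuses x1).
count <- matrix(0, nrow = nrep, ncol = breaks - 1)
count[1, ] <- tabulate(findInterval(x1, xbreaks), nbins = breaks - 1)
for (r in seq_len(nrep)[-1]) {
  x <- rnorm(n)
  count[r, ] <- tabulate(findInterval(x, xbreaks), nbins = breaks - 1)
}

# Expected counts per repetition use the recomputed break points.
expected <- n * diff(ubreaks.new)

# Cumulative rows give the counts for increasing total sample sizes.
cum.count <- apply(count, 2, cumsum)
print(cum.count)
```

Only `breaks` values of the distribution function are evaluated, instead of one per generated point, and only one sample of size n is held in memory at a time.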

For discrete distributions the function pdist must be given, and both arguments qdist and exactu are ignored. Moreover, the given break points have to be adjusted according to the probability function of the discrete distribution. In particular, this means that bins have to be collapsed when the probability of some number is larger than the difference of break points in $u$-scale. Thus the resulting tables may contain fewer break points than requested.
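Why bins collapse can be seen in a small base R sketch for a geometric distribution. It moves each requested break point to the nearest value above it that the distribution function actually attains and then drops duplicates. This adjustment rule and the names `k` and `ubreaks.adj` are assumptions made for illustration; the package's actual adjustment may differ.

```r
breaks  <- 11
ubreaks <- (0:(breaks - 1)) / (breaks - 1)
prob    <- 0.5   # illustrative parameter of the geometric distribution

# For each interior break point u, find the smallest integer k with
# pgeom(k) >= u, and use the CDF value at that k as the adjusted break point.
k     <- qgeom(ubreaks[-c(1, breaks)], prob = prob)
u.adj <- pgeom(k, prob = prob)

# Several requested break points fall on the same CDF value because the
# probability of a single number exceeds the bin width; unique() collapses them.
ubreaks.adj <- unique(c(0, u.adj, 1))

print(ubreaks.adj)
print(length(ubreaks.adj))
```

With these values the probability of 0 is 0.5, which is larger than the requested bin width of 0.1, so several requested bins merge into one.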

The type of distribution (continuous or discrete) is autodetected by the function.

References

W. Hörmann, J. Leydold, and G. Derflinger (2004): Automatic Nonuniform Random Variate Generation. Springer-Verlag, Berlin Heidelberg.

Examples

## Create a frequency table for normal distribution with mean 1 and
## standard deviation 2. Number of bins should be 50.
## Use 5 repetitions with 10^5 random variates each.
ft <- rvgt.ftable(n=1e5,rep=5, rdist=rnorm,qdist=qnorm, breaks=51, mean=1,sd=2)

## Show histogram
plot(ft)

## Run a chi-square test
rvgt.chisq(ft)

## The following plots a histogram in a single call.
rvgt.ftable(n=1e5,rep=5, rdist=rnorm,qdist=qnorm, plot=TRUE)

## Use the cumulative distribution function when the quantile function
## is not available or if its round-off errors have serious impact.
ft <- rvgt.ftable(n=1e5,rep=5, rdist=rnorm,pdist=pnorm )
plot(ft)

## Create a frequency table for the normal distribution with
## non-equidistributed break points
ft <- rvgt.ftable(n=1e5,rep=5, rdist=rnorm,qdist=qnorm, breaks=1/(1:100))
plot(ft)

## A (naive) generator for a truncated normal distribution
rdist <- function(n) {
  x <- numeric(n)
  for (i in 1:n){ while(TRUE){ x[i] <- rnorm(1); if (x[i]>1) break} }
  return(x)
}
ft <- rvgt.ftable(n=1e3,rep=5, rdist=rdist,
                  pdist=pnorm, qdist=qnorm, trunc=c(1,Inf))
plot(ft)

## An example for a discrete distribution
ft <- rvgt.ftable(n=1e5,rep=1, rdist=rgeom,pdist=pgeom, prob=0.123)
plot(ft)