rvgtest-package: Tools for Analyzing Non-Uniform Pseudo-Random Variate Generators (RVG)

Description

Suite for testing non-uniform random number generators.

Arguments

1. Histograms

A frequently used method for testing continuous univariate distributions is based on the following strategy: Draw a sample, compute a histogram and run a goodness-of-fit test on the resulting frequency table.

We have implemented a three step procedure:

Create tables that can hold the information of huge random samples.
Perform some test for the null hypothesis on these tables.
Visualize these tables as well as the results of the tests.

The advantages of this procedure are:

Huge total sample sizes are possible (only limited by available runtime but not by memory).
Multiple tests can be run on the same random sample.
The data can be inspected visually.
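
The following base R sketch illustrates why memory is not the limiting factor. It is not the implementation used by rvgt.ftable; the bin count, the batch loop and all variable names are assumptions made for illustration. Each batch is transformed with the CDF, binned into equiprobable bins and then discarded, so only the vector of counts is kept.

```r
## Sketch only: accumulate bin counts batch by batch so that memory
## use depends on the number of bins, not on the total sample size.
set.seed(42)                       # arbitrary seed for reproducibility
nbins  <- 100                      # number of equiprobable bins (assumed)
breaks <- seq(0, 1, length.out = nbins + 1)
counts <- numeric(nbins)

for (i in 1:20) {                  # 20 batches of 10^5 variates each
  x <- rnorm(1e5)                  # generator under test
  u <- pnorm(x)                    # uniform on (0,1) under the null hypothesis
  bin <- findInterval(u, breaks, rightmost.closed = TRUE)
  counts <- counts + tabulate(bin, nbins = nbins)
}

## Goodness-of-fit test against equal bin probabilities.
res <- chisq.test(counts, p = rep(1 / nbins, nbins))
res$p.value
```

Since the accumulated counts are kept, further tests can be run on the same table without drawing a new sample.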

In addition, there are some functions for artificially introducing defects into other random variate generators. Thus one may investigate the power of the tests.
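
The idea can be illustrated in base R without the package routines. In the sketch below, rnorm_perturbed is a hypothetical helper (it is not part of rvgtest and does not reproduce the call syntax of pertadd or pertsub): it replaces a fraction p of normal variates by uniform ones. The contamination rate, bin count and sample size are arbitrary assumptions.

```r
## Hypothetical helper (not part of rvgtest): contaminate a normal
## generator by replacing a fraction p of the draws with uniform variates.
rnorm_perturbed <- function(n, p = 0.05, min = 0, max = 1) {
  x   <- rnorm(n)
  bad <- runif(n) < p                       # draws to be replaced
  x[bad] <- runif(sum(bad), min = min, max = max)
  x
}

## Bin counts in equiprobable bins under the standard normal hypothesis.
nbins  <- 50
breaks <- seq(0, 1, length.out = nbins + 1)
bin_counts <- function(x) {
  tabulate(findInterval(pnorm(x), breaks, rightmost.closed = TRUE),
           nbins = nbins)
}

set.seed(1)                                 # arbitrary seed
p_good <- chisq.test(bin_counts(rnorm(1e5)))$p.value
p_bad  <- chisq.test(bin_counts(rnorm_perturbed(1e5)))$p.value
c(good = p_good, perturbed = p_bad)
```

Lowering p or the sample size shows how small a defect the test can still detect.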

List of Routines

Data generation: rvgt.ftable.
Tests: rvgt.chisq, rvgt.Mtest.
Visualization: plot (see plot.rvgt.ftable, plot.rvgt.htest for the respective syntax of the call).
Perturbation of RVGs: pertadd, pertsub.

Details

Package: rvgtest
Type: Package
Version: 0.5.0
Date: 2010-07-12
License: GPL 2 or later
LazyLoad: yes

rvgtest is a set of tools to investigate the quality of non-uniform pseudo-random variate generators (RVG). It provides functions to visualize and test for possible defects. There are three main reasons for such defects and errors:

1. Errors in the design of algorithms -- The proof of the theorem that claims the correctness of the algorithm is wrong.
2. Implementation errors -- Mistakes in computer programs.
3. Limitations of floating point arithmetic and round-off errors in implementations of these algorithms on real-world computers.

Of course, testing software is a self-evident part of software engineering. Implementation errors usually result in large deviations from the requested distribution, and thus errors of type 2 are easily detected. However, this need not always be the case, for example for rather complicated algorithms like those based on patchwork methods.

The same holds for errors of type 1. In the best of all worlds, there exists a correct proof for the validity of the algorithm. In our world, however, humans can err. In that case the deviations are rather small, since otherwise they would have been detected when testing the implementation for errors of type 2.

Errors of type 3 can be a problem when the requested distribution has extreme properties. E.g., it is no problem to generate a sample of beta distributed random variates with shape parameters 0.001 using rbeta(n=100, shape1=0.001, shape2=0.001). However, due to the limited resolution of floating point numbers, the sample behaves like one from a discrete distribution (especially near 1). It is not always obvious whether such round-off errors will influence one's simulation results.
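
A quick way to look at this effect is to count distinct values and exact ties in such a sample. The sketch below uses only base R; the seed is arbitrary and the exact counts depend on the R version and platform, so no output is shown.

```r
## Draw the sample mentioned above and look for signs of discreteness.
set.seed(123)                        # arbitrary seed
x <- rbeta(n = 100, shape1 = 0.001, shape2 = 0.001)

n_distinct <- length(unique(x))      # distinct values among the 100 draws
n_one      <- sum(x == 1)            # draws that are exactly 1
n_zero     <- sum(x == 0)            # draws that are exactly 0
c(distinct = n_distinct, exactly_one = n_one, exactly_zero = n_zero)
```

For a truly continuous distribution ties have probability zero, so any draws that are exactly equal to 0 or 1 are an artifact of floating point arithmetic.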

It is the purpose of this package to provide some tools to find possible errors in RVGs. However, observing a defect in (the implementation of) a pseudo-random variate generator by purely statistical tools may require a sample so large that it exceeds the available memory when held in a single array in R. (Nevertheless, there is some chance that this defect causes an error in a particular simulation with a moderate sample size.) Hence we have implemented routines that can run tests on very large sample sizes (which are only limited by the available runtime).

Currently there are two toolsets for testing random variate generators for continuous univariate distributions:

Testing based on histograms for all kinds of RVGs.
Estimating errors of RVGs that are based on numerical inversion methods.
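
For the second toolset, the quantity of interest can be sketched in base R. The sketch assumes the usual definition of the u-error of an approximate inverse CDF G^{-1} for a distribution with CDF F, namely |u - F(G^{-1}(u))|; the variable names and the sample size are illustrative and the code does not use the package routine uerror.

```r
## Approximate inverse CDF of the standard normal distribution by
## monotone spline interpolation on a grid of quantiles.
z  <- (-100:100) * 0.05
aq <- splinefun(x = pnorm(z), y = z, method = "monoH.FC")

## Evaluate the u-error on uniform points inside the interpolated range.
set.seed(7)                                   # arbitrary seed
u    <- runif(1e4, min = pnorm(-5), max = pnorm(5))
uerr <- abs(u - pnorm(aq(u)))                 # assumed definition of u-error

max(uerr)                                     # largest observed u-error
```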

Examples

## ------------------------------------------------
## 1. Histogram
## ------------------------------------------------

## Use a poor Gaussian random variate generator
## (otherwise we should not see a defect).
RNGkind(normal.kind="Buggy Kinderman-Ramage")

## Create table of bin counts.
## Use a sample of 20 times 10^5 random variates.
table <- rvgt.ftable(n=1e5, rep=20, rdist=rnorm, qdist=qnorm)

## Plot histogram for (cumulated) data
plot(table)

## Perform a chi-square goodness-of-fit test and plot result
r1 <- rvgt.chisq(table)
plot(r1)

## Perform M-test and plot both results
r2 <- rvgt.Mtest(table)
plot.rvgt.htest(list(r1,r2))

## ------------------------------------------------
## 2. Numerical Inversion
## ------------------------------------------------

## Create a table of u-errors for spline interpolation of
## the inverse CDF of the standard normal distribution.
aqn <- splinefun(x=pnorm((-100:100)*0.05), y=(-100:100)*0.05,
                 method="monoH.FC")
## Use a sample of 10^5 random variates.
uerrn <- uerror(n=1e5, aqdist=aqn, pdist=pnorm)

## Plot u-errors
plot(uerrn)