G_moments: 1st & 2nd Moments of the Pitman-Yor / Dirichlet Processes

Description

Calculate the a priori expected number of clusters (G_expected) or the variance of the number of clusters (G_variance) under a PYP or DP prior for a sample of size N, at given values of the concentration parameter alpha and, optionally, the Pitman-Yor discount parameter. These functions are useful for eliciting sensible priors (or fixed values) for alpha or discount under the "IMFA" and "IMIFA" methods of mcmc_IMIFA. Additionally, for a given sample size N and a given expected number of clusters EG, G_calibrate elicits a value for either the concentration parameter alpha or the discount parameter.
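As a rough illustration of what G_expected and G_variance compute, both moments can be obtained from the sequential (Chinese-restaurant) form of the Pitman-Yor prior: observation n + 1 starts a new cluster with probability (alpha + discount * K_n) / (alpha + n), where K_n is the number of clusters among the first n observations. Because that probability is linear in K_n, E[K_n] and E[K_n^2] satisfy a simple forward recursion. The sketch below assumes this predictive rule and the parameter constraints listed under Arguments; pyp_moments() is a hypothetical, scalar-only helper that is not part of IMIFA and need not match its internal implementation.

```r
# Hypothetical helper (not part of IMIFA): mean and variance of the number of
# clusters K_N among N observations under a Pitman-Yor prior, computed by a
# forward recursion. Assumes discount < 1 and alpha > -discount;
# discount = 0 recovers the Dirichlet process. Scalar inputs only.
pyp_moments <- function(N, alpha, discount = 0) {
  stopifnot(N >= 1, discount < 1, alpha > -discount)
  m1 <- 1  # E[K_1]: the first observation always opens a cluster
  m2 <- 1  # E[K_1^2]
  for (n in seq_len(N - 1)) {
    # P(observation n + 1 is new | K_n) = (alpha + discount * K_n) / (alpha + n)
    p_new  <- (alpha + discount * m1) / (alpha + n)       # E[P(new | K_n)]
    kp_new <- (alpha * m1 + discount * m2) / (alpha + n)  # E[K_n * P(new | K_n)]
    # K_{n+1} = K_n + B with B | K_n ~ Bernoulli(P(new | K_n)), hence
    # E[K_{n+1}^2] = E[K_n^2] + 2 E[K_n B] + E[B]
    m2 <- m2 + 2 * kp_new + p_new
    m1 <- m1 + p_new
  }
  c(mean = m1, variance = m2 - m1^2)
}

pyp_moments(N = 50, alpha = 19.23356)                   # DP
pyp_moments(N = 50, alpha = 12.21619, discount = 0.25)  # PYP
```

For the DP (discount = 0) the recursion reduces to the familiar sums of alpha / (alpha + i) and alpha * i / (alpha + i)^2 over i = 0, ..., N - 1 for the mean and variance, respectively.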

Usage

G_expected(N,
           alpha,
           discount = 0,
           MPFR = TRUE)
G_variance(N,
           alpha,
           discount = 0,
           MPFR = TRUE)
G_calibrate(N,
            EG,
            alpha = NULL,
            discount = 0,
            MPFR = TRUE,
            ...)

Value

The expected number of clusters under the specified prior conditions (G_expected), the variance of the number of clusters (G_variance), or the concentration parameter alpha or discount parameter achieving a particular expected number of clusters (G_calibrate).

Arguments

N: The sample size.
alpha: The concentration parameter. Must be specified (except for G_calibrate) and must be strictly greater than -discount. The case alpha=0 is accommodated, which by the preceding constraint requires a positive discount. When discount is negative, alpha must be a positive integer multiple of abs(discount). See Details for the behaviour of G_calibrate.
discount: The discount parameter for the Pitman-Yor process. Must be less than 1, but typically lies in the interval [0, 1). Defaults to 0 (i.e. the Dirichlet process). When discount is negative, alpha must be a positive integer multiple of abs(discount). See Details for the behaviour of G_calibrate.
MPFR: Logical indicating whether the high-precision libraries Rmpfr and gmp are invoked, at the expense of longer run-times. Defaults to TRUE, and must be TRUE for G_expected when alpha=0 and for G_variance when discount is non-zero. For G_calibrate, MPFR=TRUE is strongly recommended when discount is non-zero and strictly necessary when alpha=0 is supplied. See Note.
EG: The prior expected number of clusters. Must exceed 1 and be less than N.
...: Additional arguments passed to uniroot, e.g. maxiter.
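The constraints on alpha and discount listed above can be checked mechanically before calling any of these functions. The sketch below is a hypothetical helper, valid_pyp(), that is not part of IMIFA; it encodes the stated conditions literally, recycles its arguments to a common length, and uses an assumed numerical tolerance, tol, for the integer-multiple test.

```r
# Hypothetical helper (not part of IMIFA) encoding the alpha/discount
# constraints as stated under Arguments; vectorised by recycling.
valid_pyp <- function(alpha, discount = 0, tol = sqrt(.Machine$double.eps)) {
  n <- max(length(alpha), length(discount))
  alpha <- rep_len(alpha, n)
  discount <- rep_len(discount, n)
  # discount < 1 and alpha strictly greater than -discount
  ok <- discount < 1 & alpha > -discount
  # Negative discount: alpha must be a positive integer multiple of abs(discount)
  neg <- discount < 0
  m <- alpha[neg] / abs(discount[neg])
  ok[neg] <- ok[neg] & m >= 1 & abs(m - round(m)) < tol
  ok
}

valid_pyp(alpha = c(19.23356, 0, 27.1401, 1),
          discount = c(0, 0.8054448, -27.1401/100, -0.3))
# TRUE TRUE TRUE FALSE
```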

Author

Keefe Murphy - <keefe.murphy@mu.ie>

Details

All arguments are vectorised. Users can also consult G_priorDensity in order to elicit sensible priors.
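Vectorisation falls out naturally if E[K_N] is written in closed form. For discount != 0 the expectation is (alpha / discount) * (Gamma(alpha + discount + N) Gamma(alpha) / (Gamma(alpha + discount) Gamma(alpha + N)) - 1), and for the DP it is alpha * sum_{i=0}^{N-1} 1/(alpha + i). Evaluating the gamma ratio naively overflows in double precision (gamma() returns Inf for arguments above roughly 171), which is one reason high-precision arithmetic can help. The hypothetical expected_clusters() below instead works on the log scale with lgamma(); it is a sketch rather than the IMIFA implementation, and it assumes alpha > 0 and alpha + discount > 0, so it does not cover alpha = 0.

```r
# Hypothetical sketch (not the IMIFA implementation): closed-form E[K_N]
# on the log scale, vectorised over N, alpha and discount by recycling.
# Assumes alpha > 0 and alpha + discount > 0, so the case alpha = 0 is excluded.
expected_clusters <- function(N, alpha, discount = 0) {
  n <- max(length(N), length(alpha), length(discount))
  N <- rep_len(N, n)
  alpha <- rep_len(alpha, n)
  discount <- rep_len(discount, n)
  # DP: alpha * sum_{i=0}^{N-1} 1/(alpha + i) = alpha * (digamma(alpha + N) - digamma(alpha))
  dp <- alpha * (digamma(alpha + N) - digamma(alpha))
  # PYP: (alpha/discount) * [Gamma(alpha+discount+N) Gamma(alpha) /
  #                          (Gamma(alpha+discount) Gamma(alpha+N)) - 1]
  log_ratio <- lgamma(alpha + discount + N) + lgamma(alpha) -
    lgamma(alpha + discount) - lgamma(alpha + N)
  py <- alpha / discount * expm1(log_ratio)  # NaN where discount == 0; replaced below
  ifelse(discount == 0, dp, py)
}

expected_clusters(50, alpha = c(19.23356, 12.21619, 1),
                  discount = c(0, 0.25, 0.7300045))
```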

For G_calibrate, only one of alpha or discount can be supplied; the function elicits a value for the other parameter which achieves the desired expected number of clusters EG for the given sample size N. By default, a value for alpha is elicited subject to discount=0 (i.e. the Dirichlet process). Note that when discount is negative, the resulting alpha may not be a positive integer multiple of abs(discount), as it should be. See Examples below.
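A minimal sketch of such a calibration, assuming that E[K_N] is increasing in both alpha and discount so that a bracketing root-finder applies, is shown below. It uses stats::uniroot() (the function to which the ... argument is passed) but is otherwise hypothetical: exp_K() and calibrate_pyp() are not part of IMIFA, the search bounds (the offset eps and the upper limit 1e6 for alpha) are assumptions, and only scalar inputs are handled.

```r
# Hypothetical sketch (not the IMIFA implementation) of calibration by root-finding.
# exp_K(): E[K_N] via the forward recursion
#   E[K_{n+1}] = E[K_n] + (alpha + discount * E[K_n]) / (alpha + n), with E[K_1] = 1
exp_K <- function(N, alpha, discount = 0) {
  m1 <- 1
  for (n in seq_len(N - 1)) m1 <- m1 + (alpha + discount * m1) / (alpha + n)
  m1
}

# Scalar calibration: if alpha is NULL, solve for alpha given discount;
# otherwise solve for discount given alpha. Extra arguments go to uniroot().
calibrate_pyp <- function(N, EG, alpha = NULL, discount = 0, ...) {
  stopifnot(EG > 1, EG < N)
  eps <- 1e-8  # assumed offset keeping the search inside the open parameter range
  if (is.null(alpha)) {
    f <- function(a) exp_K(N, a, discount) - EG
    uniroot(f, lower = -discount + eps, upper = 1e6, tol = 1e-10, ...)$root
  } else {
    f <- function(d) exp_K(N, alpha, d) - EG
    uniroot(f, lower = -alpha + eps, upper = 1 - eps, tol = 1e-10, ...)$root
  }
}

calibrate_pyp(N = 50, EG = 25)             # alpha under a DP prior
calibrate_pyp(N = 50, EG = 25, alpha = 1)  # discount given alpha = 1
```

For vector inputs, the scalar sketch can be mapped, e.g. mapply(calibrate_pyp, alpha = c(12.21619, 1, 0), MoreArgs = list(N = 50, EG = 25)). If a bracket fails to straddle the root, uniroot() stops with an error rather than returning a value.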

References

De Blasi, P., Favaro, S., Lijoi, A., Mena, R. H., Prünster, I., and Ruggiero, M. (2015) Are Gibbs-type priors the most natural generalization of the Dirichlet process?, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2): 212-229.

Yamato, H. and Shibuya, M. (2000) Moments of some statistics of Pitman sampling formula, Bulletin of Informatics and Cybernetics, 32(1): 1-10.

Examples

# Certain examples require the use of the Rmpfr library
suppressMessages(require("Rmpfr"))

G_expected(N=50, alpha=19.23356, MPFR=FALSE)
G_variance(N=50, alpha=19.23356, MPFR=FALSE)

G_expected(N=50, alpha=c(19.23356, 12.21619, 1),
           discount=c(0, 0.25, 0.7300045), MPFR=FALSE)
G_variance(N=50, alpha=c(19.23356, 12.21619, 1),
           discount=c(0, 0.25, 0.7300045), MPFR=c(FALSE, TRUE, TRUE))

# Examine the growth rate of the DP
DP   <- sapply(c(1, 5, 10), function(i) G_expected(1:200, alpha=i, MPFR=FALSE))
matplot(DP, type="l", xlab="N", ylab="G")

# Examine the growth rate of the PYP
PY <- sapply(c(0.25, 0.5, 0.75), function(i) G_expected(1:200, alpha=1, discount=i))
matplot(PY, type="l", xlab="N", ylab="G")

# Other special cases of the PYP are also facilitated
G_expected(N=50, alpha=c(27.1401, 0), discount=c(-27.1401/100, 0.8054448))
G_variance(N=50, alpha=c(27.1401, 0), discount=c(-27.1401/100, 0.8054448))

# Elicit values for alpha under a DP prior
G_calibrate(N=50, EG=25)

# Elicit values for alpha under a PYP prior
# G_calibrate(N=50, EG=25, discount=c(-27.1401/100, 0.25, 0.7300045))
# Elicit values for discount under a PYP prior
# G_calibrate(N=50, EG=25, alpha=c(12.21619, 1, 0), maxiter=2000)
