dqfr: Probability distribution of ratio of quadratic forms

Description

dqfr(): Density of the (power of) ratio of quadratic forms, $\left( \frac{ \mathbf{x^{\mathit{T}} A x} }{ \mathbf{x^{\mathit{T}} B x} } \right) ^ p $, where $\mathbf{x} \sim N_n(\bm{\mu}, \mathbf{\Sigma})$.

pqfr(): Distribution function of the same.

qqfr(): Quantile function of the same.

dqfr_A1I1(): internal for dqfr(), exact series expression of Hillier (2001). Only accommodates the simple case where $\mathbf{B} = \mathbf{I}_n$ and $\bm{\mu} = \mathbf{0}_n$.

dqfr_broda(): internal for dqfr(), exact numerical inversion algorithm of Broda & Paolella (2009).

dqfr_butler(): internal for dqfr(), saddlepoint approximation of Butler & Paolella (2007, 2008).

pqfr_A1B1(): internal for pqfr(), exact series expression of Forchini (2002, 2005).

pqfr_imhof(): internal for pqfr(), exact numerical inversion algorithm of Imhof (1961).

pqfr_davies(): internal for pqfr(), exact numerical inversion algorithm of Davies (1973, 1980). This is experimental and may be removed in the future.

pqfr_butler(): internal for pqfr(), saddlepoint approximation of Butler & Paolella (2007, 2008).

The user is supposed to use the exported functions dqfr(), pqfr(), and qqfr(), which are (pseudo-)vectorized with respect to quantile or probability. The actual calculations are done by one of the internal functions, which only accommodate a length-one quantile. The internal functions skip most checks on argument structures and do not accommodate Sigma to reduce execution time.

Usage

dqfr(
  quantile,
  A,
  B,
  p = 1,
  mu = rep.int(0, n),
  Sigma = diag(n),
  log = FALSE,
  method = c("broda", "hillier", "butler"),
  trim_values = TRUE,
  normalize_spa = FALSE,
  return_abserr_attr = FALSE,
  m = 100L,
  tol_zero = .Machine$double.eps * 100,
  tol_sing = tol_zero,
  ...
)
pqfr(
  quantile,
  A,
  B,
  p = 1,
  mu = rep.int(0, n),
  Sigma = diag(n),
  lower.tail = TRUE,
  log.p = FALSE,
  method = c("imhof", "davies", "forchini", "butler"),
  trim_values = TRUE,
  return_abserr_attr = FALSE,
  m = 100L,
  tol_zero = .Machine$double.eps * 100,
  tol_sing = tol_zero,
  ...
)
qqfr(
  probability,
  A,
  B,
  p = 1,
  mu = rep.int(0, n),
  Sigma = diag(n),
  lower.tail = TRUE,
  log.p = FALSE,
  trim_values = FALSE,
  return_abserr_attr = FALSE,
  stop_on_error = FALSE,
  m = 100L,
  tol_zero = .Machine$double.eps * 100,
  tol_sing = tol_zero,
  epsabs_q = .Machine$double.eps^(1/2),
  maxiter_q = 5000,
  ...
)
dqfr_A1I1(
  quantile,
  LA,
  m = 100L,
  check_convergence = c("relative", "strict_relative", "absolute", "none"),
  use_cpp = TRUE,
  tol_conv = .Machine$double.eps^(1/4),
  thr_margin = 100
)
dqfr_broda(
  quantile,
  A,
  B,
  mu = rep.int(0, n),
  autoscale_args = 1,
  stop_on_error = TRUE,
  use_cpp = TRUE,
  tol_zero = .Machine$double.eps * 100,
  epsabs = epsrel,
  epsrel = 1e-06,
  limit = 10000
)
dqfr_butler(
  quantile,
  A,
  B,
  mu = rep.int(0, n),
  order_spa = 2,
  stop_on_error = FALSE,
  use_cpp = TRUE,
  tol_zero = .Machine$double.eps * 100,
  epsabs = .Machine$double.eps^(1/2),
  epsrel = 0,
  maxiter = 5000
)
pqfr_A1B1(
  quantile,
  A,
  B,
  m = 100L,
  mu = rep.int(0, n),
  check_convergence = c("relative", "strict_relative", "absolute", "none"),
  stop_on_error = FALSE,
  use_cpp = TRUE,
  cpp_method = c("double", "long_double", "coef_wise"),
  nthreads = 1,
  tol_conv = .Machine$double.eps^(1/4),
  tol_zero = .Machine$double.eps * 100,
  thr_margin = 100
)
pqfr_imhof(
  quantile,
  A,
  B,
  mu = rep.int(0, n),
  autoscale_args = 1,
  stop_on_error = TRUE,
  use_cpp = TRUE,
  tol_zero = .Machine$double.eps * 100,
  epsabs = epsrel,
  epsrel = 1e-06,
  limit = 10000
)
pqfr_davies(
  quantile,
  A,
  B,
  mu = rep.int(0, n),
  autoscale_args = 1,
  stop_on_error = NULL,
  tol_zero = .Machine$double.eps * 100,
  ...
)
pqfr_butler(
  quantile,
  A,
  B,
  mu = rep.int(0, n),
  order_spa = 2,
  stop_on_error = FALSE,
  use_cpp = TRUE,
  tol_zero = .Machine$double.eps * 100,
  epsabs = .Machine$double.eps^(1/2),
  epsrel = 0,
  maxiter = 5000
)

Value

dqfr() and pqfr() give the density and distribution (or $p$-values) functions, respectively, corresponding to quantile, whereas qqfr() gives the quantile function corresponding to probability.

When return_abserr_attr = TRUE, an absolute error bound of numerical evaluation is returned as an attribute; this feature is currently available with dqfr(..., method = "broda"), pqfr(..., method = "imhof"), and qqfr(..., method = "imhof")

(all default) only. This error bound is automatically transformed when trimming happens with trim_values (above) or when log/log.p = TRUE. See vignette for details (vignette("qfratio_distr")).

The internal functions return a list containing $d or $p

(for density and lower $p$-value, respectively), and only this is passed to the external function by default. Other components may be inspected for debugging purposes:

dqfr_A1I1() and pqfr_A1B1(): have $terms, a vector of $0$th to $m$th order terms.
pqfr_imhof() and dqfr_broda(): have $abserr, absolute error of numerical integration; the one returned from CompQuadForm::imhof() is divided by pi, as the integration result itself is (internally). This is passed to the external functions when return_abserr_attr = TRUE (above).
pqfr_davies(): has the same components as CompQuadForm::davies() apart from Qq which is replaced by p = 1 - Qq.

Arguments

quantile

Numeric vector of quantiles $q$

A, B

Argument matrices. Should be square. B should be nonnegative definite. Will be automatically symmetrized in dqfr() and pqfr().

p

Positive exponent of the ratio, default 1. Unlike in qfrm(), the numerator and denominator cannot have different exponents. When p is non-integer, A must be nonnegative definite. For details, see vignette vignette("qfratio_distr").

mu

Mean vector $\bm{\mu}$ for $\mathbf{x}$

Sigma

Covariance matrix $\mathbf{\Sigma}$ for $\mathbf{x}$

log, lower.tail, log.p

Logical; as in regular probability distribution functions. But these are for convenience only, and not meant for accuracy.

method

Method to specify an internal function (see “Details”). In dqfr(), options are:

"broda": default; uses dqfr_broda(), numerical inversion of Broda & Paolella (2009)

"hillier"

uses dqfr_A1I1(), series expression of Hillier (2001)

"butler"

uses dqfr_butler(), saddlepoint approximation of Butler & Paolella (2007, 2008)

In pqfr(), options are:

"imhof": default; uses pqfr_imhof(), numerical inversion of Imhof (1961)
"davies": uses pqfr_davies(), numerical inversion of Davies (1973, 1980)
"forchini": uses pqfr_A1B1(), series expression of Forchini (2002, 2005)
"butler": uses pqfr_butler(), saddlepoint approximation of Butler & Paolella (2007, 2008)

trim_values

If TRUE (default), numerical values outside the mathematically permissible support are trimmed in (see “Details”)

normalize_spa

If TRUE and method == "butler", result is normalized so that the density integrates to unity (see “Details”)

return_abserr_attr

If TRUE, absolute error of numerical evaluation is returned as an attribute "abserr" (see “Value”)

Order of polynomials at which the series expression is truncated. $M$ in Hillier et al. (2009, 2014).

tol_zero

Tolerance against which numerical zero is determined. Used to determine, e.g., whether mu is a zero vector, A or B equals the identity matrix, etc.

tol_sing

Tolerance against which matrix singularity and rank are determined. The eigenvalues smaller than this are considered zero.

...

Additional arguments passed to internal functions. In qqfr(), these are passed to pqfr().

probability

Numeric vector of probabilities

stop_on_error

If TRUE, execution is stopped upon an error (including non-convergence) in evaluation of hypergeometric function, numerical integration, or root finding. If FALSE, further execution is attempted regardless.

Eigenvalues of $\mathbf{A}$

check_convergence

Specifies how numerical convergence is checked for series expression (see qfrm)

use_cpp

Logical to specify whether the calculation is done with C++ functions via Rcpp. TRUE by default.

tol_conv

Tolerance against which numerical convergence of series is checked. Used with check_convergence.

thr_margin

Optional argument to adjust the threshold for scaling (see “Scaling” in d1_i). Passed to internal functions (d1_i, d2_ij, d3_ijk) or their C++ equivalents.

autoscale_args

Numeric; if > 0 (default), arguments are scaled to avoid failure in numerical integration (see vignette("qfratio_distr")). If <= 0, the scaling is skipped.

epsabs, epsrel, limit, maxiter, epsabs_q, maxiter_q

Optional arguments used in numerical integration or root-finding algorithm (see vignette: vignette("qfratio_distr")). In qqfr(), epsabs_q and maxiter_q are used in root-finding for quantiles whereas epsabs and maxiter are passed to pqfr() internally.

order_spa

Numeric to determine order of saddlepoint approximation. More accurate second-order approximation is used for any order > 1 (default); otherwise, (very slightly) faster first-order approximation is used.

cpp_method

Method used in C++ calculations to avoid numerical overflow/underflow (see “Details” in qfrm)

nthreads

Number of threads used in OpenMP-enabled C++ functions (see “Multithreading” in qfrm)

Details

qqfr() is based on numerical root-finding with pqfr() using uniroot(), so its result can be affected by the numerical errors in both the algorithm used in pqfr() and root-finding.

dqfr_A1I1() and pqfr_A1B1() evaluate the probability density and (cumulative) distribution function, respectively, as a partial sum of infinite series involving top-order zonal or invariant polynomials (Hillier 2001; Forchini 2002, 2005). As in other functions of this package, these are evaluated with the recursive algorithm d1_i.

pqfr_imhof() and pqfr_davies() evaluate the distribution function by numerical inversion of the characteristic function based on Imhof (1961) or Davies (1973, 1980), respectively. The latter calls davies(), and the former with use_cpp = FALSE calls imhof(), from the package CompQuadForm. Additional arguments for davies() can be passed via ..., except for sigma, which is not applicable.

dqfr_broda() evaluates the probability density by numerical inversion of the characteristic function using Geary's formula based on Broda & Paolella (2009). Parameters for numerical integration can be controlled via the arguments epsabs, epsrel, and limit (see vignette: vignette("qfratio_distr")).

dqfr_butler() and pqfr_butler() evaluate saddlepoint approximations of the density and distribution function, respectively, based on Butler & Paolella (2007, 2008). These are fast but not exact. They conduct numerical root-finding for the saddlepoint by the Brent method, parameters for which can be controlled by the arguments epsabs, epsrel, and maxiter (see vignette: vignette("qfratio_distr")). The saddlepoint approximation density does not integrate to unity, but can be normalized by dqfr(..., method = "butler", normalize_spa = TRUE). Note that this is usually slower than dqfr(..., method = "broda") for a small number of quantiles.

The density is undefined, and the distribution function has points of nonanalyticity, at the eigenvalues of $\mathbf{B}^{-1} \mathbf{A}$ (assuming nonsingular $\mathbf{B}$). Around these points, the series expressions tends to fail. Avoid using the series expression methods for these cases.

Algorithms based on numerical integration can yield spurious results that are outside the mathematically permissible support; e.g., $[0, 1]$ for pqfr(). By default, dqfr() and pqfr() trim those values into the permissible range with a warning; e.g., negative p-values are replaced by ~2.2e-14 (default tol_zero). Turn trim_values = FALSE to skip these trimming and warning, although pqfr_imhof() and pqfr_davies() can still throw a warning from CompQuadForm functions. Note that, on the other hand, all these functions try to return exact 0 or 1 when $q$ is outside the possible range of the statistic.

References

Broda, S. and Paolella, M. S. (2009) Evaluating the density of ratios of noncentral quadratic forms in normal variables. Computational Statistics and Data Analysis, 53, 1264--1270. tools:::Rd_expr_doi("10.1016/j.csda.2008.10.035")

Butler, R. W. and Paolella, M. S. (2007) Uniform saddlepoint approximations for ratios of quadratic forms. Technical Reports, Department of Statistical Science, Southern Methodist University, no. 351. [Available on arXiv as a preprint.] tools:::Rd_expr_doi("10.48550/arXiv.0803.2132")

Butler, R. W. and Paolella, M. S. (2008) Uniform saddlepoint approximations for ratios of quadratic forms. Bernoulli, 14, 140--154. tools:::Rd_expr_doi("10.3150/07-BEJ6169")

Davis, R. B. (1973) Numerical inversion of a characteristic function. Biometrika, 60, 415--417. tools:::Rd_expr_doi("10.1093/biomet/60.2.415").

Davis, R. B. (1980) Algorithm AS 155: The distribution of a linear combination of $\chi^2$ random variables. Journal of the Royal Statistical Society, Series C---Applied Statistics, 29, 323--333. tools:::Rd_expr_doi("10.2307/2346911").

Forchini, G. (2002) The exact cumulative distribution function of a ratio of quadratic forms in normal variables, with application to the AR(1) model. Econometric Theory, 18, 823--852. tools:::Rd_expr_doi("10.1017/S0266466602184015").

Forchini, G. (2005) The distribution of a ratio of quadratic forms in noncentral normal variables. Communications in Statistics---Theory and Methods, 34, 999--1008. tools:::Rd_expr_doi("10.1081/STA-200056855").

Hillier, G. (2001) The density of quadratic form in a vector uniformly distributed on the $n$-sphere. Econometric Theory, 17, 1--28. tools:::Rd_expr_doi("10.1017/S026646660117101X").

Imhof, J. P. (1961) Computing the distribution of quadratic forms in normal variables. Biometrika, 48, 419--426. tools:::Rd_expr_doi("10.1093/biomet/48.3-4.419").

Examples

Run this code

## Some symmetric matrices and parameters
nv <- 4
A <- diag(nv:1)
B <- diag(sqrt(1:nv))
mu <- 0.2 * nv:1
Sigma <- matrix(0.5, nv, nv)
diag(Sigma) <- 1

## density and p-value for (x^T A x) / (x^T x) where x ~ N(0, I)
dqfr(1.5, A)
pqfr(1.5, A)

## 95 percentile for the same
qqfr(0.95, A)
qqfr(0.05, A, lower.tail = FALSE) # same

## P{ (x^T A x) / (x^T B x) <= 1.5} where x ~ N(mu, Sigma)
pqfr(1.5, A, B, mu = mu, Sigma = Sigma)

## These are (pseudo-)vectorized
qs <- 0:nv + 0.5
dqfr(qs, A, B, mu = mu)
(pres <- pqfr(qs, A, B, mu = mu))

## Quantiles for above p-values
## Results equal qs, except that those for prob = 0 and 1
## are replaced by mininum and maximum of the ratio
qqfr(pres, A, B, mu = mu) # = qs

## Various methods for density
dqfr(qs, A, method = "broda")   # default
dqfr(qs, A, method = "hillier") # series; B, mu, Sigma not permitted
## Saddlepoint approximations (fast but inexact):
dqfr(qs, A, method = "butler")  # 2nd order by default
dqfr(qs, A, method = "butler", normalize_spa = TRUE) # normalized
dqfr(qs, A, method = "butler", normalize_spa = TRUE,
     order_spa = 1) # 1st order, normalized

## Various methods for distribution function
pqfr(qs, A, method = "imhof")    # default
pqfr(qs, A, method = "davies")   # very similar
pqfr(qs, A, method = "forchini") # series expression
pqfr(qs, A, method = "butler")   # saddlepoint approximation (2nd order)
pqfr(qs, A, method = "butler", order_spa = 1) # 1st order

## To see error bounds
dqfr(qs, A, return_abserr_attr = TRUE)
pqfr(qs, A, return_abserr_attr = TRUE)
qqfr(pres, A, return_abserr_attr = TRUE)

Run the code above in your browser using DataLab