TotalVarDist: Generic function for the computation of the total variation distance of two distributions

Description

Generic function for the computation of the total variation distance $d_v$ of two distributions $P$ and $Q$ where the distributions may be defined for an arbitrary sample space $(\Omega,{\cal A})$. The total variation distance is defined as $$d_v(P,Q)=\sup_{B\in{\cal A}}|P(B)-Q(B)|$$
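Suppose $P$ and $Q$ have densities $p$ and $q$ with respect to a common dominating measure $\mu$, for example probability mass functions on a countable space. Then the supremum is attained at $B^*=\{p>q\}$, which gives the standard identity $d_v(P,Q)=\frac{1}{2}\int|p-q|\,d\mu$. The base-R sketch below checks this numerically on a four-point sample space by brute force over all $2^4$ subsets. The probability vectors and the helper names tv_sup() and tv_half_l1() are made up for illustration and are not part of distrEx.

```r
# Two made-up probability vectors on the finite sample space {1, 2, 3, 4}
p <- c(0.10, 0.40, 0.30, 0.20)
q <- c(0.25, 0.25, 0.25, 0.25)

# Hypothetical helper: sup over all 2^k subsets B of |P(B) - Q(B)|
tv_sup <- function(p, q) {
  k <- length(p)
  best <- 0
  for (m in 0:(2^k - 1)) {
    # the binary digits of m select which of the k points belong to B
    inB <- bitwAnd(m, 2^(0:(k - 1))) > 0
    best <- max(best, abs(sum(p[inB]) - sum(q[inB])))
  }
  best
}

# Hypothetical helper: half the L1 distance between the probability vectors
tv_half_l1 <- function(p, q) 0.5 * sum(abs(p - q))

tv_sup(p, q)      # 0.2, attained at B = {2, 3}
tv_half_l1(p, q)  # 0.2
```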

Usage

TotalVarDist(e1, e2, ...)
## S4 method for signature 'AbscontDistribution,AbscontDistribution'
TotalVarDist(e1, e2,
             rel.tol = .Machine$double.eps^0.3,
             TruncQuantile = getdistrOption("TruncQuantile"),
             IQR.fac = 15, ...)
## S4 method for signature 'AbscontDistribution,DiscreteDistribution'
TotalVarDist(e1, e2, ...)
## S4 method for signature 'DiscreteDistribution,AbscontDistribution'
TotalVarDist(e1, e2, ...)
## S4 method for signature 'DiscreteDistribution,DiscreteDistribution'
TotalVarDist(e1, e2, ...)
## S4 method for signature 'numeric,DiscreteDistribution'
TotalVarDist(e1, e2, ...)
## S4 method for signature 'DiscreteDistribution,numeric'
TotalVarDist(e1, e2, ...)
## S4 method for signature 'numeric,AbscontDistribution'
TotalVarDist(e1, e2, asis.smooth.discretize = "discretize",
             n.discr = getdistrExOption("nDiscretize"), low.discr = getLow(e2),
             up.discr = getUp(e2), h.smooth = getdistrExOption("hSmooth"),
             rel.tol = .Machine$double.eps^0.3,
             TruncQuantile = getdistrOption("TruncQuantile"), IQR.fac = 15, ...)
## S4 method for signature 'AbscontDistribution,numeric'
TotalVarDist(e1, e2, asis.smooth.discretize = "discretize",
             n.discr = getdistrExOption("nDiscretize"), low.discr = getLow(e1),
             up.discr = getUp(e1), h.smooth = getdistrExOption("hSmooth"),
             rel.tol = .Machine$double.eps^0.3,
             TruncQuantile = getdistrOption("TruncQuantile"), IQR.fac = 15, ...)
## S4 method for signature 'AcDcLcDistribution,AcDcLcDistribution'
TotalVarDist(e1, e2,
             rel.tol = .Machine$double.eps^0.3,
             TruncQuantile = getdistrOption("TruncQuantile"),
             IQR.fac = 15, ...)

Arguments

e1, e2

objects of class "Distribution" or "numeric"

asis.smooth.discretize

possible methods are "asis", "smooth" and "discretize". Default is "discretize".

n.discr

if asis.smooth.discretize is equal to "discretize": the number of lattice points used to discretize the absolutely continuous distribution.

low.discr

if asis.smooth.discretize is equal to "discretize": the lower end point of the lattice used to discretize the absolutely continuous distribution.

up.discr

if asis.smooth.discretize is equal to "discretize": the upper end point of the lattice used to discretize the absolutely continuous distribution.

h.smooth

if asis.smooth.discretize is equal to "smooth", i.e., if the empirical distribution of the provided data is to be smoothed: the standard deviation of the normal distribution Norm(mean = 0, sd = h.smooth) used for smoothing (see Details).

rel.tol

relative accuracy requested in integration

TruncQuantile

quantile used for the quantile-based integration bounds (see Details)

IQR.fac

factor for the scale-based integration bounds (see Details)

...

further arguments to be used in particular methods (not in package distrEx)
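The "smooth" option can be pictured with a short base-R sketch, which is not distrEx code. Convolving the empirical distribution of x_1, ..., x_n with Norm(mean = 0, sd = h.smooth) gives the density f(t) = (1/n) sum_i dnorm(t, mean = x_i, sd = h.smooth). This is a Gaussian kernel density estimate with bandwidth h.smooth, and the distance to Norm() is half the integrated absolute difference of the two densities. Several details are illustrative assumptions: the helper names smoothed_density() and tv_smooth_vs_norm(), the bandwidth 0.5 and the fixed integration range [-10, 10]. distrEx takes its default bandwidth from getdistrExOption("hSmooth") and chooses its integration bounds as explained in Details.

```r
set.seed(1)
x <- rnorm(100)   # sample data, as in the Examples

# Hypothetical helper: density of the empirical distribution of x convolved
# with N(0, h^2), i.e. an equally weighted mixture of normals centred at x
smoothed_density <- function(t, x, h) {
  vapply(t, function(ti) mean(dnorm(ti, mean = x, sd = h)), numeric(1))
}

# Hypothetical helper: half the integrated absolute difference between the
# smoothed data density and the N(0, 1) density over [lower, upper]
tv_smooth_vs_norm <- function(x, h, lower = -10, upper = 10) {
  integrand <- function(t) abs(smoothed_density(t, x, h) - dnorm(t))
  0.5 * integrate(integrand, lower, upper, subdivisions = 1000L)$value
}

tv_smooth_vs_norm(x, h = 0.5)
```

Without smoothing, the empirical distribution is discrete and therefore mutually singular with Norm(), so the distance would be trivially 1.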

Value

Total variation distance of e1 and e2

concept

distance

Details

For distances between absolutely continuous distributions, we use numerical integration. To determine sensible integration bounds we proceed as follows. Quantile-based bounds c(low.0, up.0) are given by low.0 <- min(getLow(e1, eps = TruncQuantile), getLow(e2, eps = TruncQuantile)) and up.0 <- max(getUp(e1, eps = TruncQuantile), getUp(e2, eps = TruncQuantile)). Scale-based bounds are obtained from s1 <- max(IQR(e1), IQR(e2)), m1 <- median(e1) and m2 <- median(e2) as low.1 <- min(m1, m2) - s1 * IQR.fac and up.1 <- max(m1, m2) + s1 * IQR.fac. The two are combined by low <- max(low.0, low.1) and up <- min(up.0, up.1).

When computing the total variation distance between (empirical) data and an absolutely continuous distribution, the parameter asis.smooth.discretize can be used to avoid trivial distances. An empirical distribution is discrete and hence mutually singular with any absolutely continuous distribution, so their distance is always 1. With asis.smooth.discretize = "discretize", which is the default, the provided absolutely continuous distribution is discretized, and the distance is computed between the provided data and the discretized distribution. With asis.smooth.discretize = "smooth", the empirical distribution of the provided data is smoothed. That is, it is convolved with the normal distribution Norm(mean = 0, sd = h.smooth), which yields an absolutely continuous distribution. The distance is then computed between this smoothed empirical distribution and the provided absolutely continuous distribution.
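The bound rules and the integration step for two absolutely continuous distributions can be mimicked in base R, here for Norm() versus Td(10). This is a sketch, not distrEx's implementation, and it makes several assumptions. The hypothetical helpers integration_bounds() and tv_abscont() read eps-quantiles, interquartile ranges and medians directly off the quantile functions qnorm() and qt(), in place of getLow(), getUp(), IQR() and median(). They combine the quantile-based and scale-based intervals as their intersection, taking the larger lower end and the smaller upper end. TruncQuantile = 1e-5 is an illustrative value. The distance is computed as half the integrated absolute difference of the densities.

```r
# Hypothetical helper: quantile- and scale-based integration bounds for two
# distributions given by their quantile functions q1 and q2
integration_bounds <- function(q1, q2, TruncQuantile = 1e-5, IQR.fac = 15) {
  low.0 <- min(q1(TruncQuantile), q2(TruncQuantile))
  up.0  <- max(q1(1 - TruncQuantile), q2(1 - TruncQuantile))
  s1 <- max(q1(0.75) - q1(0.25), q2(0.75) - q2(0.25))  # larger IQR
  m1 <- q1(0.5); m2 <- q2(0.5)                          # medians
  low.1 <- min(m1, m2) - s1 * IQR.fac
  up.1  <- max(m1, m2) + s1 * IQR.fac
  # intersection of the quantile-based and the scale-based interval
  c(low = max(low.0, low.1), up = min(up.0, up.1))
}

# Hypothetical helper: half the integrated absolute difference of densities
tv_abscont <- function(f1, f2, bounds, rel.tol = .Machine$double.eps^0.3) {
  integrand <- function(t) abs(f1(t) - f2(t))
  0.5 * integrate(integrand, bounds[["low"]], bounds[["up"]],
                  rel.tol = rel.tol)$value
}

# N(0, 1) versus Student t with 10 degrees of freedom
q_t10 <- function(p) qt(p, df = 10)
d_t10 <- function(x) dt(x, df = 10)
b <- integration_bounds(qnorm, q_t10)
tv_abscont(dnorm, d_t10, b)
```

The integrand has kinks where the two densities cross, so integrate() may need a larger subdivisions value for less regular pairs of distributions.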

References

Huber, P.J. (1981) Robust Statistics. New York: Wiley.

Rieder, H. (1994) Robust Asymptotic Statistics. New York: Springer.

Examples

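library(distrEx)

## two distributions
TotalVarDist(Norm(), Gumbel())
TotalVarDist(Norm(), Td(10))
TotalVarDist(Norm(mean = 50, sd = sqrt(25)), Binom(size = 100)) # mutually singular
TotalVarDist(Pois(10), Binom(size = 20))

## data versus an absolutely continuous distribution
set.seed(123)  # make the random examples reproducible
x <- rnorm(100)
TotalVarDist(Norm(), x)  # default: asis.smooth.discretize = "discretize"
TotalVarDist(x, Norm(), asis.smooth.discretize = "smooth")

## standardized binomial data (mean 10, variance 5) versus N(0, 1)
y <- (rbinom(50, size = 20, prob = 0.5) - 10)/sqrt(5)
TotalVarDist(y, Norm())
TotalVarDist(y, Norm(), asis.smooth.discretize = "smooth")

## data versus a discrete distribution
TotalVarDist(rbinom(50, size = 20, prob = 0.5), Binom(size = 20, prob = 0.5))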
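For two discrete distributions, the distance reduces to half the sum of absolute differences of the probability mass functions over the union of their supports. The following base-R computation is an independent cross-check of TotalVarDist(Pois(10), Binom(size = 20)). It assumes that Binom() uses its default prob = 0.5 and truncates the Poisson support at an arbitrarily chosen k = 200.

```r
# Half the l1 distance between the pmfs of Pois(10) and Binom(20, 0.5).
# The Poisson mass beyond k = 200 is negligible, and dbinom() is 0 for k > 20.
k <- 0:200
p_pois  <- dpois(k, lambda = 10)
p_binom <- dbinom(k, size = 20, prob = 0.5)
tv_pois_binom <- 0.5 * sum(abs(p_pois - p_binom))
tv_pois_binom
```

The result should agree with the distrEx value up to the support truncation used by either computation.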
