OAsymTotalVarDist
Generic function for the computation of (minimal) asymmetric total variation distance of two distributions
Generic function for the computation of (minimal) asymmetric total variation distance \(d_v^\ast\) of two distributions \(P\) and \(Q\) where the distributions may be defined for an arbitrary sample space \((\Omega,{\cal A})\). This distance is defined as $$d_v^\ast(P,Q)=\min_c \int |dQ-c\,dP|$$
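For two distributions on a common finite support, the definition above reduces to a one-dimensional minimization that can be checked by hand. The following is a minimal base R sketch under that assumption; `oasym_tv_discrete` is a hypothetical helper for illustration and not part of distrEx, and the package may normalize or restrict \(c\) differently.

```r
# Minimal sketch: evaluate min_c sum(|q - c * p|) for two probability
# vectors on a common finite support (base R only, not the distrEx code).
oasym_tv_discrete <- function(p, q) {
  stopifnot(length(p) == length(q), all(p >= 0), all(q >= 0), any(p > 0))
  # The objective is convex and piecewise linear in c, so its minimum is
  # attained at one of the kinks c = q_i / p_i (taken where p_i > 0).
  kinks <- unique(q[p > 0] / p[p > 0])
  objective <- function(c) sum(abs(q - c * p))
  values <- vapply(kinks, objective, numeric(1))
  list(distance = min(values), c = kinks[which.min(values)])
}

p <- c(0.5, 0.5)
q <- c(0.25, 0.75)
res_same <- oasym_tv_discrete(p, p)  # identical distributions
res_diff <- oasym_tv_discrete(p, q)  # different weights on the same support
```

Evaluating the objective only at the kinks avoids a numerical optimizer; for identical inputs the minimizing \(c\) is 1 and the distance is 0.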
- Keywords
- distribution
Usage
OAsymTotalVarDist(e1, e2, ...)
# S4 method for AbscontDistribution,AbscontDistribution
OAsymTotalVarDist(e1,e2,
rel.tol = .Machine$double.eps^0.3, Ngrid = 10000,
TruncQuantile = getdistrOption("TruncQuantile"),
IQR.fac = 15, ..., diagnostic = FALSE)
# S4 method for AbscontDistribution,DiscreteDistribution
OAsymTotalVarDist(e1,e2, ...)
# S4 method for DiscreteDistribution,AbscontDistribution
OAsymTotalVarDist(e1,e2, ...)
# S4 method for DiscreteDistribution,DiscreteDistribution
OAsymTotalVarDist(e1,e2, ...)
# S4 method for numeric,DiscreteDistribution
OAsymTotalVarDist(e1, e2, ...)
# S4 method for DiscreteDistribution,numeric
OAsymTotalVarDist(e1, e2, ...)
# S4 method for numeric,AbscontDistribution
OAsymTotalVarDist(e1, e2, asis.smooth.discretize = "discretize",
n.discr = getdistrExOption("nDiscretize"), low.discr = getLow(e2),
up.discr = getUp(e2), h.smooth = getdistrExOption("hSmooth"),
rel.tol = .Machine$double.eps^0.3, Ngrid = 10000,
TruncQuantile = getdistrOption("TruncQuantile"),
IQR.fac = 15, ..., diagnostic = FALSE)
# S4 method for AbscontDistribution,numeric
OAsymTotalVarDist(e1, e2,
asis.smooth.discretize = "discretize",
n.discr = getdistrExOption("nDiscretize"), low.discr = getLow(e1),
up.discr = getUp(e1), h.smooth = getdistrExOption("hSmooth"),
rel.tol = .Machine$double.eps^0.3, Ngrid = 10000,
TruncQuantile = getdistrOption("TruncQuantile"),
IQR.fac = 15, ..., diagnostic = FALSE)
# S4 method for AcDcLcDistribution,AcDcLcDistribution
OAsymTotalVarDist(e1, e2,
rel.tol = .Machine$double.eps^0.3, Ngrid = 10000,
TruncQuantile = getdistrOption("TruncQuantile"),
IQR.fac = 15, ..., diagnostic = FALSE)
Arguments
- e1
object of class "Distribution" or "numeric"
- e2
object of class "Distribution" or "numeric"
- asis.smooth.discretize
possible methods are "asis", "smooth" and "discretize". Default is "discretize".
- n.discr
if asis.smooth.discretize is equal to "discretize" one has to specify the number of lattice points used to discretize the abs. cont. distribution.
- low.discr
if asis.smooth.discretize is equal to "discretize" one has to specify the lower end point of the lattice used to discretize the abs. cont. distribution.
- up.discr
if asis.smooth.discretize is equal to "discretize" one has to specify the upper end point of the lattice used to discretize the abs. cont. distribution.
- h.smooth
if asis.smooth.discretize is equal to "smooth" (i.e., the empirical distribution of the provided data should be smoothed) one has to specify this parameter.
- rel.tol
relative tolerance for distrExIntegrate and uniroot
- Ngrid
How many grid points are to be evaluated to determine the range of the likelihood ratio?
- TruncQuantile
Quantile for the quantile based integration bounds (see Details)
- IQR.fac
Factor for the scale based integration bounds (see details)
- …
further arguments to be used in particular methods (in package distrEx: just used for distributions with a.c. parts, where it is used to pass on arguments to distrExIntegrate).
- diagnostic
logical; if TRUE, the return value obtains an attribute "diagnostic" with diagnostic information on the integration, i.e., a list with entries method ("integrate" or "GLIntegrate"), call, result (the complete return value of the method), args (the args with which the method was called), and time (the time to compute the integral).
Details
For distances between absolutely continuous distributions, we use numerical
integration; to determine sensible bounds we proceed as follows:
by means of min(getLow(e1,eps=TruncQuantile),getLow(e2,eps=TruncQuantile)) and
max(getUp(e1,eps=TruncQuantile),getUp(e2,eps=TruncQuantile)) we determine
quantile based bounds c(low.0,up.0), and by means of
s1 <- max(IQR(e1),IQR(e2)); m1 <- median(e1); m2 <- median(e2)
and low.1 <- min(m1,m2)-s1*IQR.fac, up.1 <- max(m1,m2)+s1*IQR.fac
we determine scale based bounds; the two ranges are then intersected by
low <- max(low.0,low.1), up <- min(up.0,up.1).
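The bound construction can be sketched in base R as follows. This is an illustration, not the package code: `integration_bounds` is a hypothetical helper, the quantile functions `q1` and `q2` stand in for the distribution objects, `q(eps)` and `q(1 - eps)` stand in for `getLow` and `getUp`, the value `eps = 1e-5` is chosen for illustration, and the two ranges are assumed to be intersected (lower bounds combined by max, upper bounds by min).

```r
# Sketch of the bound construction for two normal distributions, base R only.
# integration_bounds() is a hypothetical helper, not part of distrEx.
integration_bounds <- function(q1, q2, eps = 1e-5, IQR.fac = 15) {
  # (a) quantile based bounds; q(eps), q(1 - eps) stand in for getLow/getUp
  low.0 <- min(q1(eps), q2(eps))
  up.0  <- max(q1(1 - eps), q2(1 - eps))
  # (b) scale based bounds from the medians and the larger interquartile range
  iqr <- function(q) q(0.75) - q(0.25)
  s1 <- max(iqr(q1), iqr(q2))
  m1 <- q1(0.5)
  m2 <- q2(0.5)
  low.1 <- min(m1, m2) - s1 * IQR.fac
  up.1  <- max(m1, m2) + s1 * IQR.fac
  # combine (a) and (b): keep the tighter bound on each side (assumption)
  c(low = max(low.0, low.1), up = min(up.0, up.1))
}

q1 <- function(p) qnorm(p, mean = 0, sd = 1)
q2 <- function(p) qnorm(p, mean = 1, sd = 2)
bounds <- integration_bounds(q1, q2)
```

For these two light-tailed distributions the quantile based bounds are the tighter ones; the scale based bounds are meant to take over for heavy-tailed distributions, where extreme quantiles lie very far out.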
Again in the absolutely continuous case, to determine the range of the
likelihood ratio, we evaluate this ratio on a grid constructed as follows:
x.range <- c(seq(low, up, length=Ngrid/3),
q.l(e1)(seq(0,1,length=Ngrid/3)*.999),
q.l(e2)(seq(0,1,length=Ngrid/3)*.999))
Finally, for both the discrete and the absolutely continuous case,
we clip this ratio downwards by 1e-10 and upwards by 1e10.
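A base R sketch of the grid construction and clipping is given below. It rests on several assumptions: `dnorm` and `qnorm` stand in for the density and quantile slots of `e1` and `e2`, the bounds `low` and `up` are taken as given, the ratio is taken as the density of `e2` over that of `e1`, and non-finite grid points are simply dropped, which is a choice of this sketch and not necessarily what the package does.

```r
# Sketch of the grid and clipping step with base R (not the distrEx code).
Ngrid <- 10000
n <- Ngrid %/% 3              # points per grid segment
low <- -8; up <- 10           # integration bounds, assumed given

d1 <- function(x) dnorm(x, mean = 0, sd = 1)
q1 <- function(p) qnorm(p, mean = 0, sd = 1)
d2 <- function(x) dnorm(x, mean = 1, sd = 2)
q2 <- function(p) qnorm(p, mean = 1, sd = 2)

# equidistant points plus quantile-spaced points of both distributions
x.range <- c(seq(low, up, length.out = n),
             q1(seq(0, 1, length.out = n) * 0.999),
             q2(seq(0, 1, length.out = n) * 0.999))
x.range <- x.range[is.finite(x.range)]  # q(0) is -Inf for the normal

# likelihood ratio on the grid, clipped to [1e-10, 1e10]
ratio <- d2(x.range) / d1(x.range)
ratio <- pmin(pmax(ratio, 1e-10), 1e10)
ratio.range <- range(ratio)
```

In this example the upper clipping bound is active, because the wider distribution dominates far out in the tails.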
In case we want to compute the total variation distance between (empirical) data
and an abs. cont. distribution, we can specify the parameter asis.smooth.discretize
to avoid trivial distances (distance = 1).
Using asis.smooth.discretize = "discretize", which is the default,
leads to a discretization of the provided abs. cont. distribution and
the distance is computed between the provided data and the discretized
distribution.
Using asis.smooth.discretize = "smooth" causes smoothing of the
empirical distribution of the provided data. That is, the empirical
distribution is convolved with the normal distribution Norm(mean = 0, sd = h.smooth),
which leads to an abs. cont. distribution. Afterwards the distance
between the smoothed empirical distribution and the provided abs. cont.
distribution is computed.
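The smoothing step has a simple closed form: convolving the empirical distribution of the data with Norm(mean = 0, sd = h.smooth) yields a Gaussian kernel density. The base R sketch below illustrates this; `smoothed_density` is a hypothetical helper, and the bandwidth 0.25 is chosen for illustration only and is not the package default.

```r
# Sketch of the "smooth" option in base R (not the distrEx code): the
# empirical distribution of x convolved with Norm(0, h.smooth) has density
# t -> mean(dnorm(t, mean = x, sd = h.smooth)).
smoothed_density <- function(x, h.smooth) {
  function(t) {
    vapply(t, function(ti) mean(dnorm(ti, mean = x, sd = h.smooth)),
           numeric(1))
  }
}

set.seed(1)
x <- rnorm(100)
h <- 0.25                      # illustrative bandwidth (assumption)
f_smooth <- smoothed_density(x, h.smooth = h)

# sanity check: the smoothed density should integrate to (nearly) 1
total_mass <- integrate(f_smooth, lower = min(x) - 10 * h,
                        upper = max(x) + 10 * h,
                        subdivisions = 2000L)$value
```

Because the smoothed distribution has a density, its distance to another abs. cont. distribution is no longer trivially 1.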
Diagnostics on the involved integrations are available if argument
diagnostic is TRUE. Then there is attribute diagnostic
attached to the return value, which may be inspected
and accessed through showDiagnostic and getDiagnostic.
Value
(Minimal) asymmetric total variation distance of e1 and e2
Methods
- e1 = "AbscontDistribution", e2 = "AbscontDistribution":
total variation distance of two absolutely continuous univariate distributions which is computed using distrExIntegrate.
- e1 = "AbscontDistribution", e2 = "DiscreteDistribution":
total variation distance of absolutely continuous and discrete univariate distributions (are mutually singular; i.e., have distance =1).
- e1 = "DiscreteDistribution", e2 = "DiscreteDistribution":
total variation distance of two discrete univariate distributions which is computed using support and sum.
- e1 = "DiscreteDistribution", e2 = "AbscontDistribution":
total variation distance of discrete and absolutely continuous univariate distributions (are mutually singular; i.e., have distance =1).
- e1 = "numeric", e2 = "DiscreteDistribution":
Total variation distance between (empirical) data and a discrete distribution.
- e1 = "DiscreteDistribution", e2 = "numeric":
Total variation distance between (empirical) data and a discrete distribution.
- e1 = "numeric", e2 = "AbscontDistribution":
Total variation distance between (empirical) data and an abs. cont. distribution.
- e1 = "AbscontDistribution", e2 = "numeric":
Total variation distance between (empirical) data and an abs. cont. distribution.
- e1 = "AcDcLcDistribution", e2 = "AcDcLcDistribution":
Total variation distance of mixed discrete and absolutely continuous univariate distributions.
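The value 1 for mutually singular distributions follows directly from the definition of \(d_v^\ast\) given in the description. If \(P\) and \(Q\) are mutually singular, there is a set \(A\) with \(P(A)=1\) and \(Q(A)=0\), so for every \(c\)
$$\int |dQ-c\,dP| = Q(A^c) + |c|\,P(A) = 1 + |c|,$$
which is minimized at \(c=0\); hence \(d_v^\ast(P,Q)=1\).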
References
Agostinelli, C. and Ruckdeschel, P. (2009): A simultaneous inlier and outlier model by asymmetric total variation distance.
See Also
TotalVarDist-methods, ContaminationSize, KolmogorovDist, HellingerDist, Distribution-class
Examples
# NOT RUN {
library(distrEx)  # provides OAsymTotalVarDist and the distribution classes
# distances between distribution objects
OAsymTotalVarDist(Norm(), UnivarMixingDistribution(Norm(1,2),Norm(0.5,3),
                  mixCoeff=c(0.2,0.8)))
OAsymTotalVarDist(Norm(), Td(10))
OAsymTotalVarDist(Norm(mean = 50, sd = sqrt(25)), Binom(size = 100)) # mutually singular
OAsymTotalVarDist(Pois(10), Binom(size = 20))
# data versus an abs. cont. distribution (default: discretize the distribution)
x <- rnorm(100)
OAsymTotalVarDist(Norm(), x)
OAsymTotalVarDist(x, Norm(), asis.smooth.discretize = "smooth")
# standardized binomial data versus the standard normal
y <- (rbinom(50, size = 20, prob = 0.5)-10)/sqrt(5)
OAsymTotalVarDist(y, Norm())
OAsymTotalVarDist(y, Norm(), asis.smooth.discretize = "smooth")
# data versus a discrete distribution
OAsymTotalVarDist(rbinom(50, size = 20, prob = 0.5), Binom(size = 20, prob = 0.5))
# }