essHist (version 1.2.2)

checkHistogram: Check any histogram estimator by means of the multiscale confidence set

Description

Provide the locations, i.e., intervals, where features are potentially missing (a.k.a. false negatives), and the break-points that are potentially redundant (a.k.a. false positives), by means of the multiscale confidence set.

Usage

checkHistogram(h, x, alpha = 0.1, q = NULL, intv = NULL, 
                mode = ifelse(anyDuplicated(x),"Gen","Con"), 
                plot = TRUE, xlim = NULL, ylim = NULL, 
                xlab = "", ylab = "", yaxt = "n", ...)

Arguments

h

a numeric vector specifying the values of a histogram at the sample points; or an object of class histogram (i.e. the return value of hist).

x

a numeric vector containing the data.

alpha

significance level (default 0.1); see also essHistogram.

q

threshold of the multiscale constraint; by default, q is chosen as the (1-alpha)-quantile of the null distribution of the multiscale statistic via Monte Carlo simulation, see also msQuantile.

intv

a data frame providing the system of intervals on which the multiscale statistic is defined. The data frame contains the following two columns

left left index of an interval

right right index of an interval

By default, it is set to the sparse interval system proposed by Rivera and Walther (2013), see also Li et al. (2016).
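
As an illustration of the expected shape of intv, the following base R sketch builds a small custom interval system with the two documented columns. The choice of intervals (all index pairs whose length is a power of 2) is made up for this example, and it assumes that left and right are positions in the sorted sample with left < right; neither assumption is confirmed by this page.

```r
# Hypothetical interval system over n sorted observations:
# every index pair (left, right) whose length right - left is a power of 2.
n <- 16
lens <- 2^(0:floor(log2(n - 1)))   # interval lengths 1, 2, 4, 8

intv <- do.call(rbind, lapply(lens, function(len) {
  left <- seq_len(n - len)         # all start indices that keep right <= n
  data.frame(left = left, right = left + len)
}))

head(intv)
```

A data frame of this form would then be supplied through the intv argument shown in Usage; whether a given custom system is statistically sensible is a separate question, see Li et al. (2016).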

mode

"Con" for continuous distribution functions

"Gen" for general (possibly discontinuous) distribution functions

By default, "Con" is chosen if there are no tied observations; otherwise, "Gen" is chosen; see Li et al. (2016) for further details.
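
The default rule can be reproduced in base R. A minimal sketch follows; choose_mode is a hypothetical helper name (not part of essHist) that mirrors the default expression given in Usage.

```r
# Hypothetical helper mirroring the default in Usage:
#   mode = ifelse(anyDuplicated(x), "Gen", "Con")
choose_mode <- function(x) {
  # anyDuplicated() returns 0 when all values are distinct,
  # otherwise the index of the first duplicated value
  if (anyDuplicated(x) > 0) "Gen" else "Con"
}

x_cont <- c(0.3, 1.7, 2.2, 4.1)   # no tied observations
x_tied <- c(1, 2, 2, 3)           # one tie

mode_cont <- choose_mode(x_cont)
mode_tied <- choose_mode(x_tied)
```

Rounded or discretized data typically contain ties, so they fall under "Gen" by this rule unless mode is set explicitly.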

plot

logical. If TRUE, the input estimator is plotted, together with evaluation information. More precisely, at the very bottom, the intervals where local constraints are violated are plotted. In the middle, short vertical lines that indicate possibly removable change-points are drawn above a light blue horizontal line. Right below the light blue line, a horizontal gray-scale strip is plotted, the darkness of which reflects the number of violation intervals covering a given location, as a summary of the violation information.

xlim, ylim

numeric vectors of length 2 (default xlim = range(x), ylim = NULL): see plot.

xlab

a title for the x axis (default empty string): see title and plot.

ylab

a title for the y axis (default empty string): see title and plot.

yaxt

A character which specifies the y axis type (default "n"): see par.

...

further arguments and graphical parameters passed to plot (if plot = TRUE).

Value

A list consisting of one data frame and one numeric vector:

violatedIntervals

A data frame providing the intervals on which the corresponding local side constraint is violated; an empty data frame if there is no violation. It contains the following four columns

leftIndex left index of an interval

rightIndex right index of an interval

leftEnd left end point of an interval

rightEnd right end point of an interval

removableBreakpoints

A numeric vector containing all removable breakpoints; it has zero length if there is no removable breakpoint.
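
To show how the two components might be inspected, here is a base R sketch that works on a mock list with the documented structure. The numbers are made up for illustration and do not come from a real checkHistogram call; the variable names are hypothetical.

```r
# Mock return value with the documented structure (illustrative values only)
res <- list(
  violatedIntervals = data.frame(
    leftIndex  = c(10, 40),
    rightIndex = c(25, 60),
    leftEnd    = c(-1.2, 0.4),
    rightEnd   = c(-0.3, 1.1)
  ),
  removableBreakpoints = c(2.5)
)

# Number of intervals with a violated local constraint
# (potentially missing features)
n_violations <- nrow(res$violatedIntervals)

# Number of potentially redundant break-points
n_removable <- length(res$removableBreakpoints)

# TRUE only if neither kind of issue is reported
no_issues_flagged <- n_violations == 0 && n_removable == 0
```

Because an empty data frame and a zero-length vector are returned when nothing is found, nrow() and length() give 0 in that case and no special handling is needed.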

Details

This function presents a visualization: the upper part plots the given histogram; in the middle part, short vertical lines mark all removable break-points; in the lower part, the intervals of violation are shown, and a gray bar below the middle horizontal line (blue) summarizes these violations, with the darkness scaling with the number of violation intervals covering a location. See Examples below and Li et al. (2016) for further details.
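
The summary shown by the gray bar can be approximated from the returned end points. The sketch below is not the package's plotting code: it uses made-up intervals, treats them as closed, and maps counts to gray levels linearly, all of which are assumptions made for illustration.

```r
# Made-up violation intervals, given by their end points on the data scale
viol <- data.frame(leftEnd = c(0, 1, 1.5), rightEnd = c(2, 3, 2.5))

# Locations at which the coverage is evaluated
grid <- seq(0, 3, by = 0.5)

# For each location, count the violation intervals that cover it
coverage <- vapply(
  grid,
  function(t) sum(viol$leftEnd <= t & t <= viol$rightEnd),
  integer(1)
)

# Darker shade for higher coverage (gray(0) is black, gray(1) is white)
shade <- gray(1 - coverage / max(coverage))
```

Locations covered by many violation intervals are where the evidence for a missing feature accumulates, which is why the darkest part of the bar is usually the first place to look.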

References

Li, H., Munk, A., Sieling, H., and Walther, G. (2016). The essential histogram. arXiv:1612.07216.

See Also

essHistogram, genIntv, msQuantile

Examples

# NOT RUN {
library(essHist)
set.seed(123)
# Data: mixture of Gaussians "harp"
n = 500
y = rmixnorm(n, type = 'harp')

# Oracle density
x = sort(y)
ho = dmixnorm(x, type = 'harp')

# R default histogram
h  = hist(y, plot = FALSE)

# Check R default histogram against the local multiscale constraints
b = checkHistogram(h, y, ylim=c(-0.1,0.16))
lines(x, ho, col = "red")
rug(x, col = 'blue')
legend("topright", c("R-Histogram", "Truth"), col = c("black", "red"), lty = c(1,1))
# }
