excessmass: Excess mass

Description

This function computes the excess mass statistic.

Usage

excessmass(data,mod0=1,approximate=FALSE,gridsize=NULL,full.result=F)

Arguments

data

Sample for computing the excess mass.

mod0

Number of modes for which the excess mass is calculated. Default mod0=1.

approximate

If this argument is TRUE then the excess mass value is approximated. Default approximate=FALSE.

gridsize

When approximate=TRUE, number of endpoints at which the \(C_m(\lambda)\) sets are estimated (first element) and number of possible values of \(\lambda\) (second element). Default is gridsize=c(20,20).

full.result

If this argument is TRUE then it returns the full result list, see below. Default full.result=FALSE.

Value

If full.result=FALSE, a number: the excess mass statistic for mod0 modes. If full.result=TRUE, an object of class "estmod", which is a list containing the following components:

nmodes

The specified hypothesized value of the number of modes.

sample.size

The number of non-missing observations in the sample used for computing the excess mass.

excess.mass

Value of the excess mass test statistic.

approximate

A logical value indicating if the excess mass was approximated.
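As an illustration of the two return types, the following minimal sketch requests the full result and reads the components listed above. It assumes that excessmass is made available by loading the multimode package; the variable names x and res are chosen here for illustration only.

```r
library(multimode)  # assumption: excessmass() is provided by this package

set.seed(2016)
x <- rnorm(50)

# Full result: a list of class "estmod" instead of a single number
res <- excessmass(x, mod0 = 2, full.result = TRUE)

res$nmodes        # hypothesized number of modes
res$sample.size   # number of non-missing observations
res$excess.mass   # value of the test statistic
res$approximate   # whether the statistic was approximated
```

With full.result=FALSE (the default), the same call would return only the number stored in excess.mass.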

Details

excessmass computes the excess mass test statistic, introduced by Müller and Sawitzki (1991), for the integer number of modes specified in mod0.

The excess mass test statistic for \(k\) modes is defined as \(\max_{\lambda} \{D_{n,k+1}(\lambda)\}\), where \(D_{n,k+1}(\lambda)=E_{n,k+1}(P_n,\lambda)-E_{n,k}(P_n,\lambda)\). The empirical excess mass function for \(k\) modes is defined as \(E_{n,k}(P_n,\lambda)=\sup_{C_1(\lambda),\ldots,C_k(\lambda)} \left\{\sum_{m=1}^k \left[ P_n (C_m(\lambda)) - \lambda \|C_m(\lambda)\| \right] \right\}\), where \(P_n\) is the empirical probability measure, \(\|C\|\) denotes the length of the interval \(C\), and the sets \(C_m(\lambda)\) are closed intervals whose endpoints are data points.
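To make the definition concrete for a single mode (\(k=1\)), the sketch below evaluates \(E_{n,1}(P_n,\lambda)\) by brute force over all intervals whose endpoints are data points. The function name excess_mass_one_mode is hypothetical and the \(O(n^2)\) search is only illustrative; it is not the algorithm of Müller and Sawitzki (1991) and not necessarily what excessmass does internally.

```r
# Brute-force empirical excess mass for ONE mode at level lambda.
# Hypothetical helper for illustration; uses base R only.
excess_mass_one_mode <- function(x, lambda) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  best <- -Inf
  for (i in seq_len(n)) {
    j <- i:n
    mass <- (j - i + 1) / n   # P_n([x_i, x_j]): fraction of points in the interval
    len  <- x[j] - x[i]       # length of [x_i, x_j]
    best <- max(best, mass - lambda * len)
  }
  best
}

x <- c(0, 0.1, 0.2, 5)
excess_mass_one_mode(x, lambda = 0)  # 1: the whole sample, no length penalty
excess_mass_one_mode(x, lambda = 1)  # 0.55: interval [0, 0.2], 3/4 - 1 * 0.2
```

For larger \(\lambda\) the length penalty dominates and the maximizing interval shrinks towards the densest part of the sample.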

When mod0>1 and the sample size is large, a two-step approximation (approximate=TRUE) can be used to reduce computing time. First, since the candidate values of \(\lambda\) that maximize \(D_{n,k+1}(\lambda)\) can be obtained directly from the sets that maximize \(E_{n,k+1}\) and \(E_{n,k}\) (see Appendix E in Ameijeiras-Alonso et al., 2016), the candidate values of \(\lambda\) are computed by evaluating the empirical excess mass function at gridsize[1] candidate endpoints for \(C_m(\lambda)\) and at the \(\lambda\) values associated with the empirical excess mass for one mode. Second, once a \(\lambda\) maximizing the approximated values of \(D_{n,k+1}(\lambda)\) has been chosen, a grid of gridsize[2] values of \(\lambda\) is created in its neighborhood, and the exact value of \(D_{n,k+1}(\lambda)\) is calculated at these values (using the algorithm proposed by Müller and Sawitzki, 1991) to obtain the approximation of the excess mass test statistic.
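A call using the approximation might look like the sketch below. It relies only on the arguments shown in Usage and assumes that excessmass comes from the multimode package; the bimodal sample, its size, and the variable names are chosen here for illustration.

```r
library(multimode)  # assumption: excessmass() is provided by this package

set.seed(2016)
# Illustrative sample from a two-component normal mixture
x <- c(rnorm(100, mean = -2), rnorm(100, mean = 2))

# Approximated statistic for two modes:
# gridsize[1] = number of candidate endpoints, gridsize[2] = number of lambda values
approx_stat <- excessmass(x, mod0 = 2, approximate = TRUE, gridsize = c(20, 20))
approx_stat
```

Larger gridsize values should bring the approximation closer to the exact statistic, at the cost of more computing time.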

If the sample contains repeated values or there are ties among the distances between pairs of data points, a data perturbation is applied. This avoids the discretization of the data, which has a substantial effect on the computation of the test statistic. The perturbed sample is obtained by adding to each observation a draw from the uniform distribution on plus or minus half of the minimum positive distance between two sample points.
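The perturbation rule described above can be sketched in base R as follows. The helper name perturb_sample is hypothetical, and this sketch perturbs unconditionally, whereas the function applies the perturbation only when repeated values or tied distances are detected; the internal implementation may differ.

```r
# Hypothetical helper: add uniform noise on (-h, h), where h is half of the
# minimum positive distance between two sample points.
perturb_sample <- function(x) {
  x <- x[!is.na(x)]
  d <- as.vector(dist(x))      # all pairwise distances (stats::dist)
  pos <- d[d > 0]
  if (length(pos) == 0) stop("all observations are identical")
  h <- min(pos) / 2
  x + runif(length(x), min = -h, max = h)
}

set.seed(1)
x <- c(1, 1, 2, 3.5)           # contains a repeated value
y <- perturb_sample(x)         # here h = 0.5, so each point moves by at most 0.5
y
```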

NAs are removed automatically.

References

Ameijeiras-Alonso, J., Crujeiras, R. M. and Rodríguez-Casal, A. (2016). Mode testing, critical bandwidth and excess mass, arXiv preprint: 1609.05188.

Müller, D. W. and Sawitzki, G. (1991). Excess mass estimates and tests for multimodality, Journal of the American Statistical Association, 86, 738-746.

Examples

# NOT RUN {
# Excess mass statistic for one mode
set.seed(2016)
data <- rnorm(50)
excessmass(data)
# }
