mable.mvar: Maximum Approximate Bernstein Likelihood Estimate of Multivariate Density Function

Description

Maximum Approximate Bernstein Likelihood Estimate of Multivariate Density Function

Usage

mable.mvar(
  x,
  M0 = 1,
  M,
  search = TRUE,
  interval = NULL,
  mar.deg = TRUE,
  high.dim = FALSE,
  criterion = c("cdf", "pdf"),
  controls = mable.ctrl(),
  progress = TRUE
)

Value

A list with components

m a vector of the optimal degrees selected by the change-point method
p a vector of the mixture proportions \(p(j_1, \ldots, j_d)\), arranged in the column-major order of \(j = (j_1, \ldots, j_d)\), \(0 \le j_i \le m_i, i = 1, \ldots, d\).
mloglik the maximum log-likelihood at the optimal degrees m
pval the p-values of change-points for choosing the optimal degrees for the marginal densities
M the vector (m1, m2, ..., md), where mi is the largest candidate degree when the search stopped for the i-th marginal density
interval support hyperrectangle \([a, b]=[a_1, b_1] \times \cdots \times [a_d, b_d]\)
convergence an integer code. 0 indicates successful completion (the EM iteration converged). 1 indicates that the iteration limit maxit was reached in the EM iteration.

Arguments

x: an n x d matrix or data.frame containing a multivariate sample of size n
M0: a positive integer or a vector of d positive integers specifying the starting candidate degrees for the search for optimal degrees.
M: a positive integer or a vector of d positive integers specifying the maximum candidate degrees or the given model degrees for the joint density.
search: logical; if TRUE, search for the optimal degrees between M0 and M; if FALSE, use M as the given model degrees for the joint density.
interval: a vector of two endpoints or a 2 x d matrix, each column containing the endpoints of the support/truncation interval for the corresponding marginal density. If not supplied, the i-th column is set to c(min(x[,i]), max(x[,i])).
mar.deg: logical; if TRUE, the optimal degrees are selected based on marginal data; otherwise, the optimal degrees are those that minimize the maximum L2 distance between the marginal cdf or pdf estimated from the marginal data and the one estimated from the joint data. See Details.
high.dim: logical; indicates whether the data are high dimensional or the sample is large. If TRUE, a slower version of the procedure that requires less memory is run.
criterion: whether the cdf or the pdf should be used for selecting the optimal degrees, either "cdf" or "pdf". Default is "cdf".
controls: an object returned by mable.ctrl() specifying the iteration limit and the convergence criterion eps. Default is mable.ctrl(). See Details.
progress: logical; if TRUE, a text progress bar is displayed
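
As a rough illustration of the default described for interval, the sketch below builds a 2 x d matrix from the column-wise ranges of the data using base R only. It assumes the default is simply the observed minimum and maximum of each column, as stated above; the name default_interval is hypothetical and not part of the package.

```r
# Sketch only: build the 2 x d interval matrix from column-wise ranges.
x <- as.matrix(faithful)                 # built-in Old Faithful data, n x 2
default_interval <- apply(x, 2, range)   # row 1: minima, row 2: maxima
default_interval
```

Each column of the result holds the two endpoints for one marginal, which is the same layout that rbind(a, b) produces when a holds the lower endpoints and b the upper endpoints.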

Author

Zhong Guan <zguan@iusb.edu>

Details

A \(d\)-variate density \(f\) on a hyperrectangle \([a, b] =[a_1, b_1] \times \cdots \times [a_d, b_d]\) can be approximated by a mixture of \(d\)-variate beta densities on \([a, b]\), \(\beta_{mj}(x) = \prod_{i=1}^d\beta_{m_i,j_i}[(x_i-a_i)/(b_i-a_i)]/(b_i-a_i)\), with proportions \(p(j_1, \ldots, j_d)\), \(0 \le j_i \le m_i, i = 1, \ldots, d\).

Let \(\tilde F_i\) (\(\tilde f_i\)) be an estimate with degree \(\tilde m_i\) of the i-th marginal cdf (pdf) based on the marginal data x[,i], \(i=1, \ldots, d\). If search=TRUE and mar.deg=TRUE, then the optimal degrees are \((\tilde m_1,\ldots,\tilde m_d)\). If search=TRUE and mar.deg=FALSE, then the optimal degrees \((\hat m_1,\ldots,\hat m_d)\) are those that minimize the maximum, over \(i\), of the \(L_2\)-distance between \(\tilde F_i\) (\(\tilde f_i\)) and the estimate of \(F_i\) (\(f_i\)) based on the joint data with degrees \(m=(m_1,\ldots,m_d)\), over all \(m\) between \(M_0\) and \(M\), if criterion="cdf" (criterion="pdf").
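
To make the mixture concrete, here is a minimal base R sketch that evaluates such a mixture density at a single point. It is not the package's implementation. It assumes that \(\beta_{m_i,j_i}\) is the Beta(\(j_i+1\), \(m_i-j_i+1\)) density (the usual Bernstein-model convention) and that the column-major order of p means \(j_1\) varies fastest. The function name bern_mix_density and its arguments are hypothetical.

```r
# Hypothetical helper (not part of the mable package): evaluate the
# Bernstein mixture density at one point x of length d.
# p: mixture proportions in column-major order of (j_1, ..., j_d)
# m: vector of degrees; interval: 2 x d matrix of endpoints
bern_mix_density <- function(x, p, m, interval) {
  a <- interval[1, ]
  b <- interval[2, ]
  t <- (x - a) / (b - a)   # map x into the unit hypercube
  # All index vectors j; expand.grid varies j_1 fastest, which is
  # assumed here to match the column-major ordering of p.
  J <- as.matrix(expand.grid(lapply(m, function(mi) 0:mi)))
  dens <- rep(1, nrow(J))
  for (i in seq_along(m)) {
    # Assumed convention: beta_{m_i, j_i} = Beta(j_i + 1, m_i - j_i + 1)
    dens <- dens * dbeta(t[i], J[, i] + 1, m[i] - J[, i] + 1) / (b[i] - a[i])
  }
  sum(p * dens)
}

# Usage: equal proportions reproduce the uniform density on [a, b].
iv <- rbind(c(0, 40), c(7, 110))
m <- c(2, 3)
p <- rep(1 / prod(m + 1), prod(m + 1))
bern_mix_density(c(3, 70), p, m, iv)   # equals 1 / (7 * 70)
```

With equal proportions the Bernstein basis sums to a constant, so the result is the uniform density on the hyperrectangle, which gives a quick sanity check of the indexing.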

For large data sets and multimodal densities, the search for the model degrees is very time-consuming. In this case, it is suggested that the degrees be selected based on marginal data using mable or optimable.

References

Wang, T. and Guan, Z. (2019). Bernstein Polynomial Model for Nonparametric Multivariate Density. Statistics, 53(2), 321-338.

Examples


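## Old Faithful Data
library(mable)
# Lower (a) and upper (b) endpoints of the support for eruptions and waiting
a <- c(0, 40); b <- c(7, 110)
# Fit with the given model degrees M (search = FALSE skips the degree search)
ans <- mable.mvar(faithful, M = c(46, 19), search = FALSE,
                  interval = rbind(a, b), progress = FALSE)
plot(ans, which = "density")
plot(ans, which = "cumulative")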
