npmle: Maximum Likelihood Estimate of a Mixing Distribution.

Description

Estimates the mixing distribution nonparametrically using an EM algorithm. The estimate is discrete and is returned as a vector of support points together with a vector of associated mixture probabilities. The available sampling distributions are the Normal, Poisson, Binomial, and t-distributions.

Usage

npmle(data, family = gaussian, maxiter = 500, tol = 1e-4,
      smooth = TRUE, bass = 0, nmix = NULL)

Arguments

data

A data frame or a matrix with the number of rows equal to the number of sampling units. The first column should contain the main estimates, and the second column should contain the nuisance terms.

family

family determining the sampling distribution (see family)

maxiter

the maximum number of EM iterations

tol

the convergence tolerance

smooth

logical; whether or not to smooth the estimated cdf

bass

controls the smoothness level; only relevant if smooth=TRUE. Values of up to 10 indicate increasing smoothness.

nmix

optional; the number of mixture components

Value

An object of class npmix, which is a list containing at least the following components:

support

a vector of estimated support points

mix.prop

a vector of estimated mixture proportions

Fhat

a function; obtained through interpolation of the estimated discrete cdf

fhat

a function; estimate of the mixture density

loglik

value of the log-likelihood at each iteration

convergence

0 indicates convergence; 1 indicates that convergence was not achieved

numiter

the number of EM iterations required

Details

Assume the two-level sampling model $X_i|\theta_i \sim p(x|\theta_i,\eta_i)$ and $\theta_i \sim F$ for $i = 1,...,n$, where $\eta_i$ is the nuisance term supplied for unit $i$. The function npmle seeks an estimate of the mixing distribution $F$ that maximizes the marginal log-likelihood $$ l(F) = \sum_i \log \int p( X_i |\theta, \eta_i) dF(\theta). $$ The distribution function maximizing $l(F)$ is known to be discrete; thus, the estimated mixing distribution is returned as a set of support points and associated mixture probabilities.
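
The following is a minimal sketch of how an EM iteration can maximize $l(F)$ when $F$ is restricted to a fixed grid of candidate support points. It is an illustration only, not the implementation used by npmle: the helper name em_fixed_grid, the choice of a Normal sampling model with known standard errors, the fixed grid, and the stopping rule are all assumptions made for this example. At each iteration, the posterior probability that unit $i$ belongs to grid point $j$ is computed, and the mixture proportions are replaced by the averages of these probabilities.

```r
# Hypothetical helper (not part of the package): EM for mixture proportions
# on a FIXED grid, assuming X_i | theta_i ~ N(theta_i, se_i^2).
em_fixed_grid <- function(x, se, grid, maxiter = 500, tol = 1e-4) {
  n <- length(x)
  m <- length(grid)
  # lik[i, j] = p(x_i | theta = grid[j], eta_i = se[i])
  lik <- outer(seq_len(n), seq_len(m),
               function(i, j) dnorm(x[i], mean = grid[j], sd = se[i]))
  mix.prop <- rep(1 / m, m)   # start from uniform proportions
  loglik <- numeric(0)
  convergence <- 1            # 1 = not converged, mirroring the Value section
  for (iter in seq_len(maxiter)) {
    marg <- as.vector(lik %*% mix.prop)   # marginal density of each x_i
    loglik <- c(loglik, sum(log(marg)))   # log-likelihood at current proportions
    # E-step: posterior weight of grid point j for unit i
    post <- sweep(lik, 2, mix.prop, "*") / marg
    # M-step: new proportions are the average posterior weights
    mix.prop <- colMeans(post)
    if (iter > 1 && abs(loglik[iter] - loglik[iter - 1]) < tol) {
      convergence <- 0
      break
    }
  }
  list(support = grid, mix.prop = mix.prop, loglik = loglik,
       convergence = convergence, numiter = iter)
}

# Simulated data: Normal mixing distribution, unit standard errors
set.seed(1)
n <- 200
theta <- rnorm(n)
se <- rep(1, n)
x <- rnorm(n, mean = theta, sd = se)
grid <- seq(min(x), max(x), length.out = 50)
fit <- em_fixed_grid(x, se, grid, maxiter = 200)
```

In this sketch the proportions always sum to one and the recorded log-likelihood values are non-decreasing, which is the usual behaviour of EM. Grid points whose estimated proportion is close to zero can be dropped to obtain a smaller set of support points.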

References

Laird, N.M. (1978), Nonparametric maximum likelihood estimation of a mixing distribution, Journal of the American Statistical Association, 73, 805--811.

Lindsay, B.G. (1983), The geometry of mixture likelihoods: a general theory, The Annals of Statistics, 11, 86--94.

Examples

# NOT RUN {
data(hiv)
npobj <- npmle(hiv, family = tdist(df=6), maxiter = 25)


###  Generate Binomial data with Beta mixing distribution
n <- 3000
theta <- rbeta(n, shape1 = 2, shape2 = 10)
ntrials <- rpois(n, lambda = 10)
x <- rbinom(n, size = ntrials, prob = theta)

###  Estimate mixing distribution 
dd <- cbind(x,ntrials)
npest <- npmle(dd, family = binomial, maxiter = 25)

### compare with true mixture cdf
tt <- seq(1e-4,1 - 1e-4, by = .001)
plot(npest, lwd = 2)
lines(tt, pbeta(tt, shape1 = 2, shape2 = 10), lwd = 2, lty = 2)
# }
