npMSL: Nonparametric EM-like Algorithm for Mixtures of Independent Repeated Measurements - Maximum Smoothed Likelihood version

Description

Returns the output of the nonparametric maximum smoothed likelihood (npMSL) algorithm (Levine et al., 2011) for mixtures of multivariate (repeated measures) data, where the coordinates of a row (case) in the data matrix are assumed to be independent, conditional on the mixture component (subpopulation) from which they are drawn.

Usage

npMSL(x, mu0, blockid = 1:ncol(x), 
     bw = bw.nrd0(as.vector(as.matrix(x))), samebw = TRUE, 
     h = bw, eps = 1e-8, 
     maxiter = 500, ngrid=200, verb = TRUE)

Arguments

x

An $n\times r$ matrix of data. Each of the $n$ rows is a case, and each case has $r$ repeated measurements. These measurements are assumed to be conditionally independent, conditional on the mixture component (subpopulation) from which the case is drawn.

mu0

Either an $m\times r$ matrix specifying the initial centers for the kmeans function, or an integer $m$ specifying the number of initial centers, which are then chosen randomly in

blockid

A vector of length $r$ identifying coordinates (columns of x) that are assumed to be identically distributed (i.e., in the same block). For instance, the default has all distinct elements, indicating that no two coordinates are assumed to be identically distributed.
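As a rough illustration of how a blockid vector groups columns, the base R sketch below lists three possible choices for $r = 8$ coordinates and shows which columns fall into each block. The variable names are illustrative only, and the paired vector is the one used in the Examples section; this is not a description of how npMSL processes blockid internally.

```r
r <- 8

# Default: all labels distinct, so every column forms its own block
blockid_default <- 1:r

# All labels equal: every column belongs to one shared block
blockid_single <- rep(1, r)

# Paired structure taken from the Examples section of this page
blockid_paired <- c(4, 3, 2, 1, 3, 4, 1, 2)

# Number of blocks implied by each choice
n_blocks <- sapply(list(blockid_default, blockid_single, blockid_paired),
                   function(b) length(unique(b)))

# Column indices grouped by block label for the paired structure
cols_by_block <- split(seq_along(blockid_paired), blockid_paired)
print(n_blocks)
print(cols_by_block)
```

Columns that share a label are treated as identically distributed, so fewer blocks means fewer density estimates per component.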

bw

Bandwidth for density estimation, equal to the standard deviation of the kernel density. By default, a simplistic application of the default bw.nrd0 bandwidth used by density to the entire data set.
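The default in the Usage section pools every entry of x into a single vector before calling bw.nrd0. The sketch below reproduces that computation on simulated data (the matrix here is a made-up stand-in, not a package data set) and compares it with the rule-of-thumb formula that bw.nrd0 implements.

```r
set.seed(1)
# Simulated stand-in for an n x r data matrix (assumption for illustration)
x <- matrix(rnorm(300 * 3), nrow = 300, ncol = 3)

# Pool all entries, exactly as in the default value of the bw argument
pooled <- as.vector(as.matrix(x))
bw_default <- bw.nrd0(pooled)

# The same rule of thumb written out by hand:
# 0.9 * min(sd, IQR / 1.34) * n^(-1/5)
bw_manual <- 0.9 * min(sd(pooled), IQR(pooled) / 1.34) * length(pooled)^(-1/5)

print(c(bw_default = bw_default, bw_manual = bw_manual))
```

Because the pooled vector ignores both the mixture structure and the block structure, this value is only a starting point; a user-supplied bw (as in the Examples, which use bw=4) may be more appropriate.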

samebw

Logical: If TRUE, use the same bandwidth for each iteration and for each component and block. If FALSE, use a separate bandwidth for each component and block, and update this bandwidth at each iteration of the algorithm.

h

Alternative way to specify the bandwidth, provided for backward compatibility.

eps

Tolerance limit for declaring algorithm convergence. Convergence is declared whenever the maximum change in any coordinate of the lambda vector (of mixing proportion estimates) does not exceed eps.
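The stopping rule described above can be sketched in base R as follows. The helper name has_converged is hypothetical and is not part of the package; it only restates the criterion on two successive vectors of mixing proportions.

```r
# Hypothetical helper: TRUE when no coordinate of lambda moved by more than eps
has_converged <- function(lambda_old, lambda_new, eps = 1e-8) {
  max(abs(lambda_new - lambda_old)) <= eps
}

# A change of 0.01 in the mixing proportions is far above the default eps
big_step <- has_converged(c(0.30, 0.70), c(0.31, 0.69))

# A change of about 1e-9 is below the default eps
small_step <- has_converged(c(0.30, 0.70), c(0.30, 0.70) + c(1e-9, -1e-9))

print(c(big_step = big_step, small_step = small_step))
```

Note that the criterion looks only at the mixing proportions, not at the density estimates or the log-likelihood.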

maxiter

The maximum number of iterations allowed; convergence may be declared before maxiter iterations are reached (see eps above).

ngrid

Number of grid points used to discretize the intervals over which the (univariate) integrals are approximated. These integrals implement the nonlinear smoothing of the log-densities required in the E step of the npMSL algorithm; see Levine et al. (2011).
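To make the role of the grid concrete, the sketch below approximates a nonlinearly smoothed density of the form $\exp \int K_h(x-u) \log f(u)\,du$ by a Riemann sum over ngrid equally spaced points. The helper smooth_log_density, the Gaussian kernel, the interval $[-6, 6]$, and the bandwidth are all assumptions chosen for illustration; the package's internal grid and integration scheme may differ.

```r
# Hypothetical helper: Riemann-sum approximation of
# exp( integral of K_h(x - u) * log f(u) du ) with a Gaussian kernel K_h
smooth_log_density <- function(x, f, grid, h) {
  step <- grid[2] - grid[1]                  # spacing of the equally spaced grid
  exp(sum(dnorm(x - grid, sd = h) * log(f(grid))) * step)
}

ngrid <- 200
grid <- seq(-6, 6, length.out = ngrid)       # assumed integration interval
h <- 0.5                                     # assumed bandwidth

# Smooth the standard normal density at x = 0
nf0 <- smooth_log_density(0, dnorm, grid, h)

# For a standard normal f and Gaussian kernel, the exact value is
# dnorm(x) * exp(-h^2 / 2), which gives a check on the discretization
exact0 <- dnorm(0) * exp(-h^2 / 2)
print(c(approx = nf0, exact = exact0))
```

A larger ngrid gives a finer discretization at a higher computational cost per iteration.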

verb

If TRUE, print updates for every iteration of the algorithm as it runs.

Value

npMSL returns a list of class npEM with the following items:

data

The raw data (an $n\times r$ matrix).

posteriors

An $n\times m$ matrix of posterior probabilities; entry $(i, j)$ is the posterior probability that observation $i$ belongs to component $j$.

bandwidth

If samebw==TRUE, same as the bw input argument; otherwise, the value of the bw matrix at the final iteration. This information is needed by any method that produces density estimates from the output.

blockid

Same as the blockid input argument, but recoded to have positive integer values. Also needed by any method that produces density estimates from the output.

lambda

The sequence of mixing proportions over iterations.

lambdahat

The final mixing proportions.

loglik

The sequence of log-likelihoods over iterations.

f

An array of size $ngrid \times m \times l$, where $l$ is the number of blocks, giving the final density values for each component $j$ and block $k$ over the grid points.

meanNaN

Average number of NaN values that occurred over iterations (for internal testing and control purposes).

meanUdfl

Average number of underflows that occurred over iterations (for internal testing and control purposes).
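One common use of the posteriors item is to assign each observation to its most probable component. The sketch below does this on a small made-up posterior matrix; with a real fit, the matrix would instead be taken from the posteriors item of the returned list.

```r
# Made-up 4 x 3 posterior matrix (rows: observations, columns: components)
post <- matrix(c(0.90, 0.05, 0.05,
                 0.10, 0.70, 0.20,
                 0.20, 0.20, 0.60,
                 0.15, 0.80, 0.05),
               ncol = 3, byrow = TRUE)

# Each row of a posterior matrix should sum to 1
stopifnot(all(abs(rowSums(post) - 1) < 1e-12))

# Hard classification: index of the largest posterior probability in each row
hard_labels <- apply(post, 1, which.max)

# Number of observations assigned to each of the 3 components
label_counts <- table(factor(hard_labels, levels = 1:3))
print(hard_labels)
print(label_counts)
```

Hard labels discard the uncertainty carried by the posterior probabilities, so they are best treated as a summary rather than a replacement for the full matrix.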

References

Levine, M., Hunter, D., and Chauveau, D. (2011), Maximum Smoothed Likelihood for Multivariate Mixtures, Biometrika, 98(2): 403-416.
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009a), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18: 505-526.
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009b), Bandwidth Selection in an EM-like algorithm for nonparametric multivariate mixtures, Technical Report.
Bordes, L., Chauveau, D., and Vandekerkhove, P. (2007), An EM algorithm for a semiparametric mixture model, Computational Statistics and Data Analysis, 51: 5429-5443.

Examples


## Examine and plot water-level task data set.
## Block structure pairing clock angles that are directly opposite one
## another (1:00 with 7:00, 2:00 with 8:00, etc.)
library(mixtools) # provides npEM, npMSL and the Waterdata data set
set.seed(111) # Ensure that results are exactly reproducible
data(Waterdata)
blockid <- c(4,3,2,1,3,4,1,2) # see Benaglia et al (2009a)

a <- npEM(Waterdata, mu0=3, blockid=blockid, bw=4)  # npEM solution
b <- npMSL(Waterdata, mu0=3, blockid=blockid, bw=4) # smoothed version

# Comparisons on the 4 default plots, one for each block
par(mfrow=c(2,2))
for (l in 1:4) {
  plot(a, blocks=l, breaks=5*(0:37)-92.5,
       xlim=c(-90,90), xaxt="n", ylim=c(0,.035), xlab="")
  plot(b, blocks=l, hist=FALSE, newplot=FALSE, addlegend=FALSE, lty=2,
       dens.col=1)
  axis(1, at=30*(1:7)-120, cex.axis=1)
  legend("topleft", c("npMSL"), lty=2, lwd=2)
}
