
Returns the output of the nonparametric maximum smoothed likelihood (npMSL) algorithm (Levine et al., 2011) for mixtures of multivariate (repeated measures) data, where the coordinates of a row (case) in the data matrix are assumed to be independent, conditional on the mixture component (subpopulation) from which they are drawn.
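As an illustration of the conditional independence assumption, the following sketch simulates repeated-measures data from a two-component mixture using base R only. The mixing proportions, component means, and variable names are made up for the example and do not come from the package.

```r
set.seed(1)
n <- 300                      # number of cases (rows)
r <- 3                        # number of repeated measures (columns)
lambda <- c(0.4, 0.6)         # hypothetical mixing proportions
centers <- c(-2, 2)           # hypothetical component means

# Draw the latent component label of each case
z <- sample(1:2, n, replace = TRUE, prob = lambda)

# Given its label, each coordinate of a case is drawn independently;
# centers[z] is recycled down the columns, so row i always uses centers[z[i]]
x <- matrix(rnorm(n * r, mean = centers[z]), nrow = n, ncol = r)
```

A matrix shaped like x (one row per case, one column per measurement) is the kind of input the function expects.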
npMSL(x, mu0, blockid = 1:ncol(x),
      bw = bw.nrd0(as.vector(as.matrix(x))), samebw = TRUE,
      bwmethod = "S", h = bw, eps = 1e-8,
      maxiter = 500, bwiter = maxiter, nbfold = NULL,
      ngrid = 200, post = NULL, verb = TRUE)
npMSL returns a list of class npEM with the following items:
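data: The raw data (an n x r matrix, with one row per case and one column per repeated measure).
posteriors: An n x m matrix of posterior probabilities for the observations, where m is the number of mixture components.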
bandwidth: If samebw==TRUE, same as the bw input argument; otherwise, the value of the bw matrix at the final iteration. This information is needed by any method that produces density estimates from the output.
blockid: Same as the blockid input argument, but recoded to have positive integer values. Also needed by any method that produces density estimates from the output.
lambda: The sequence of mixing proportions over iterations.
lambdahat: The final mixing proportions.
loglik: The sequence of log-likelihoods over iterations.
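f: An array of the final density estimates for each component and block, evaluated at the ngrid grid points.
meanNaN: Average number of NaN values that occurred over the iterations (for internal testing and control purposes).
meanUdfl: Average number of "underflow" events that occurred over the iterations (for internal testing and control purposes).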
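x: An n x r matrix of data. Each of the n rows is a case, and each case has r repeated measurements, assumed to be conditionally independent given the mixture component (subpopulation) from which the case is drawn.
mu0: Either an m x r matrix specifying the initial centers for the kmeans function, or an integer m specifying the number of initial centers, which are then chosen randomly in kmeans.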
blockid: A vector of length ncol(x) identifying coordinates (columns of x) that are assumed to be identically distributed (i.e., in the same block). For instance, the default has all distinct elements, indicating that no two coordinates are assumed identically distributed and thus a separate set of density estimates is produced for each column of x. On the other hand, if blockid=rep(1,ncol(x)), then the coordinates in each row are assumed conditionally i.i.d.
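To make the block structure concrete, here is a small base R sketch using the same blockid vector as the example at the end of this page. The recoding step shows one possible way to map labels to positive integers; it is an assumption for illustration and not necessarily how the package recodes blockid internally.

```r
blockid <- c(4, 3, 2, 1, 3, 4, 1, 2)   # block label of each of the 8 columns

# Columns that share a label are treated as identically distributed
blocks <- split(seq_along(blockid), blockid)

# One possible recoding of arbitrary labels to positive integers 1, 2, ...
recoded <- match(blockid, unique(blockid))
```

Here blocks[["1"]] holds columns 4 and 7, so those two coordinates would share one set of density estimates.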
bw: Bandwidth for density estimation, equal to the standard deviation of the kernel density. The default is a simplistic application of the bw.nrd0 bandwidth (the default used by density) to the entire dataset.
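The default bandwidth can be reproduced by hand with base R, as in the sketch below. The data matrix is simulated purely for illustration; only bw.nrd0 and density are real functions here, and both belong to the stats package.

```r
set.seed(2)
x <- matrix(rnorm(200), nrow = 50, ncol = 4)   # hypothetical 50 x 4 data matrix

# Default from the usage line: pool every entry of x, then apply bw.nrd0
bw_default <- bw.nrd0(as.vector(as.matrix(x)))

# bw is the standard deviation of the kernel, as in density()
d <- density(as.vector(x), bw = bw_default)
```

Because all columns are pooled, this single value ignores any differences in scale between blocks or components, which is why the adaptive options exist.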
samebw: Logical: If TRUE, use the same bandwidth for each iteration and for each component and block. If FALSE, use a separate bandwidth for each component and block, and update this bandwidth at each iteration of the algorithm until bwiter is reached (see below). Two adaptation methods are provided; see bwmethod below.
bwmethod: Defines the adaptive bandwidth strategy when samebw = FALSE, in which case the bandwidth depends on the component, block, and iteration of the algorithm. If set to "S" (the default), adaptation is done using a suitably modified bw.nrd0 method as described in Benaglia et al (2011). If set to "CV", an adaptive method based on nbfold-fold cross-validation is applied, where nbfold is the number of subsamples. This corresponds to leaving out one subsample (a fraction of about 1/nbfold of the observations) at a time.
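The idea of splitting the data into nbfold subsamples can be sketched generically in base R. This is an ordinary k-fold assignment written for illustration; it is not the code of wbs.kCV, and the values of n and nbfold are hypothetical.

```r
set.seed(3)
n <- 100        # hypothetical number of observations
nbfold <- 5     # number of subsamples (folds)

# Randomly assign each observation to one of nbfold folds of near-equal size
fold <- sample(rep(seq_len(nbfold), length.out = n))

# Each fold is left out once, so about n / nbfold points are held out per fit
held_out_sizes <- table(fold)
```

With these values every fold holds 20 observations, so each cross-validation fit would use the remaining 80.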
h: Alternative way to specify the bandwidth, to provide backward compatibility.
eps: Tolerance limit for declaring algorithm convergence. Convergence is declared whenever the maximum change in any coordinate of the lambda vector (of mixing proportion estimates) does not exceed eps.
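The stopping rule described for eps amounts to a simple comparison between successive lambda vectors. The helper below is a hypothetical illustration of that rule, not a function exported by the package.

```r
# Hypothetical helper: TRUE when no mixing proportion moved by more than eps
has_converged <- function(lambda_old, lambda_new, eps = 1e-8) {
  max(abs(lambda_new - lambda_old)) <= eps
}

# A change of 0.01 in each proportion is far above the default tolerance
big_step <- has_converged(c(0.30, 0.70), c(0.31, 0.69))

# A change of about 1e-10 falls below the default tolerance
small_step <- has_converged(c(0.30, 0.70), c(0.30, 0.70) + c(1e-10, -1e-10))
```

Here big_step is FALSE and small_step is TRUE, so only the second update would end the iterations.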
maxiter: The maximum number of iterations allowed; convergence may be declared before maxiter iterations (see eps above).
bwiter: The maximum number of iterations allowed for the adaptive bandwidth stage, when samebw = FALSE. If set to 0, then the initial bandwidth matrix is used without adaptation.
nbfold: A parameter passed to the internal function wbs.kCV, which controls the weighted bandwidth selection by k-fold cross-validation.
ngrid: Number of points in the discretization of the intervals over which the (univariate) integrals are approximated for the nonlinear smoothing of the log-densities, as required in the E step of the npMSL algorithm; see Levine et al (2011).
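The role of the grid can be illustrated with a Riemann-sum approximation of the nonlinear smoothing operator of Levine et al (2011), which maps a density f to exp of the integral of K_h(x - u) log f(u) du. The sketch below is a base R illustration under stated assumptions: a Gaussian kernel, a standard normal f, and a hypothetical grid range and bandwidth. It is not the package's internal code.

```r
ngrid <- 200
grid <- seq(-10, 10, length.out = ngrid)   # hypothetical grid range
du <- grid[2] - grid[1]                     # spacing between grid points
h <- 0.5                                    # hypothetical bandwidth
logf <- dnorm(grid, log = TRUE)             # log-density known on the grid

# Riemann-sum approximation of exp( integral of K_h(x - u) * log f(u) du )
smooth_at <- function(x) {
  exp(sum(dnorm(x - grid, sd = h) * logf) * du)
}

approx_value <- smooth_at(0)
exact_value <- dnorm(0) * exp(-h^2 / 2)     # closed form for this Gaussian case
```

In this Gaussian case the integral has a closed form, so approx_value can be checked against exact_value; a finer grid (larger ngrid) trades computing time for accuracy.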
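post: If non-NULL, an n x m matrix specifying the initial posterior probability vectors for each of the observations, i.e., the initial values used to start the EM-like algorithm.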
verb: If TRUE, print updates for every iteration of the algorithm as it runs.
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18, 505-526.
Benaglia, T., Chauveau, D., and Hunter, D. R. (2011), Bandwidth Selection in an EM-like algorithm for nonparametric multivariate mixtures, Nonparametric Statistics and Mixture Models: A Festschrift in Honor of Thomas P. Hettmansperger, World Scientific Publishing Co., pages 15-27.
Chauveau, D., Hunter, D. R., and Levine, M. (2014), Semi-Parametric Estimation for Conditional Independence Multivariate Finite Mixture Models. Preprint (under revision).
Levine, M., Hunter, D. and Chauveau, D. (2011), Maximum Smoothed Likelihood for Multivariate Mixtures, Biometrika 98(2): 403-416.
npEM, plot.npEM, normmixrm.sim, spEMsymloc, spEM, plotseq.npEM
## Examine and plot water-level task data set.
## Block structure pairing clock angles that are directly opposite one
## another (1:00 with 7:00, 2:00 with 8:00, etc.)
set.seed(111) # Ensure that results are exactly reproducible
data(Waterdata)
blockid <- c(4,3,2,1,3,4,1,2) # see Benaglia et al (2009a)
if (FALSE) {
  a <- npEM(Waterdata[,3:10], mu0=3, blockid=blockid, bw=4)  # npEM solution
  b <- npMSL(Waterdata[,3:10], mu0=3, blockid=blockid, bw=4) # smoothed version
  # Comparisons on the 4 default plots, one for each block
  par(mfrow=c(2,2))
  for (l in 1:4) {
    plot(a, blocks=l, breaks=5*(0:37)-92.5,
         xlim=c(-90,90), xaxt="n", ylim=c(0,.035), xlab="")
    plot(b, blocks=l, hist=FALSE, newplot=FALSE, addlegend=FALSE, lty=2,
         dens.col=1)
    axis(1, at=30*(1:7)-120, cex.axis=1)
    legend("topleft", c("npMSL"), lty=2, lwd=2)
  }
}