IdMappingAnalysis (version 1.16.0)

fit2clusters: Flexible two-cluster mixture fit of a numeric vector

Description

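fit2clusters uses an ECM (expectation/conditional maximization) algorithm to fit a two-component mixture model to a numeric vector. It is more flexible than mclust in some respects (for example, it accepts per-observation variance estimates through Ysigsq and can fix the mean or variance of the first component), but it handles only one-dimensional data.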

Usage

fit2clusters(Y, Ylabel = "correlation", Ysigsq,
  piStart = c(0.5, 0.5), VStart = c(0.1, 0.1), psiStart = c(0, 0.1),
  NinnerLoop = 1, nReps = 500, psi0Constraint, V0Constraint,
  sameV = FALSE, estimatesOnly = TRUE, plotMe = TRUE,
  testMe = FALSE, Ntest = 5000, simPsi = c(0, 0.4), simPi = c(2/3, 1/3),
  simV = c(0.05^2, 0.05^2), simAlpha = 5, simBeta = 400, seed, ...)

Arguments

Y
The vector of numbers to fit.
Ysigsq
The vector of variance estimates for Y.
Ylabel
Label for the Y axis in a density fit figure.
piStart
Starting values for the component proportions.
VStart
Starting values for the component variances.
psiStart
Starting values for the component means.
NinnerLoop
Number of iterations in the "C" loop of ECM.
nReps
Upper limit on the number of EM steps.
psi0Constraint
If not missing, a fixed value for the first component mean.
V0Constraint
If not missing, a fixed value for the first component variance.
sameV
If TRUE, the components have the same variance.
estimatesOnly
If TRUE, return only the estimates. Otherwise, return per-observation details, with the estimates attached as an attribute.
plotMe
If TRUE, plot the mixture density and kernel smooth estimates.
testMe
If TRUE, run a code test.
Ntest
For testing purposes, the number of replications of simulated data.
simPsi
For testing purposes, the true means.
simPi
For testing purposes, the true proportions.
simV
For testing purposes, the true variances.
simAlpha
For testing purposes, alpha parameter in rgamma for measurement error variance.
simBeta
For testing purposes, beta parameter in rgamma for measurement error variance.
seed
For testing purposes, random seed.
...
Not used; testing roxygen2.
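
The sim* arguments describe simulated data for the built-in test. As a rough illustration of what such data could look like, the base R sketch below draws a two-component sample using the default values shown under Usage. It assumes that simAlpha and simBeta are the shape and rate passed to rgamma, and that each observation's total variance is its component variance plus its measurement error variance. Neither assumption is confirmed by this page, and the helper name simulate_two_clusters is hypothetical.

```r
# Hypothetical helper, not part of IdMappingAnalysis: simulate data in the
# spirit of the sim* arguments. Assumes simAlpha/simBeta are rgamma's
# shape/rate and that measurement variance adds to the component variance.
simulate_two_clusters <- function(Ntest = 5000,
                                  simPsi = c(0, 0.4),
                                  simPi = c(2/3, 1/3),
                                  simV = c(0.05^2, 0.05^2),
                                  simAlpha = 5,
                                  simBeta = 400,
                                  seed = 1) {
  set.seed(seed)
  # Component membership: 1 with probability simPi[1], 2 with simPi[2].
  G <- sample(1:2, Ntest, replace = TRUE, prob = simPi)
  # Per-observation measurement error variances.
  Ysigsq <- rgamma(Ntest, shape = simAlpha, rate = simBeta)
  # Observed values: component mean plus noise with the combined variance.
  Y <- rnorm(Ntest, mean = simPsi[G], sd = sqrt(simV[G] + Ysigsq))
  data.frame(G = G, Y = Y, Ysigsq = Ysigsq)
}

simData <- simulate_two_clusters()
# Share of observations drawn from the second component.
mean(simData$G == 2)
```

The Y and Ysigsq columns of such a data frame have the shape expected by the Y and Ysigsq arguments of fit2clusters.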

Value

If estimatesOnly is TRUE, only the estimates are returned. Otherwise, a data frame of per-observation details is returned, with the estimates attached as an attribute. The estimates are:
pi1
The probability of the second mixture component.
psi0
The mean of the first component (psi0Constraint if provided).
psi1
The mean of the second component.
Var0
The variance of the first component (V0Constraint if provided).
Var1
The variance of the second component.
The per-observation details are:
Y
The original observations.
Ysigsq
The original measurement variances.
posteriorOdds
Posterior odds of being in component 2 of the mixture.
postProbVar
Estimated variance of the posterior probability, using the delta method.
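
To make the posteriorOdds column concrete, the sketch below shows one way such odds could be computed from the estimates. It assumes normal components whose variance for an observation is the component variance plus that observation's Ysigsq. This model form is an assumption made here for illustration (the actual derivation is in the document cited under Details), and posterior_odds is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper, not part of IdMappingAnalysis. Assumes each component
# is normal with variance (component variance + measurement variance).
posterior_odds <- function(Y, Ysigsq, est) {
  # Density of each observation under the first and second component.
  d0 <- dnorm(Y, mean = est[["psi0"]], sd = sqrt(est[["Var0"]] + Ysigsq))
  d1 <- dnorm(Y, mean = est[["psi1"]], sd = sqrt(est[["Var1"]] + Ysigsq))
  # Odds of component 2 versus component 1, weighted by the proportions.
  (est[["pi1"]] * d1) / ((1 - est[["pi1"]]) * d0)
}

# Illustrative estimates (made-up values, named as in the Value section).
est <- c(pi1 = 0.5, psi0 = 0, psi1 = 0.4, Var0 = 0.0025, Var1 = 0.0025)
odds <- posterior_odds(Y = c(0, 0.2, 0.4), Ysigsq = rep(0.01, 3), est = est)
# Convert odds to posterior probabilities of the second component.
odds / (1 + odds)
```

With equal proportions and equal variances, an observation halfway between the two means has odds of 1, and the odds rise as Y moves toward psi1.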

Details

See the document "ECM_algorithm_for_two_clusters.pdf".