Learn R Programming

GMCM (version 1.1.1)

PseudoEMAlgorithm: EM-like algorithm for the GMCM

Description

An fast and modified implementation of the Li et. al. (2011) EM-like algorithm for estimating the maximizing parameters of the GMCM-likelihood function.

Usage

PseudoEMAlgorithm(x, theta, eps = 1e-04, max.ite = 1000, verbose = FALSE,
  trace.theta = FALSE, meta.special.case = FALSE,
  convergence.criterion = c("absGMCM", "GMCM", "GMM", "Li", "absLi"))

Arguments

x
A matrix of observations where rows corresponds to features and columns to experiments.
theta
A list of parameters formatted as described in rtheta.
eps
The maximum difference required to achive convergence.
max.ite
The maximum number of iterations.
verbose
Logical. Set to TRUE to increase verbosity.
trace.theta
Logical. If TRUE, a trace of the estimated thetas are returned.
meta.special.case
Logical. If TRUE, the esimators used are for the special case GMCM of Li et. al. (2011).
convergence.criterion
Character. Sets the convergence criterion. If "absGMCM" the absolute value of difference in GMCM is used. If "GMCM" the difference in GMCM-likelihoods are used as convergence criterion. If "GMM", the guaranteed non-

Value

  • A list of 3 or 4 is returned depending on the value of trace.theta
  • thetaA list containing the final parameter estimate in the format of rtheta
  • loglik.trA matrix with different log-likelihood traces in each row.
  • kappaA matrix where the (i,j)'th entry is the probability that x[i, ] belongs to the j'th component. Usually the returned value of EStep.
  • theta.trA list of each obtained parameter estimates in the format of rtheta

Details

When either "absGMCM" or "absLi" are used, the parameters corresponding to the biggest observed likelihood is returned. This is not necessarily the last iteration.

References

Li, Q., Brown, J. B. J. B., Huang, H., & Bickel, P. J. (2011). Measuring reproducibility of high-throughput experiments. The Annals of Applied Statistics, 5(3), 1752-1779. doi:10.1214/11-AOAS466

See Also

rtheta, EMAlgorithm

Examples

Run this code
set.seed(1)

# Choosing the true parameters and simulating data
true.par <- c(0.8, 3, 1.5, 0.4)
data <- SimulateGMCMData(n = 1000, par = true.par, d = 2)
uhat <- Uhat(data$u)  # Observed ranks

# Plot of latent and observed data colour coded by the true component
par(mfrow = c(1,2))
plot(data$z, main = "Latent data", cex = 0.6,
     xlab = "z (Experiment 1)", ylab = "z (Experiment 2)",
     col = c("red","blue")[data$K])
plot(uhat, main = "Observed data", cex = 0.6,
     xlab = "u (Experiment 1)", ylab = "u (Experiment 2)",
     col = c("red","blue")[data$K])

# Fit the model using the Pseudo EM algorithm
init.par <- c(0.5, 1, 1, 0.5)
res <- GMCM:::PseudoEMAlgorithm(uhat, meta2full(init.par, d = 2),
                                verbose = TRUE,
                                convergence.criterion = "absGMCM",
                                eps = 1e-4,
                                trace.theta = FALSE,
                                meta.special.case = TRUE)

# Compute posterior cluster probabilities
IDRs <- get.IDR(uhat, par = full2meta(res$theta))

# Plot of observed data colour coded by the MAP estimate
plot(res$loglik[3,], main = "Loglikelihood trace", type = "l",
     ylab = "log GMCM likelihood")
abline(v = which.max(res$loglik[3,])) # Chosen MLE
plot(uhat, main = "Clustering\nIDR < 0.05", xlab = "", ylab = "", cex = 0.6,
     col = c("Red","Blue")[IDRs$Khat])

# View parameters
rbind(init.par, true.par, estimate = full2meta(res$theta))

# Confusion matrix
table("Khat" = IDRs$Khat, "K" = data$K)

Run the code above in your browser using DataLab