fit.meta.GMCM: Reproducibility/meta analysis using GMCMs

Description

This function performs reproducibility (or meta) analysis using GMCM. It features various optimization routines to identify the maximum likelihood estimate of the speical Gaussian mixture copula model proposed by Li et. al. (2011).

Usage

fit.meta.GMCM(u, init.par, method = c("NM", "SANN", "L-BFGS", "L-BFGS-B",
  "PEM"), max.ite = 1000, verbose = TRUE, positive.rho = TRUE,
  trace.theta = FALSE, ...)

Arguments

An n by d matrix of test statistics. Rows correspond to features and columns to experiments. Large values are assumed to be indicative of reproducible genes.

init.par

A 4-dimensional vector of the initial parameters where, init.par[1] is the mixture proportion of spurious signals, init.par[2] is the mean, init.par[3] is the standard deviation, init.par[4] is the corre

method

A character vector of length $1$. The optimization method used. Should be either "NM", "SANN", "L-BFGS", "L-BFGS-B", or "PEM" which are abbreviations of Nelder-Mead, Simulated Annealing, lim

max.ite

The maximum number of iterations. If the method is "SANN" this is the number of interations as there is no other stopping criterion. (See optim)

verbose

Logical. If TRUE, the log likelihood values are printed.

positive.rho

Logical. If TRUE, the correlation parameter is restricted to be positive.

trace.theta

Logical. Extra convergence information is appended as a list to the output returned if TRUE. The exact behavior is dependent on the value of method. If method equals "PEM", the argument is passed to

...

Arguments passed to the control-list in optim or PseudoEMAlgorithm if method is "PEM".

Value

A vector par of length 4 of the fitted parameters where par[1] is the probability of being from the first (or null) component, par[2] is the mean, par[3] is the standard deviation, and par[4] is the correlation.

Details

The "L-BFGS-B" method does not perform a transformation of the parameters.

References

Li, Q., Brown, J. B. J. B., Huang, H., & Bickel, P. J. (2011). Measuring reproducibility of high-throughput experiments. The Annals of Applied Statistics, 5(3), 1752-1779. doi:10.1214/11-AOAS466

Examples

Run this code

set.seed(1)

# True parameters
true.par <- c(0.9, 2, 0.7, 0.6)
# Simulation of data from the GMCM model
data <- SimulateGMCMData(n = 1000, par = true.par)
uhat <- Uhat(data$u) # Ranked observed data

init.par <- c(0.5, 1, 0.5, 0.9)  # Initial parameters

# Optimization with Nelder-Mead
nm.par   <- fit.meta.GMCM(uhat, init.par = init.par, method = "NM")

# Comparison with other optimization methods
# Optimization with simulated annealing
sann.par <- fit.meta.GMCM(uhat, init.par = init.par, method = "SANN",
                          max.ite = 3000, temp = 1)
# Optimization with the Pseudo EM algorithm
pem.par  <- fit.meta.GMCM(uhat, init.par = init.par, method = "PEM")

# The estimates agree nicely
rbind("True" = true.par, "Start" = init.par,
      "NM" = nm.par, "SANN" = sann.par, "PEM" = pem.par)

# Get estimated cluster
Khat <- get.IDR(x = uhat, par = nm.par)$Khat
plot(uhat, col = Khat, main = "Clustering\nIDR < 0.05")

Run the code above in your browser using DataLab