Learn R Programming

Clomial (version 1.8.0)

Clomial: Fits several binomial models to data from multiple samples of a single tumor.

Description

Using EM, trains several models using different initial values to escape from local optima. The best one in terms of the likelihood can be later chosen by choose.best() function.

Usage

Clomial(Dt = NULL, Dc = NULL, DcDtFile = NULL, C, doParal=FALSE, outPrefix = NULL, binomTryNum = 1000, maxIt = 100, llCutoff = 0.001, jobNamePrefix = "Bi", qstatWait = 2, fitBinomJobFile = NULL, jobShare = 10, ignoredSample = c(), fliProb=0.05, conservative=TRUE, doTalk=FALSE)

Arguments

Dt
A matrix which contains the counts of the alternative allele where rows correspond to the genomic loci, and columns correspond to the samples.
Dc
A matrix which contains the counts of the total number of mapped reads where rows correspond to the genomic loci, and columns correspond to the samples.
DcDtFile
A file from which the data can optionally be loaded. It should contain the matrices Dc and Dt.
C
The assumed number of clones.
doParal
Boolean where TRUE means, in Linux, models with different initialization are trained in parallel on a cluster using qsub.
outPrefix
A prefix for the path to save the results.
binomTryNum
The number of models trained using different initialization.
maxIt
The maximum number of EM iterations.
llCutoff
EM iterations stops if the relative improvement in the log-likelihood is not more than this threshold.
jobNamePrefix
If run in parallel, this prefix will be used to name the jobs on the cluster.
qstatWait
The waiting time between qstat commands to assess the number of running and waiting jobs.
fitBinomJobFile
If run in parallel, this is the script which loads data, trains a model using a random initialization, and saves the results.
jobShare
If run in parallel, the job_share option of qsub determines the priority of jobs over other submitted jobs.
ignoredSample
A vector of indices of samples which will be ignored in training. Used by experts only to measure the stability of the results.
fliProb
A "flipping probability" used for noise injection which can be disabled when fliProb=0. After the first EM iteration, each entry of the matrix Mu such as m may change to 1-m with this probability. This probability decreases on subsequent iterations.
conservative
Boolean where TRUE means noise will be injected only if likelihood is improved after an EM iteration, otherwise the original Mu matrix will be used for the next iteration. For expert use only.
doTalk
If TRUE, information on the EM optimization iterations is reported.

Value

Returns a list containing the entry called models, which is a list of the length equal to binomTryNum where each element is a trained model. For each trained model, Mu models the matrix of genotypes, where rows and columns correspond to genomic loci and clones, accordingly. Also, P is the matrix of clonal frequency where rows and columns correspond to clones and samples, accordingly. The first column of P corresponds to the normal clone. The history of Mu, P, and the log-likelihood over iterations is saved in lists Ps, Mus, and Likelihoods, accordingly.

Details

The likelihood of the model, given the hidden variables and the parameters, can be computed based on a combination of binomial distributions. In each EM iteration, the likelihood is increased, however, due to presence of local optima, several models should be tried using different random initialization. For higher number of assumed clones, C, the parameter binomTryNum should be increased because the dimension of the search space grows linearly with C.

References

Inferring clonal composition from multiple sections of a breast cancer, Zare et al., Submitted.

See Also

Clomial, choose.best, Clomial.iterate, compute.bic, breastCancer

Examples

Run this code
set.seed(1)
data(breastCancer)
Dc <- breastCancer$Dc
Dt <- breastCancer$Dt
ClomialResult <-Clomial(Dc=Dc,Dt=Dt,maxIt=20,C=4,binomTryNum=2)
chosen <- choose.best(models=ClomialResult$models)
M1 <- chosen$bestModel
print("Genotypes:")
round(M1$Mu)
print("Clone frequencies:")
M1$P

Run the code above in your browser using DataLab