cn_means: A parameter class for computing emission probabilities

Description

Parameters for computing emission probabilities for a 6-state HMM, including starting values for the means and standard deviations of log R ratios (assumed to be Gaussian) and B allele frequencies (assumed to be truncated Gaussian), and initial state probabilities.

Constructor for EmissionParam class

This function is exported primarily for internal use by other BioC packages.

Usage

cn_means(object)
cn_sds(object)
baf_means(object)
baf_sds(object)
baf_means(object) <- value
baf_sds(object) <- value
cn_sds(object) <- value
cn_means(object) <- value
EmissionParam(cn_means = CN_MEANS(), cn_sds = CN_SDS(),
  baf_means = BAF_MEANS(), baf_sds = BAF_SDS(), initial = rep(1/6, 6),
  EMupdates = 5L, CN_range = c(-5, 3), temper = 1, p_outlier = 1/100,
  modelHomozygousRegions = FALSE)
EMupdates(object)
## S3 method for class 'EmissionParam':
show(object)

Arguments

object

see showMethods("EMupdates")

value

numeric vector

cn_means

numeric vector of starting values for log R ratio means (order is by copy number state)

cn_sds

numeric vector of starting values for log R ratio standard deviations (order is by copy number state)

baf_means

numeric vector of starting values for BAF means. See the Examples for details on how these are ordered.

baf_sds

numeric vector of starting values for BAF standard deviations. See the Examples for details on how these are ordered.

initial

numeric vector of initial state probabilities

EMupdates

number of EM updates

CN_range

the allowable range of log R ratios. Log R ratios outside this range are thresholded.

temper

Emission probabilities can be tempered by emit^temper. This is highly experimental.

p_outlier

probability that an observation is an outlier (assumed to be the same for all markers)

modelHomozygousRegions

logical. If FALSE (default), the emission probabilities for BAFs are modeled from a mixture of truncated normals and a Unif(0,1) where the mixture probabilities are given by the probability that the SNP is heterozygous. See Details below for a discussion of the implications.
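
As a rough illustration of how the arguments above fit together, the sketch below overrides a few defaults. It assumes the class is provided by the Bioconductor package VanillaICE and uses only the argument and accessor names listed under Usage; the particular values (10 updates, a narrower `CN_range`, doubled standard deviations) are arbitrary choices for the example.

```r
## Skipped entirely if the (assumed) providing package is not installed.
if (requireNamespace("VanillaICE", quietly = TRUE)) {
  library(VanillaICE)
  ## Override a few defaults; all other arguments keep their default values.
  ep <- EmissionParam(EMupdates = 10L, CN_range = c(-4, 2),
                      modelHomozygousRegions = TRUE)
  print(EMupdates(ep))   # number of EM updates stored in the object
  ## Replacement methods take a numeric vector of the same length.
  cn_sds(ep) <- cn_sds(ep) * 2
  print(cn_sds(ep))
}
```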

Value

numeric vector

Details

When modelHomozygousRegions is FALSE (the default in versions >= 1.28.0), emission probabilities for B allele frequencies are calculated from a mixture of truncated normal densities and a Unif(0,1) density, with the mixture probabilities given by the probability that a SNP is homozygous. In particular, let p denote a 6-dimensional vector of density estimates from a truncated normal distribution for the latent genotypes 'A', 'B', 'AB', 'AAB', 'ABB', 'AAAB', and 'ABBB'. The probability that a genotype is homozygous is estimated as

$$\mathrm{prHom} = \frac{p_{\mathrm{A}} + p_{\mathrm{B}}}{\sum_{g} p_{g}}$$

and the probability that the genotype is heterozygous (any latent genotype that is not 'A' or 'B') is given by

$$\mathrm{prHet} = 1 - \mathrm{prHom}$$

Since the density of a Unif(0,1) is 1, the 6-dimensional vector of emission probabilities at a SNP is given by

$$\mathrm{emit} = \mathrm{prHet} \cdot p + (1 - \mathrm{prHet})$$

The above has the effect of minimizing the influence of BAFs near 0 and 1 on the state path estimated by the Viterbi algorithm. In particular, the emission probability at homozygous SNPs will be virtually the same for states 3 and 4, but at heterozygous SNPs the emission probability will be an order of magnitude greater for state 3 (diploid) than for state 4 (diploid region of homozygosity). The advantage of this parameterization is fewer false positive hemizygous deletion calls. [Log R ratios tend to be more sensitive to technical sources of variation than the corresponding BAFs/genotypes. Regions in which the log R ratios are low due to technical sources of variation will be less likely to be interpreted as evidence of copy number loss if heterozygous genotypes have more 'weight' in the emission estimates than homozygous genotypes.] The trade-off is that the only states estimated by the HMM are those with copy number alterations. In particular, copy-neutral regions of homozygosity will not be called.

By setting modelHomozygousRegions = TRUE, the emission probabilities at a SNP are given simply by the p vector described above and copy-neutral regions of homozygosity will be called.
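
The mixture described above can be sketched in base R. This is a minimal illustration rather than the package's implementation: the helper names `dtnorm01` and `baf_mixture`, the genotype means (taken here as the fraction of B alleles), and the common standard deviation of 0.05 are all assumptions made for the example, and the mapping from latent genotypes to the six HMM states is not shown.

```r
## Density of a normal distribution truncated to [0, 1] (hypothetical helper).
dtnorm01 <- function(x, mean, sd) {
  dnorm(x, mean, sd) / (pnorm(1, mean, sd) - pnorm(0, mean, sd))
}

## Illustrative genotype means: fraction of B alleles (assumed values).
geno_means <- c(A = 0, B = 1, AB = 1/2, AAB = 1/3, ABB = 2/3,
                AAAB = 1/4, ABBB = 3/4)

## Mixture of truncated normals and Unif(0, 1) for a single BAF value.
baf_mixture <- function(baf, means = geno_means, sd = 0.05) {
  p <- dtnorm01(baf, means, sd)
  names(p) <- names(means)
  prHom <- unname((p["A"] + p["B"]) / sum(p))
  prHet <- 1 - prHom
  emit <- prHet * p + (1 - prHet)   # the Unif(0, 1) density is 1
  list(p = p, prHom = prHom, prHet = prHet, emit = emit)
}

hom <- baf_mixture(0.02)  # BAF near 0: looks homozygous
het <- baf_mixture(0.50)  # BAF near 0.5: looks heterozygous
```

Under these assumed values, `hom$prHom` is close to 1, so every element of `hom$emit` is pulled toward 1 and the observation carries little information about the state. For the BAF of 0.5, `het$prHet` is close to 1 and `het$emit` is essentially `het$p`.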

The log R ratios are assumed to be emitted from a normal distribution with a mean and standard deviation that depend on the latent copy number. Similarly, the BAFs are assumed to be emitted from a truncated normal distribution with a mean and standard deviation that depend on the latent number of B alleles relative to the total number of alleles (A+B).
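
The Gaussian emission model for log R ratios can be illustrated with a short base R sketch. The function names `threshold_lrr` and `lrr_emission`, and the means and standard deviations used, are assumptions chosen for the example and are not necessarily the package defaults. The sketch also shows the thresholding described for `CN_range` and the tempering described for `temper`.

```r
## Illustrative starting values for the six states (assumed, not package defaults).
lrr_means <- c(-2, -0.4, 0, 0, 0.4, 1)
lrr_sds   <- rep(0.3, 6)

## Threshold log R ratios to the allowable range, as described for CN_range.
threshold_lrr <- function(x, range = c(-5, 3)) {
  pmin(pmax(x, range[1]), range[2])
}

## Gaussian emission densities for one log R ratio, one value per state,
## optionally tempered as emit^temper.
lrr_emission <- function(x, means = lrr_means, sds = lrr_sds, temper = 1) {
  dnorm(threshold_lrr(x), means, sds)^temper
}

lrr_emission(-0.45)  # largest for the second element (mean -0.4)
lrr_emission(-7)     # same as lrr_emission(-5) after thresholding
```

With `temper = 1` the densities are returned unchanged; values below 1 flatten the differences between states.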

Examples

ep <- EmissionParam()
## Accessors for the starting values
cn_means(ep)
cn_sds(ep)
baf_means(ep)
baf_sds(ep)
## Replacement methods (here each slot is reassigned its current value)
baf_means(ep) <- baf_means(ep)
baf_sds(ep) <- baf_sds(ep)
cn_sds(ep) <- cn_sds(ep)
cn_means(ep) <- cn_means(ep)
show(ep)
