RJaCGH: Reversible Jump MCMC for the analysis of arrays of CGH

Description

This function fits a non-homogeneous hidden Markov model to CGH data through bayesian methods and Reversible Jump Markov chain Montecarlo.

Usage

RJaCGH(y, Chrom = NULL, Pos = NULL, Dist=NULL, model = "genome",
max.dist=NULL, normal.reference=0, normal.ref.percentile=0.95,
burnin = 10000, TOT =10000, k.max = 6, stat = NULL, mu.alfa = NULL,
mu.beta = NULL, ka = NULL,
g = NULL, prob.k = NULL, jump.parameters=list(), start.k = NULL,
RJ=TRUE, auto.label=0.60)

Arguments

Vector with Log Ratio observations.

Chrom

Vector with Chromosome indicator. Must be numeric (1 to 23)

Pos

Vector with Positions of every gene. They can be absolute to the genome or relative to the chromosome. They should be ordered within every chromosome. This is, the arrays must be ordered by their positions in the genome. They must be integers.

Dist

Optional vector of distances between genes. It should be a vector of length length(y)-1. Note that when Chrom is not NULL, every last value of every Chromosome is not used.

model

if model="genome", the same model is fitted for the whole genome. If model="Chrom", a different model is fitted for each chromosome.

max.dist

maximal distance between spots. When two spots have a distance between them as far or further than max.dist, they are considered independent. That is, the state of that spot does not affect the state of the other. If NULL

normal.reference

The value considered as the mean of the normal state. See details. By default is 0.

normal.ref.percentile

Percentage for the relabelling of states. See details. by default is 0.95.

burnin

Number of burn-in iterations in the Markov Chain

TOT

Number of iterations after the burn-in

k.max

Maximum number of hidden states to fit.

stat

Initial Distribution for the hidden states. Must be a vector of size 1 + 2 + ... +k.max. If NULL, it is assumed a uniform distribution for every model.

mu.alfa

Hyperparameter. See details

mu.beta

Hyperparameter. See details

prob.k

Hyperparameter. See details

jump.parameters

List with the parameters for the MCMC jumps. See details.

start.k

Initial number of states. if NULL, a random draw from prob.k is chosen.

Logical. If TRUE, Reversible Jump is performed. If not, MCMC over a fixed number of hidden states.

auto.label

If not NULL, should be the minimum proportion of observations labeled as 'Normal'. See details.

Value

The object returned follows a hierarchy: If y is a matrix or data.frame (i.e., several arrays), an object of class RJaCGH.array is returned, with components:
[[]]A list with an object of corresponding class (see below) for every array.
array.namesVector with the names of the arrays.
If model is "genome", an object of class RJaCGH.genome is returned, with components:
[[]]a list with as many objects as k.max, with the fits.
ksequence of number of hidden states sampled.
prob.bNumber of birth moves performed (Includes burn-in.
prob.dNumber of death moves performed (Includes burn-in.
prob.sNumber of split moves performed (Includes burn-in.
prob.cNumber of combine moves performed (Includes burn-in.
yy vector.
PosPos vector.
modelmodel.
ChromChromosome vector.
xx vector of distances between genes.
If model is "Chrom", an object of class RJaCGH.Chrom is returned, with the following components:
[[]]a list with as many components as chromosomes, of class RJaCGH (See below).
PosPos vector.
modelmodel.
ChromChromosome vector.
If no model was specified and no Chrom was given, an object of class RJaCGH is returned, with components k, prob.b, prob.d, prob.s, prob.c, y, Pos, x as described before, plus a list with as many components of number of max hidden states fitted. For every hidden markov model fitted, a list is returned with components:
mua matrix with the means sampled
sigma.2a matrix with the variances sampled
betaan array of dimension 3 with beta values sampled
statvector of initial distribution
logliklog likelihoods of every MCMC iteration
prob.muprobability of aceptance of mu in the Metropolis-Hastings step.
prob.sigma.2probability of aceptance of sigma.2 in the Metropolis-Hastings step.
prob.betaprobability of aceptance of beta in the Metropolis-Hastings step.
state.labelsLabels of the biological states.
prob.statesMarginal posterior probabilities of belonging to every hidden state.

Details

RJaCGH fits the following bayesian model: There is a priori distribution for the number of hidden states (different copy numbers) as stated by prob.k. If NULL, a uniform distribution between 1 and k.max is used.

The hidden states follow a normal distribution which mean (mu) follows itself a normal distribution with mean mu.alfa and stdev mu.beta. If NULL, these are the median of the data and the range. The variance (sigma.2)of the hidden states follows an Inverse Gamma distribution with parameters ka and g. If NULL, these are 2 and diff(range(y))^2 / 50, respectively.

The model for the transition matrix is based on a random matrix beta whose diagonal is zero. The transition matrix, Q, has the form: Q[i,j] = exp(-beta[i,j] + beta[i,j]*x) / sum(i,.) {exp(-beta[i,.] + beta[i,.]*x}

The prior distribution for beta is gamma with parameters 1, 1. The x are the distances between positions, normalized to lay between zero and 1 (x=diff(Pos) / max(diff(Pos)))

RJaCGH performs Markov Chain MonteCarlo with Reversible Jump to sample for the posterior distribution. Every sweep has 3 steps:

1.- A Metropolis-Hastings move is used to update, for a fixed number of hidden states, mu, sigma.2 and beta. A symmetric proposal with a normal distribution and standard deviation sigma.tau.mu, sigma.tau.sigma.2 and sigma.tau.beta is sampled.

2.- A transdimensional move is chosen, between birth (a new hidden state is sampled from the prior) or death (an existing hidden state is erased).

3.- Another transdimensional move is performed; an split move (divide an existing state in two) or a combine move (join two adjacent states). The length of the split is sampled from a normal distribution with standard deviation tau.split.mu for the mu and tau.split.beta for beta.

jump.parameters must be a list with the parameters for the moves. It must have components sigma.tau.mu, sigma.tau.sigma.2, sigma.tau.beta These are vectors of length k.max. tau.split.mu, tau.split.beta are vectors of length 1. If any of them is NULL, a call to the internal function get.jump() is made to find 'good' values.

A relabelling of hidden states is performed to match biological states. The states whom have the normal.reference value inside a normal.ref.percentile% confidence based on a normal distribution with means the median of mu and sd the square root of the median of sigma.2 are labelled as 'Normal'. If no state is labelled as 'Normal', the closest state to normal.reference is chosen. If auto.label is not null, closest states to 'Normal' are also labelled as 'Normal' until a proportion of auto.label is reached. Please note that the default value is 0.60, so at least the 60% of the observations will be labelled as 'Normal'.

References

Oscar M. Rueda and Ramon Diaz Uriarte. A flexible, accurate and extensible statistical method for detecting genomic copy-number changes. http://biostats.bepress.com/cobra/ps/art9/. {http://biostats.bepress.com/cobra/ps/art9/}. Cappe, Moulines and Ryden, 2005. Inference in Hidden Markov Models. Springer. Green, P.J. (1995) Reversible Jump Markov Chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711-732.

Examples

Run this code

y <- c(rnorm(100, 0, 1), rnorm(10, -3, 1), rnorm(20, 3, 1),
       rnorm(100,0, 1)) 
Pos <- sample(x=1:500, size=230, replace=TRUE)
Pos <- cumsum(Pos)
Chrom <- rep(1:23, rep(10, 23))

jp <- list(sigma.tau.mu=rep(0.05, 4), sigma.tau.sigma.2=rep(0.03, 4),
           sigma.tau.beta=rep(0.07, 4), tau.split.mu=0.1, tau.split.beta=0.1)

fit.chrom <- RJaCGH(y=y, Pos=Pos, Chrom=Chrom, model="Chrom",
                    burnin=10, TOT=1000, k.max = 4,
                    jump.parameters=jp)
##RJ results for chromosome 5
table(fit.chrom[[5]]$k)
fit.genome <- RJaCGH(y=y, Pos=Pos, Chrom=Chrom, model="genome",
burnin=100, TOT=1000, jump.parameters=jp, k.max = 4)
## Results for the model with 3 states:
apply(fit.genome[[3]]$mu, 2, summary)
apply(fit.genome[[3]]$sigma.2, 2, summary)
apply(fit.genome[[3]]$beta, c(1,2), summary)

Run the code above in your browser using DataLab

Description

Usage

Arguments

Value

Details

References

See Also

Examples