hsmmRun: Estimating the most likely state sequence using Hidden Semi Markov Model

Description

This is the working horse of the biomvRhsmm

Usage

hsmmRun(x, xid="sampleid", xRange, soj, emis, cMethod='F-B', maxit=1, maxgap=Inf, tol=1e-06, avg.m='median', trim=0, na.rm=TRUE, com.emis=FALSE)

Arguments

input data matrix or vector, ordered wrt. position

xid

name of the sample

xRange

a IRanges/GRanges object, same length as x rows

soj

a list object containing the relevant sojourn distribution parameters

emis

a list object containing the relevant emission distribution parameters

cMethod

C algorithm used for the most likely state sequence, 'F-B' or 'Viterbi'

maxit

max iteration of the EM run with Forward-Backward algorithm

maxgap

max distance between neighbouring feature to consider a split

tol

tolerance level of the likelihood change to terminate the EM run

avg.m

method to calculate average value for each segment, 'median' or 'mean' possibly trimmed

trim

the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.

na.rm

TRUE if NA value should be ignored

com.emis

whether to set a common emission prior across different seqnames. if TRUE, the emission will not be updated during individual runs.

Value

yhat: a "Rle" object, the most likely state sequence, same length as x rows number
yp: "Rle", the associated state probability, same length as x rows number
res: Object of class "GRanges" , each range represent one continuous segment identified, with sample name slot 'SAMPLE', estimated state slot 'STATE' and segment mean slot 'MEAN' stored in the meta columns

Details

The function fits a Hidden-semi Markov model for the input data matrix / vector, which should contains ordered data from a continuous region on one chromosome. The model will start with flat prior for the initial state probability and transition probability, while emission parameter for each state will be estimated using different quantiles of the input controlled by argument q.alpha and r.var. Argument for the sojourn density should be provided via the list object soj, which is either initialized as flat prior or estimated from other data in a previous call. The positional information in the xRange is used for the optional spiting of physically distant features and construction of returning GRanges object res.

References

Guedon, Y. (2003). Estimating hidden semi-Markov chains from discrete sequences. Journal of Computational and Graphical Statistics, 12(3), 604-639.

Examples

Run this code

	data(coriell)
	# select only chr1 
	x<-coriell[coriell[,2]==1,]
	xgr<-GRanges(seqnames=paste('chr', x[,2], sep=''), IRanges(start=x[,3], width=1, names=x[,1]))
	values(xgr)<-DataFrame(x[,4:5], row.names=NULL)
	xgr<-xgr[order(xgr)]

	J<-2 ; maxk<-50
	# a uniform initial sojourn, not utilizing positional information, just the index
	soj<-list(J=J, maxk=maxk, type='gamma', d=cbind(dunif(1:maxk, 1, maxk), dunif(1:maxk, 1, maxk)))
	soj$D <- sapply(1:J, function(j) rev(cumsum(rev(soj$d[1:maxk,j]))))
	# run 1 sample only, Coriell.13330
	sample<-colnames(coriell)[5]
	runout<-hsmmRun(matrix(values(xgr)[,sample]), sample, xgr, soj, emis=list(type='norm', mu=range(x[,4:5]), var=rep(var(unlist(x[,4:5])), J)))

Run the code above in your browser using DataLab