Learn R Programming

biomvRCNS (version 1.12.0)

hsmmRun: Estimating the most likely state sequence using Hidden Semi Markov Model

Description

This is the working horse of the biomvRhsmm

Usage

hsmmRun(x, xid="sampleid", xRange, soj, emis, cMethod='F-B', maxit=1, maxgap=Inf, tol=1e-06, avg.m='median', trim=0, na.rm=TRUE, com.emis=FALSE)

Arguments

x
input data matrix or vector, ordered wrt. position
xid
name of the sample
xRange
a IRanges/GRanges object, same length as x rows
soj
a list object containing the relevant sojourn distribution parameters
emis
a list object containing the relevant emission distribution parameters
cMethod
C algorithm used for the most likely state sequence, 'F-B' or 'Viterbi'
maxit
max iteration of the EM run with Forward-Backward algorithm
maxgap
max distance between neighbouring feature to consider a split
tol
tolerance level of the likelihood change to terminate the EM run
avg.m
method to calculate average value for each segment, 'median' or 'mean' possibly trimmed
trim
the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
na.rm
TRUE if NA value should be ignored
com.emis
whether to set a common emission prior across different seqnames. if TRUE, the emission will not be updated during individual runs.

Value

a list object,
yhat
a "Rle" object, the most likely state sequence, same length as x rows number
yp
"Rle", the associated state probability, same length as x rows number
res
Object of class "GRanges" , each range represent one continuous segment identified, with sample name slot 'SAMPLE', estimated state slot 'STATE' and segment mean slot 'MEAN' stored in the meta columns

Details

The function fits a Hidden-semi Markov model for the input data matrix / vector, which should contains ordered data from a continuous region on one chromosome. The model will start with flat prior for the initial state probability and transition probability, while emission parameter for each state will be estimated using different quantiles of the input controlled by argument q.alpha and r.var. Argument for the sojourn density should be provided via the list object soj, which is either initialized as flat prior or estimated from other data in a previous call. The positional information in the xRange is used for the optional spiting of physically distant features and construction of returning GRanges object res.

References

Guedon, Y. (2003). Estimating hidden semi-Markov chains from discrete sequences. Journal of Computational and Graphical Statistics, 12(3), 604-639.

See Also

biomvRhsmm

Examples

Run this code
	data(coriell)
	# select only chr1 
	x<-coriell[coriell[,2]==1,]
	xgr<-GRanges(seqnames=paste('chr', x[,2], sep=''), IRanges(start=x[,3], width=1, names=x[,1]))
	values(xgr)<-DataFrame(x[,4:5], row.names=NULL)
	xgr<-xgr[order(xgr)]

	J<-2 ; maxk<-50
	# a uniform initial sojourn, not utilizing positional information, just the index
	soj<-list(J=J, maxk=maxk, type='gamma', d=cbind(dunif(1:maxk, 1, maxk), dunif(1:maxk, 1, maxk)))
	soj$D <- sapply(1:J, function(j) rev(cumsum(rev(soj$d[1:maxk,j]))))
	# run 1 sample only, Coriell.13330
	sample<-colnames(coriell)[5]
	runout<-hsmmRun(matrix(values(xgr)[,sample]), sample, xgr, soj, emis=list(type='norm', mu=range(x[,4:5]), var=rep(var(unlist(x[,4:5])), J)))

Run the code above in your browser using DataLab