biomvRCNS (version 1.12.0)

biomvRseg: Homogeneous segmentation of multi-sample genomic data

Description

The function performs a two-stage segmentation of multi-sample genomic data from array experiments or high-throughput sequencing.

Usage

biomvRseg(x, maxk=NULL, maxbp=NULL, maxseg=NULL, xPos=NULL, xRange=NULL, usePos='start', family='norm', penalty='BIC', twoStep=TRUE, segDisp=FALSE, useMC=FALSE, useSum=TRUE, comVar=TRUE, maxgap=Inf, tol=1e-06, grp=NULL, cluster.m=NULL, avg.m='median', trim=0, na.rm=TRUE)

Arguments

x
input data matrix, or a GRanges object with input stored in the meta DataFrame
maxk
maximum length of a segment
maxbp
maximum length of a segment in bp, given the positional information specified in xPos, xRange or x
maxseg
maximum number of segments the function will try
xPos
a vector of positions for each x row
xRange
an IRanges/GRanges object, with the same length as the number of rows in x
usePos
character value to indicate whether the 'start', 'end' or 'mid' point position should be used
family
family of x distribution, only the following types are supported: 'norm', 'nbinom', 'pois'
penalty
penalty method used with the likelihood to determine the optimal number of segments, possible values are 'none', 'AIC', 'AICc', 'BIC', 'SIC', 'HQIC', 'mBIC'
twoStep
TRUE if a second-stage merging should be performed after the initial group segmentation
segDisp
TRUE if the dispersion parameter should be estimated segment-wise rather than using an overall estimate
useMC
TRUE if mclapply should be used to speed up the calculation for nbinom dispersion estimation
useSum
TRUE if the grand sum across samples (x columns) should be used, as in the tilingArray solution
comVar
TRUE if assuming common variance across samples (x columns)
maxgap
maximum distance between neighbouring features to consider a split
tol
tolerance level of the likelihood change used to determine the termination of the EM run
grp
vector of group assignment for each sample, with a length the same as columns in the data matrix, samples within each group would be processed simultaneously if a multivariate emission distribution is available
cluster.m
clustering method for prior grouping, possible values are 'ward','single','complete','average','mcquitty','median','centroid'
avg.m
method to calculate average value for each segment, 'median' or 'mean' possibly trimmed
trim
the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
na.rm
TRUE if NA value should be ignored
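
As an illustration of how avg.m, trim and na.rm interact, the base-R sketch below averages the values of a single segment. The helper seg_avg() is a hypothetical name and is not part of biomvRCNS; it assumes that trim only applies when avg.m is 'mean', which is how base R's mean() handles its trim argument. The package may compute segment averages differently.

```r
# Hypothetical helper (not a biomvRCNS function): average the values of one
# segment with either a median or a (possibly trimmed) mean.
seg_avg <- function(v, avg.m = c("median", "mean"), trim = 0, na.rm = TRUE) {
  avg.m <- match.arg(avg.m)
  if (avg.m == "median") {
    median(v, na.rm = na.rm)
  } else {
    # base::mean drops NA values first (if na.rm = TRUE), then trims the
    # requested fraction of observations from each end
    mean(v, trim = trim, na.rm = na.rm)
  }
}

# Toy segment with one outlier and one missing value
v <- c(0.1, 0.2, 0.15, 0.18, 5, NA)

plain_mean   <- seg_avg(v, "mean")              # pulled upwards by the outlier
trimmed_mean <- seg_avg(v, "mean", trim = 0.2)  # drops one value from each end
seg_median   <- seg_avg(v, "median")
```

With an outlier in the segment, the trimmed mean and the median stay close to the bulk of the values, while the plain mean does not.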

Value

A biomvRCNS-class object:
x:
Object of class "GRanges", with range information either from real positional data or just indices, with input data matrix stored in the meta columns.
res:
Object of class "GRanges", in which each range represents one continuous segment identified, with the sample name 'SAMPLE' and the segment mean 'MEAN' stored in the meta columns.
param:
Object of class "list", list of all parameters used in the model run.

Details

A homogeneous segmentation algorithm that uses dynamic programming, as in tilingArray, but is also capable of handling count data from sequencing.
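
The general idea of dynamic-programming segmentation with a penalised choice of the number of segments can be sketched in base R. This is a minimal single-sample, Gaussian least-squares illustration, not the code used by biomvRseg: the function seg_dp() and its return values are hypothetical names, only the maxseg and maxk arguments mirror the ones above, and the BIC-style criterion shown (two parameters counted per segment) is one common form that may differ from the penalties implemented in the package.

```r
# Hypothetical sketch (not the biomvRCNS implementation): split a numeric
# vector into at most `maxseg` segments of at most `maxk` points each,
# minimising the within-segment residual sum of squares (RSS).
seg_dp <- function(y, maxseg = 5, maxk = length(y)) {
  n <- length(y)
  maxseg <- min(maxseg, n)
  cs  <- c(0, cumsum(y))
  cs2 <- c(0, cumsum(y^2))
  # RSS of y[i..j] around its own mean, computed from cumulative sums
  rss <- function(i, j) {
    m <- j - i + 1
    max(0, (cs2[j + 1] - cs2[i]) - (cs[j + 1] - cs[i])^2 / m)
  }
  cost <- matrix(Inf, maxseg, n)          # cost[k, j]: best RSS of y[1..j] in k segments
  back <- matrix(NA_integer_, maxseg, n)  # back[k, j]: start index of the k-th segment
  for (j in seq_len(min(n, maxk))) {
    cost[1, j] <- rss(1, j)
    back[1, j] <- 1L
  }
  for (k in seq_len(maxseg)[-1]) {
    for (j in k:n) {
      for (i in max(k, j - maxk + 1):j) {  # i: candidate start of the k-th segment
        cand <- cost[k - 1, i - 1] + rss(i, j)
        if (cand < cost[k, j]) {
          cost[k, j] <- cand
          back[k, j] <- i
        }
      }
    }
  }
  # BIC-style criterion (assumed form): a mean and a boundary/variance
  # parameter are counted for each segment. Breaks down if the RSS is exactly 0.
  k_all <- seq_len(maxseg)
  bic <- n * log(cost[, n] / n) + 2 * k_all * log(n)
  k_best <- which.min(bic)
  # Trace back the segment start positions for the selected k
  starts <- integer(k_best)
  j <- n
  for (k in k_best:1) {
    starts[k] <- back[k, j]
    j <- starts[k] - 1L
  }
  list(k = k_best, starts = starts, bic = bic)
}

# Simulated profile with three levels and change points after positions 30 and 60
set.seed(1)
y <- c(rnorm(30, mean = 0, sd = 0.2),
       rnorm(30, mean = 2, sd = 0.2),
       rnorm(30, mean = -1, sd = 0.2))
fit <- seg_dp(y, maxseg = 6)
fit$starts  # start index of each selected segment
```

The triple loop makes the sketch quadratic in the number of data points for each candidate number of segments, which is why limits such as maxk and maxseg matter for long genomic profiles.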

References

Piegorsch, W. W. (1990). Maximum likelihood estimation for the negative binomial dispersion parameter. Biometrics, 863-867.

Picard, F. et al. (2005). A statistical approach for array CGH data analysis. BMC Bioinformatics, 6, 27.

Huber, W. et al. (2006). Transcript mapping with high density oligonucleotide tiling arrays. Bioinformatics, 22, 1963-1970.

Zhang, N. R. and Siegmund, D. O. (2007). A Modified Bayes Information Criterion with Applications to the Analysis of Comparative Genomic Hybridization Data. Biometrics 63 22-32.

Robinson, M. D. and Smyth, G. K. (2008). Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 9, 321-332.

See Also

biomvRhsmm

Examples

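	library(biomvRCNS)
	data(coriell)
	# build a GRanges of single-base positions from the chromosome and position columns, named by the first column
	xgr <- GRanges(seqnames=paste('chr', coriell[,2], sep=''), IRanges(start=coriell[,3], width=1, names=coriell[,1]))
	# store the two sample columns as meta columns, then sort by genomic position
	values(xgr) <- DataFrame(coriell[,4:5], row.names=NULL)
	xgr <- xgr[order(xgr)]
	# grp=c(1,2) assigns the two samples to separate groups
	resseg <- biomvRseg(x=xgr, maxbp=4E4, maxseg=10, family='norm', grp=c(1,2))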