SomatiCA (version 2.2.0)

larsCBSsegment: Segmentation based on Circular Binary Segmentation followed by a model selection procedure on detected change points.

Description

A model selection procedure is applied after CBS segmentation. In other words, we assess which of the over-detected change points called by CBS are really necessary. More specifically, the $K$ change points are used as $K$ predictors for the input $X_i$, $i = 0, \ldots, n$, in a linear model, and variables are selected by stepwise regression as implemented in lars() (from the R package lars). The optimal change points can then be selected from the LARS solution path using different criteria.
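The idea can be illustrated with a minimal sketch. This is not the internal code of larsCBSsegment(); it only shows the general technique under stated assumptions. The simulated signal, the vector `candidates` (standing in for over-detected CBS change points), the step-function encoding of each change point, the choice of `type = "stepwise"`, and the use of Mallows' Cp as the selection criterion are all assumptions made for illustration.

```r
library(lars)  # assumption: the lars package is installed

set.seed(1)
# Simulated signal with true change points after positions 100 and 200
y <- c(rnorm(100, 0.2, 0.05), rnorm(100, 0.4, 0.05), rnorm(100, 0.3, 0.05))
n <- length(y)

# Hypothetical over-detected candidate change points (e.g. from a CBS run)
candidates <- c(50, 100, 150, 200, 250)

# One step-function predictor per candidate:
# 0 up to and including the change point, 1 after it
X <- sapply(candidates, function(cp) as.numeric(seq_len(n) > cp))
colnames(X) <- paste0("cp", candidates)

# Compute the solution path; each step adds one candidate change point
fit <- lars(X, y, type = "stepwise")

# Pick the step with the smallest Cp and keep the change points
# whose coefficients are nonzero at that step
best <- unname(which.min(fit$Cp))
selected <- candidates[fit$beta[best, ] != 0]
selected
```

Candidates whose coefficients are zero at the chosen step are treated as unnecessary and dropped. Other criteria could be applied to the same path, for example a threshold on how much the residual sum of squares in `fit$RSS` decreases from one step to the next.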

Usage

larsCBSsegment(data, selection = .selection.default(), collapse.k = 0, ncores = 1, verbose = TRUE, variation.control = TRUE, rss = FALSE, S = 0.1, k = 50, ...)

Arguments

data
A GRanges object, output of SomatiCAFormat().
selection
Model selection parameters.
collapse.k
Number of data points collapsed.
ncores
Number of cores used.
verbose
Whether working messages are shown.
variation.control
A logical value, whether pseudo points are used to smooth the segment. Default is TRUE.
rss
A logical value, whether a cutoff based on the residual sum of squares is used. Default is FALSE.
S
The cutoff based on the residual sum of squares. Default is 0.1.
k
The window size used to smooth the outliers.
...
Arguments for segment() in DNAcopy package.

Value

segment
An S4 object of class "Segmented".
hetsites
Heterozygous sites used in segmentation, unsmoothed.

References

Efron, Hastie, Johnstone and Tibshirani (2003) "Least Angle Regression" (with discussion) Annals of Statistics.

Olshen, A. B., Venkatraman, E. S., Lucito, R., Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5: 557-572.

Venkatraman, E. S., Olshen, A. B. (2007) A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23: 657-63.

See Also

SomatiCAFormat, lars, segment.

Examples

library(SomatiCA)

# Simulated tumor lesser allele fractions (LAF): four segments on chr1, two on chr2
rawLAF <- c(rnorm(300, 0.2, 0.05), rnorm(300, 0.4, 0.05), rnorm(200, 0.3, 0.05), rnorm(200, 0.2, 0.05), rnorm(200, 0.3, 0.05), rnorm(250, 0.4, 0.05))
# Fold values above 0.5 back so that LAF stays in [0, 0.5]
rawLAF <- ifelse(rawLAF > 0.5, 1 - rawLAF, rawLAF)
# Simulated germline LAF without any change points
germLAF <- rnorm(800 + 650, 0.4, 0.05)
germLAF <- ifelse(germLAF > 0.5, 1 - germLAF, germLAF)
# Simulated read counts for the tumor (reads1) and the normal sample (reads2)
reads1 <- c(rpois(300, 25), rpois(300, 50), rpois(200, 60), rpois(200, 25), rpois(200, 40), rpois(250, 50))
reads2 <- rpois(800 + 650, 50)
chr <- c(rep("chr1", 800), rep("chr2", 650))
position <- c(1:800, 1:650)
zygo <- rep("het", 800 + 650)
x <- data.frame(chr, as.integer(position), as.character(zygo), as.integer(reads1), rawLAF, as.integer(reads2), germLAF)
colnames(x) <- c("seqnames", "start", "zygosity", "tCount", "LAF", "tCountN", "germLAF")
# Convert the data frame to the GRanges input expected by larsCBSsegment()
data <- SomatiCAFormat(x)

### This is an easy example without much noise.
### Consider using rss = TRUE to select change points from sequencing data.
seg <- larsCBSsegment(data, rss = FALSE)
 
plotSegment(seg$segment, data, k = 1, smooth = FALSE)
plotSegment(seg$segment, data, k = 2, smooth = FALSE)
