segmentByCBS: Segment genomic signals using the CBS method

Description

Segment genomic signals using the CBS method of the DNAcopy package. This is a convenient low-level wrapper for the DNAcopy::segment() method. It is intended to be applied to one sample and one chromosome at the time. For more details on the Circular Binary Segmentation (CBS) method see [1,2].

Usage

## S3 method for class 'default':
segmentByCBS(y, chromosome=0, x=NULL, index=seq(along = y), w=NULL, undo=Inf, ..., joinSegments=TRUE, knownCPs=NULL, columnNamesFlavor=c("PSCBS", "DNAcopy"), seed=NULL, verbose=FALSE)

Arguments

A numeric vector of J genomic signals to be segmented.

chromosome

(Optional) An integer scalar (or a vector of length J contain a unique value). Only used for annotation purposes.

Optional numeric vector of J genomic locations. If NULL, index locations 1:J

Optional numeric vector in [0,1] of J weights.

undo

A non-negative numeric. If less than +Inf, then arguments undo.splits="sdundo" and undo.SD=undo are

...

Additional arguments passed to the segmentation function.

joinSegments

If TRUE, there are no gaps between neighboring segments. If FALSE, the boundaries of a segment are defined by the support

knownCPs

Optional numeric vector of known change point locations.

index

An integer vector of length J specifying the genomewide indices of the loci.

columnNamesFlavor

A character string specifying how column names of the returned data frame should be named.

seed

An (optional) integer specifying the random seed to be set before calling the segmentation method. The random seed is set to its original state when exiting. If

verbose

See Verbose.

Value

Returns the fit object.

Missing and non-finite values

Signals as well as genomic positions may contain missing values, i.e. NAs or NaNs. If so, they are silently excluded before performing the segmentation. None of the input signals may have infinite values, i.e. -Inf or +Inf. If so, an informative error is thrown.

Details

Internally segment is used to segment the signals. This segmentation method support weighted segmentation. The "DNAcopy::segment" implementation of CBS uses approximation through random sampling for some estimates. Because of this, repeated calls using the same signals may result in slightly different results, unless the random seed is set/fixed.

References

[1] A.B. Olshen, E.S. Venkatraman (aka Venkatraman E. Seshan), R. Lucito and M. Wigler, Circular binary segmentation for the analysis of array-based DNA copy number data, Biostatistics, 2004. [2] E.S. Venkatraman and A.B. Olshen, A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics, 2007.

Examples

Run this code

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Simulating copy-number data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
set.seed(0xBEEF)

# Number of loci
J <- 1000

mu <- double(J)
mu[200:300] <- mu[200:300] + 1
mu[650:800] <- mu[650:800] - 1
eps <- rnorm(J, sd=1/2)
y <- mu + eps
x <- sort(runif(length(y), max=length(y))) * 1e5
w <- runif(J)
w[650:800] <- 0.001


xlab <- "Position (Mb)"
ylim <- c(-3,3)
xMb <- x/1e6
plot(xMb,y, pch=20, col="#aaaaaa", xlab=xlab, ylim=ylim)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Segment
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- segmentByCBS(y, x=x)
print(fit)
drawLevels(fit, col="red", lwd=2, xScale=1e-6)



# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# TESTS
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- segmentByCBS(y, x=x, seed=0xBEEF)
print(fit)
##   id chromosome       start      end nbrOfLoci    mean
## 1  y          0    55167.82 20774251       201  0.0164
## 2  y          0 20774250.85 29320105        99  1.0474
## 3  y          0 29320104.86 65874675       349 -0.0227
## 4  y          0 65874675.06 81348129       151 -1.0813
## 5  y          0 81348129.20 99910827       200 -0.0612


# Test #1: Reverse the ordering and segment
fitR <- segmentByCBS(rev(y), x=rev(x), seed=0xBEEF)
# Sanity check
stopifnot(all.equal(fitR$output, fit$output))

# Test #2: Reverse, but preserve ordering of 'data' object
fitRP <- segmentByCBS(rev(y), x=rev(x), preserveOrder=TRUE)
stopifnot(all.equal(fitRP$output, fit$output))


# (Test #3: Change points inbetween data points at the same locus)
x[650:654] <- x[649]
fitC <- segmentByCBS(rev(y), x=rev(x), preserveOrder=TRUE, seed=0xBEEF)

# Test #4: Allow for some missing values in signals
y[450] <- NA
fitD <- segmentByCBS(y, x=x, seed=0xBEEF)


# Test #5: Allow for some missing genomic annotations
x[495] <- NA
fitD <- segmentByCBS(y, x=x, seed=0xBEEF)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# MISC.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Emulate a centromere
x[650:699] <- NA
fit <- segmentByCBS(y, x=x, seed=0xBEEF)
xMb <- x/1e6
plot(xMb,y, pch=20, col="#aaaaaa", xlab=xlab, ylim=ylim)
drawLevels(fit, col="red", lwd=2, xScale=1e-6)

fitC <- segmentByCBS(y, x=x, joinSegments=FALSE, seed=0xBEEF)
drawLevels(fitC, col="blue", lwd=2, xScale=1e-6)

Run the code above in your browser using DataLab