Learn R Programming

R package dpseg: piecewise linear segmentation by a simple dynamic programing algorithm

authors: "Rainer Machne, Peter F. Stadler"

This package performs piecewise linear segmentation of ordered data by a dynamic programing algorithm, provided via the function dpseg. It was developed for time series data, eg. growth curves, and for genome-wide read-count data from next generation sequencing.

The package and its documentation are also intended to serve as a tutorial on dynamic programing and the segmentation problem. A movie function visualizes the progress of the algorithm through the data.

Moreover, the package features generic implementations of dynamic programing routines, where new segmentation criteria ("scoring functions") can be tested in base R and efficient versions implemented in Rcpp.

Documentation

See the package vignette (vignette("dpseg")) for details.

Installation

Install package from within R via cran:

install.packages("dpseg")

Developtment Version

library(devtools)
install_gitlab("raim/dpseg")

Basic Usage

library(dpseg)

# get example data `oddata` - bacterial growth measured as optical density OD
x <- oddata$Time
y <- log(oddata[,"A2"]) # note: exponential growth -> log(y) is linear

segs <- dpseg(x=x, y=y, jumps=FALSE, P=0.0004)

## inspect resulting segments
print(segs)

## plot results
plot(segs)

## use predict method
lines(predict(segs),lty=2, lwd=3, col="yellow")

## view the algorithm in action
movie(segs)

Theory

Piecewise Linear Segmentation

The problem is to find break-points in 2-dimensional data, eg. timeseries, that split the data into linear segments. This can be formulated as an optimization problem that can be solved by dynamic programing:

\newcommand{\Ell}{\mathcal{L}}
\newcommand{\jump}{\mathcal{J}}
\newcommand{\Var}{\mathrm{Var}}
\newcommand{\Cov}{\mathrm{Cov}}
\newcommand{\lmax}{\ell_\text{max}}
\newcommand{\lmin}{\ell_\text{min}}
S_j = \max_{i\le j} (S_{i-\jump} + \text{score}(i,j)) - P\;,

where the $\text{score}$ quantifies how well a segment between points $i$ and $j$ is defined, eg. some goodness-of-fit measure such as the negative variance of the residuals

\text{score}(i,j) = -s_r^2

of a straight line fitted through data points from points $i$ to $j$. $P$ is a penalty term, and $P>0$ allows to fine-tune segment lengths. At constant scores it will accumulate in $S_j$ (subtracted for each $i$), forcing the algorithm to "wait" until a higher score is reached. Similarly, initialization of $S_0>0$ to a relatively large value avoids spurious segments of length 1 at $j=1$ by enforcing a break-point before $i=1$. $S_1$ has to be initialized to $S_1=-P$.

Discontinuous jumps between adjacent segments can be allowed with $\newcommand{\jump}{\mathcal{J}} \jump =1$, while segment borders (break-points) are part of both left and right segments with $\newcommand{\jump}{\mathcal{J}} \jump =0$, (in which case $S_0$ has no effect).

See the package vignette (vignette("dpseg")) for details.

Copy Link

Version

Install

install.packages('dpseg')

Monthly Downloads

219

Version

0.1.1

License

GPL (>= 2)

Maintainer

Rainer Machne

Last Published

August 17th, 2020

Functions in dpseg (0.1.1)

predict.dpseg

Predict method for `dpseg` segmentations
print.dpseg

scanP

Scan over different penalty P values
sgtable

construct a segment table
dpseg

dpseg : linear segmentation by dynamic programming
dpseg_old

inefficient dpseg implementation
backtrace_r

backtracing dpseg segment break-points
addLm

Adds linear regression data to dpseg results or a table of segment borders.
estimateP

Estimate a starting value for penalty P.
oddata

Escherichia coli growth curves.
movie

Visualizes the dpseg segmentation recursion as a movie.
plot.dpseg

Plot method for a dpseg segmentation model.