
Repliscope

Repliscope is an R package for creating, normalising, comparing and plotting DNA replication timing profiles. The analysis pipeline starts with BED-formatted read count files (the output of localMapper) obtained by high-throughput sequencing of DNA from replicating and non-replicating cells. There are three methods of measuring DNA replication dynamics using relative copy number (Fig): sort-seq, sync-seq and marker frequency analysis (MFA-seq). Sort-seq uses fluorescence-activated cell sorting (FACS) to enrich for non-replicating and replicating cells from an asynchronous population. Sync-seq requires cells to be arrested in a non-replicating cell cycle phase (i.e. G1) and then released into S phase. Samples are then taken throughout S phase, while cells synchronously synthesise DNA according to the replication timing programme. In the case of MFA-seq, rapidly dividing cells in exponential growth phase are used directly as the replicating sample, while a saturated culture serves as the non-replicating control sample. Although MFA-seq is the simplest way of obtaining cells, it requires deeper sequencing because of its decreased dynamic range and is therefore more suitable for organisms with small genomes (typically, bacteria).

Analysis overview

For the best experience, use Repliscope in interactive mode. To do so, run the runGUI() function.

A typical command line analysis using Repliscope starts with loading BED-formatted read count files using the loadBed function. Genomic bins containing low quality data can be removed with the rmChr and rmOutliers functions. Two visualisation functions aid read count analysis: plotBed and plotCoverage. Next, the read-depth adjusted ratio of reads from the replicating sample to reads from the non-replicating sample is calculated using the makeRatio function; the resulting ratio values are distributed around one. The normaliseRatio function is then used to transpose the ratio values to a biologically relevant relative DNA copy number scale (typically, from 1 to 2). The normalised ratio values are, essentially, replication timing profiles, where low values indicate late replication and high values indicate early replication. The plotRatio function helps to visualise the ratio values as a histogram. Genomic bins containing unusually high or low ratio values may be removed using the trimRatio function. The smoothRatio function uses a cubic smoothing spline to smooth replication profile data. The compareRatios function can be used to calculate the difference between two replication profiles using z-score statistics. Finally, replication profiles are plotted using the plotGenome function, which also allows for various genome annotations.
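To make the ratio and normalisation steps more concrete, here is a minimal base R sketch of the idea on toy numbers. It does not call Repliscope and is not the package's implementation: the variable names and read counts are hypothetical, and the scaling factor is picked by a simple rule (lowest bin set to 1), whereas normaliseRatio either fits the factor automatically or takes a user-supplied one.

```r
# Toy read counts per genomic bin (hypothetical numbers, not real data)
rep_counts  <- c(120, 150, 180, 210, 240)  # replicating sample
nrep_counts <- c(100, 100, 100, 100, 100)  # non-replicating sample

# Read-depth adjusted ratio: the per-bin count ratio is corrected by the
# ratio of total read counts, so the values are distributed around one
raw_ratio <- (rep_counts / nrep_counts) * (sum(nrep_counts) / sum(rep_counts))

# Rescale with a single multiplicative factor so that values land on a
# 1-to-2 relative copy number scale. Assumption for this sketch only:
# the factor is chosen so that the lowest bin sits exactly at 1.
scale_factor <- 1 / min(raw_ratio)
norm_ratio   <- raw_ratio * scale_factor

print(round(raw_ratio, 3))
print(norm_ratio)
```

In this toy case the earliest-replicating bin ends up at 2 and the latest at 1, which mirrors how the normalised values are read as a replication timing profile.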

Installation

You can install the released version of Repliscope from CRAN with:

install.packages("Repliscope")

Example

A typical analysis pipeline is below:

library(Repliscope)
repBed <- loadBed('path/to/file1.bed')  # read counts from replicating sample
nrepBed <- loadBed('path/to/file2.bed')  # read counts from non-replicating sample
ratio <- makeRatio(repBed, nrepBed)  # create ratio between replicating and non-replicating samples
ratio <- normaliseRatio(ratio)  # normalise the ratio to fit biological scale of one to two
plotGenome(ratio)  # plot the resulting replication profile
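The example above starts from BED-formatted read count files. As a rough illustration of what such input might look like, and of how an interquartile-range rule can flag a low-quality bin, here is a base R sketch. It does not use Repliscope: the five-column layout (chrom, chromStart, chromEnd, name, score), the toy counts and the 1.5 x IQR threshold are all assumptions made for this example, and rmOutliers may use different defaults.

```r
# Write a tiny, made-up BED file of read counts to a temporary location
bed_file <- tempfile(fileext = ".bed")
writeLines(c(
  "chrI\t0\t1000\tbin1\t98",
  "chrI\t1000\t2000\tbin2\t102",
  "chrI\t2000\t3000\tbin3\t100",
  "chrI\t3000\t4000\tbin4\t101",
  "chrI\t4000\t5000\tbin5\t99",
  "chrI\t5000\t6000\tbin6\t500",
  "chrI\t6000\t7000\tbin7\t97",
  "chrI\t7000\t8000\tbin8\t103"
), bed_file)

# Read it back with base R (column names are assumed for this sketch)
bed <- read.table(
  bed_file, sep = "\t", stringsAsFactors = FALSE,
  col.names = c("chrom", "chromStart", "chromEnd", "name", "score")
)

# Flag bins whose score lies more than 1.5 x IQR outside the quartiles
q   <- quantile(bed$score, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
is_outlier <- bed$score < q[1] - 1.5 * iqr | bed$score > q[2] + 1.5 * iqr

# Keep only the bins that pass the filter
clean <- bed[!is_outlier, ]
print(clean)
```

Here only the bin with 500 reads is dropped; the remaining seven bins are kept.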

Output

The replication profiles may further be annotated with additional genomic data, such as location of centromeres, known replication origins or other regions or points of interest. Two replication profiles may be compared to find genomic regions with statistically different replication timing. Resulting plots may be saved as pdf files containing editable vector graphics.
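One way to picture a z-score comparison of two profiles is sketched below in base R. This is only an illustration under stated assumptions, not a description of how compareRatios works internally: the profiles are synthetic, and the choice to standardise per-bin differences by their genome-wide mean and standard deviation, as well as the 0.001 cut-off, are made up for the example.

```r
# Two synthetic replication profiles on a 1-to-2 scale (hypothetical data)
profile_a <- seq(1, 2, length.out = 50)
profile_b <- profile_a
profile_b[25] <- profile_b[25] + 0.5  # one bin with altered timing

# Per-bin difference between the two profiles
d <- profile_a - profile_b

# Standardise the differences into z-scores
z <- (d - mean(d)) / sd(d)

# Two-sided p-values under a normal approximation
p <- 2 * pnorm(-abs(z))

# Bins whose difference stands out from the rest (assumed cut-off)
flagged <- which(p < 0.001)
print(flagged)
```

With real data the differences would be noisy in every bin, so the threshold and any multiple-testing correction would need more thought than this sketch gives them.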

Version

1.1.1

License

GPL-3

Maintainer

Dzmitry Batrakou

Last Published

September 13th, 2022

Functions in Repliscope (1.1.1)

compareRatios

A function to compare two replication profiles
W303_S

Sequence read coverage for wild type S.cerevisiae W303 replicating sample
W303

Sequence read coverage ratios for wild type S.cerevisiae W303
makeLabels

A helper function to create axis ticks and human readable labels.
Dbf4myc

Sequence read coverage ratios for S.cerevisiae Dbf4-9myc sample.
makeRatio

A function to calculate the 'score' ratio between two bed dataframes. makeRatio merges two supplied bed dataframes, calculates the ratio of their 'score' values and normalises the ratio by the 'score' sums.
MFAseq

Replication profile for wild type DS2 H.volcanii
plotRatio

A function to plot a histogram of a supplied ratio vector. plotRatio plots a histogram of the values in a supplied vector using ggplot2 and highlights the interval between 1 and 2 in green.
plotTrep

A function to scatterplot the 'Trep' column of a Trep dataframe. The plotTrep function plots values in the 'Trep' column of the supplied dataframe as a function of chromosome coordinates. The genome-wide median is plotted as a pink line.
runGUI

A function to launch Repliscope in interactive mode (Shiny app).
normaliseRatio

A function to normalise ratio values from the 'ratio' column of the provided dataframe to fit a biologically relevant scale. It scales values either using the supplied 'rFactor' value or automatically to best fit a 1 to 2 scale (the upper limit of the scale may be adjusted with the upperLimit parameter). The normalisation factor used is stored in the 'ratioFactor' column and is also passed as the dataframe comment. To extract it, use 'attributes(mergedBed)$comment'.
loadBed

Load a BED formatted file.
plotBed

A function to boxplot the 'score' column of a BED dataframe, per unique chromosome name in the 'chrom' column. The resulting plot also highlights outliers based on the interquartile range (IQR). The genome-wide median is plotted as a pink line through the boxplots.
makeGenome

A helper function to create a genome dataframe
W303norm

Normalised sequence read coverage ratios for wild type S.cerevisiae W303
guide

Guide dataframe for plotting smoothed sortSeq data
rmChr

A function to remove single chromosome data from a bed dataframe
rmOutliers

A function to remove outliers from the "score" column of a supplied bed dataframe. There are three methods: max, IQR and median. Max removes one or more maximum values; IQR uses the interquartile range to detect outliers; the median method removes data based on the genome-wide median.
sacCer3

S.cerevisiae genome information
plotCoverage

A function to scatterplot the 'score' column of a BED dataframe. The plotCoverage function plots values in the 'score' column of the supplied bed dataframe as a function of chromosome coordinates. The genome-wide median is plotted as a pink line.
calcTrep

A function to calculate Trep values from a sync-seq experiment. The calcTrep function fits a Boltzmann sigmoid function to the relative copy number datapoints for every genomic bin of the provided sync-seq merged dataframe. It then extracts the time at which half of the cells have replicated this genomic bin (Trep). The output of the function is a dataframe containing Trep and TrepErr data for every genomic bin in a BED-like format.
smoothRatio

A function to smooth ratio values using a cubic smoothing spline. The smoothRatio function splits values from the 'ratio' column by chromosome and based on the supplied 'groupMin' and 'split' parameters, and then applies the smooth.spline() function from the R stats package. The supplied dataframe may contain multiple ratios, i.e. ratios produced using multiple replicating samples and/or multiple non-replicating samples. This must be reflected in the 'name.rep' and 'name.nonRep' columns. In other words, different ratio dataframes may be merged using the rbind() function before calling the smoothRatio() function.
plotGenome

plotGenome: plot replication profile.
sortSeq

Replication profiles for wild type and Dbf4-9myc S.cerevisiae samples
syncSeq

Replication profiles for budding yeast arrest-release samples
trimRatio

A function to remove outliers from the "ratio" column of a supplied ratio dataframe. trimRatio is applied to the calculated ratio of read counts from a replicating sample to a non-replicating sample.
W303_G2

Sequence read coverage for wild type S.cerevisiae W303 non-replicating sample.
TrepDF

Trep data calculated from syncSeq[["data"]]
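The smoothRatio entry above mentions splitting ratio values by chromosome and applying smooth.spline() from the R stats package. The following base R sketch shows that general idea on synthetic data. It is not the package's code: the toy dataframe, the 'chromStart' and 'ratio_smooth' column names and the helper smooth_one are hypothetical, and the additional grouping controlled by 'groupMin' and 'split' is not reproduced here.

```r
# Synthetic ratio data for two chromosomes (hypothetical values)
set.seed(42)
toy <- data.frame(
  chrom      = rep(c("chrI", "chrII"), each = 40),
  chromStart = rep(seq(0, by = 1000, length.out = 40), times = 2),
  stringsAsFactors = FALSE
)
toy$ratio <- 1.5 + 0.4 * sin(toy$chromStart / 6000) +
  rnorm(nrow(toy), sd = 0.05)

# Hypothetical helper: fit a cubic smoothing spline to one chromosome
# and store the fitted values in a new column
smooth_one <- function(df) {
  fit <- smooth.spline(df$chromStart, df$ratio)
  df$ratio_smooth <- predict(fit, df$chromStart)$y
  df
}

# Split by chromosome, smooth each piece, then bind the pieces together
smoothed <- do.call(rbind, lapply(split(toy, toy$chrom), smooth_one))
head(smoothed)
```

Smoothing each chromosome separately avoids fitting a spline across chromosome boundaries, where adjacent rows are not adjacent in the genome.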