meltcurve: Melting curve analysis with (iterative) Tm identification and peak area calculation/cutoff

Description

This function conducts a melting curve analysis on the melting curve data of a real-time qPCR instrument. The data must be preformatted so that each column of temperature values has a corresponding column of fluorescence values. See edit(dyemelt) for a proper format. The output is a graph displaying the raw fluorescence curve (black), the first derivative curve (red) and the identified melting peaks. The original data together with the results ($-\frac{\partial F}{\partial T}$ values, $T_m$ values) are returned as a list.

An automatic optimization procedure is also implemented. It iterates over span.smooth and span.peaks values and selects the parameter combination that gives the minimum residual sum-of-squares between the identified $T_m$ values and known $T_m$ values. Peak areas can be calculated for all peaks, and only those peaks with an area higher than a given cutoff (cut.Area) are kept. If no peak meets the cutoff, the melting curve is flagged with a 'bad' attribute. See 'Details'.
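As an illustration of the expected layout, the following sketch builds a small synthetic data frame with alternating temperature and fluorescence columns. The column names, the helper melt_signal and the sigmoidal shape are assumptions made for this example only; they are not part of qpcR or of the dyemelt data set.

```r
# Hypothetical two-sample melt data set in the alternating
# temperature/fluorescence layout (columns 1, 3 = temperatures; 2, 4 = fluorescence).
temp <- seq(70, 90, by = 0.2)

# Made-up sigmoidal fluorescence decay around an assumed melting point `tm`.
melt_signal <- function(temp, tm) 1 / (1 + exp((temp - tm) / 0.8))

melt_data <- data.frame(
  Temp.S1 = temp, Fluo.S1 = melt_signal(temp, 80),
  Temp.S2 = temp, Fluo.S2 = melt_signal(temp, 83)
)

# Column indices corresponding to the documented defaults for temps and fluos.
temps <- seq(1, ncol(melt_data), by = 2)  # 1, 3, 5, ...
fluos <- seq(2, ncol(melt_data), by = 2)  # 2, 4, 6, ...

str(melt_data)
```

A data frame of this shape could then be passed as data, with temps and fluos left at NULL.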

Usage

meltcurve(data, temps = NULL, fluos = NULL, window = NULL, 
          norm = FALSE, span.smooth = 0.05, span.peaks = 51, 
          is.deriv = FALSE, Tm.opt = NULL, Tm.border = c(1, 1), 
          plot = TRUE, peaklines = TRUE, calc.Area = TRUE, 
          plot.Area = TRUE, cut.Area = 0,...)

Arguments

data

a dataframe containing the temperature and fluorescence data.

temps

a vector of column numbers reflecting the temperature values. If NULL, they are assumed to be 1, 3, 5, ... .

fluos

a vector of column numbers reflecting the fluorescence values. If NULL, they are assumed to be 2, 4, 6, ... .

window

a user-defined window for the temperature region to be analyzed. See 'Details'.

norm

logical. If TRUE, the fluorescence values are scaled into the interval [0, 1].

span.smooth

the window span for curve smoothing. Can be tweaked to optimize $T_m$ identification.

span.peaks

the window span for peak identification. Can be tweaked to optimize $T_m$ identification. Must be an odd number.

is.deriv

logical. Set to TRUE if the data has already been transformed to first derivative values.

Tm.opt

a possible vector of known $T_m$ values to optimize span.smooth and span.peaks against. See 'Details' and 'Examples'.

Tm.border

for peak area calculation, a vector containing the left and right border distances (in °C) from each $T_m$ value. Default is -1/+1 °C.

plot

logical. If TRUE, a plot with the raw melting curve, derivative curve and identified $T_m$ values is displayed for each sample.

peaklines

logical. If TRUE, lines that show the identified peaks are plotted.

calc.Area

logical. If TRUE, all peak areas are calculated.

plot.Area

logical. If TRUE, the baselined area identified for the peaks is plotted by filling the peaks in red.

cut.Area

a peak area cutoff. Only peaks with an area higher than this value are kept.

...

other parameters to be passed to plot.

Value

A list with as many items as melting curves, named as in data, each containing a data.frame with the temperature (Temp), fluorescence values (Fluo), first derivative (dF.dT) values, (optimized) parameters of span.smooth/span.peaks, residual sum-of-squares (if Tm.opt != NULL), identified melting points (Tm), calculated peak areas (Area) and peak baseline values (baseline).

Details

The melting curve analysis is conducted with the following steps:

1a) Temperature and fluorescence values are selected in a region according to window.
1b) If norm = TRUE, the fluorescence data is scaled into [0, 1] by qpcR:::rescale.

Then, the function qpcR:::TmFind conducts the following steps:
2a) A cubic spline function (splinefun) is fit to the raw fluorescence melt values.
2b) The first derivative values are calculated from the spline function for each of the temperature values.
2c) Friedman's supersmoother (supsmu) is applied to the first derivative values.
2d) Melting peak ($T_m$) values are identified by qpcR:::peaks.
2e) Raw melt data, first derivative data, best parameters, residual sum-of-squares and identified $T_m$ values are returned.

Peak areas are then calculated by qpcR:::peakArea:
3a) A linear regression curve is fit from the leftmost temperature value ($T_m$ - Tm.border[1]) to the rightmost temperature value ($T_m$ + Tm.border[2]) by lm.
3b) A baseline curve is calculated from the regression coefficients by predict.lm.
3c) The baseline data is subtracted from the first derivative melt data (baselining).
3d) A splinefun is fit to the baselined data.
3e) The area of this spline function is integrated from the leftmost to rightmost temperature value.

4) If calculated peak areas were below cut.Area, the corresponding $T_m$ values are removed.

Finally,
5) A matrix of xyy-plots is displayed using qpcR:::xyy.plot.
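Steps 2a to 2d can be sketched with base R functions only. This is a minimal illustration on synthetic data, not the code of qpcR:::TmFind. In particular, find_peaks is a hypothetical stand-in for qpcR:::peaks that assumes a simple local-maximum search in a centred window of odd width.

```r
# Synthetic melt curve with a single transition at 80 °C (made-up data).
temp <- seq(70, 90, by = 0.2)
fluo <- 1 / (1 + exp((temp - 80) / 0.8))

# 2a) Cubic spline through the raw fluorescence values.
spl <- splinefun(temp, fluo)

# 2b) First derivative at each temperature, negated so that melting gives a peak.
neg_dFdT <- -spl(temp, deriv = 1)

# 2c) Friedman's supersmoother applied to the derivative values.
smoothed <- supsmu(temp, neg_dFdT, span = 0.05)

# 2d) Hypothetical peak finder: a point is a peak if it equals the maximum
#     of a centred window of `span_peaks` points (clipped at the data borders).
find_peaks <- function(y, span_peaks = 51) {
  half <- (span_peaks - 1) / 2
  n <- length(y)
  is_peak <- vapply(seq_len(n), function(i) {
    y[i] == max(y[max(1, i - half):min(n, i + half)])
  }, logical(1))
  which(is_peak)
}

peak_idx <- find_peaks(smoothed$y, span_peaks = 51)
tm <- smoothed$x[peak_idx]
tm
```

A wider span_peaks window suppresses minor local maxima, while a larger span for supsmu flattens noise at the cost of broadening the peaks.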

is.deriv must be set to TRUE if the exported data was already transformed to $-\frac{\partial F}{\partial T}$ by the PCR system (e.g. Stratagene MX3000P).

If values are given to Tm.opt (see 'Examples'), then meltcurve is iterated over all combinations of span.smooth = seq(0, 0.2, by = 0.01) and span.peaks = seq(11, 201, by = 10). For each iteration, $T_m$ values are calculated and compared to the given values by the residual sum-of-squares between Tm.opt and the $T_m$ values obtained during the iteration: $$RSS = \sum_{i=1}^{n}{(T_{m,i} - \mathrm{Tm.opt}_i)^2}$$ The parameter combination with the minimum RSS is selected.
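The grid search can be sketched as follows, using base R on synthetic data. The helper find_tm is hypothetical and only mimics the smoothing and peak search. Rejecting combinations that yield a different number of peaks than Tm.opt is an assumption of this sketch, not documented behaviour of meltcurve.

```r
# Synthetic melt curve with two transitions at 78 and 84 °C (made-up data).
temp <- seq(70, 90, by = 0.1)
fluo <- 0.6 / (1 + exp((temp - 78) / 0.6)) + 0.4 / (1 + exp((temp - 84) / 0.6))
neg_dFdT <- -splinefun(temp, fluo)(temp, deriv = 1)

# Hypothetical helper: smooth the derivative, then return the temperatures
# of local maxima found in a centred window of `span_peaks` points.
find_tm <- function(span_smooth, span_peaks) {
  sm <- supsmu(temp, neg_dFdT, span = span_smooth)
  half <- (span_peaks - 1) / 2
  n <- length(sm$y)
  is_peak <- vapply(seq_len(n), function(i) {
    sm$y[i] == max(sm$y[max(1, i - half):min(n, i + half)])
  }, logical(1))
  sm$x[is_peak]
}

tm_opt <- c(78, 84)  # known Tm values to optimize against

# All parameter combinations named in the text above.
grid <- expand.grid(span_smooth = seq(0, 0.2, by = 0.01),
                    span_peaks  = seq(11, 201, by = 10))

# Residual sum-of-squares for each combination.
grid$rss <- mapply(function(s, p) {
  tm <- find_tm(s, p)
  # Assumption: a wrong number of peaks disqualifies the combination.
  if (length(tm) != length(tm_opt)) return(Inf)
  sum((tm - tm_opt)^2)
}, grid$span_smooth, grid$span_peaks)

best <- grid[which.min(grid$rss), ]
best
```

Since which.min returns the first minimum, ties between equally good combinations are resolved in grid order in this sketch.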

Each data frame in the returned list has an attribute "quality", which is set to "bad" if none of the peaks met the cut.Area criterion and to "good" otherwise.
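The peak area calculation (steps 3a to 3e), the cutoff and the quality flag can be sketched in base R as shown below. This is not the code of qpcR:::peakArea. The synthetic Gaussian peak, the variable names and the choice to fit the baseline through the two border points only are assumptions of this example.

```r
# Synthetic -dF/dT curve: a Gaussian peak at 80 °C on a sloping background.
temp <- seq(70, 90, by = 0.1)
neg_dFdT <- 0.5 * exp(-(temp - 80)^2 / (2 * 0.5^2)) + 0.01 * (temp - 70)

tm <- 80               # identified melting point
tm_border <- c(1, 1)   # left/right distance from Tm, as in Tm.border
cut_area <- 0.2        # area cutoff, as in cut.Area

# Select the region from Tm - border[1] to Tm + border[2].
eps <- 1e-8  # guards against floating-point drift in seq()
in_win <- temp >= tm - tm_border[1] - eps & temp <= tm + tm_border[2] + eps
x <- temp[in_win]
y <- neg_dFdT[in_win]

# 3a) Straight line through the two border points (assumption: endpoints only).
ends <- data.frame(x = x[c(1, length(x))], y = y[c(1, length(y))])
fit <- lm(y ~ x, data = ends)

# 3b) Baseline values from the regression; 3c) baselining.
baseline <- predict(fit, newdata = data.frame(x = x))
y_base <- y - baseline

# 3d) Spline through the baselined data; 3e) integrate between the borders.
area <- integrate(splinefun(x, y_base), min(x), max(x))$value

# 4) Apply the cutoff and flag the curve.
keep <- area > cut_area
result <- data.frame(Tm = if (keep) tm else NA_real_,
                     Area = if (keep) area else NA_real_)
attr(result, "quality") <- if (keep) "good" else "bad"
attr(result, "quality")
```

Because the baseline is a straight line between the borders, the linear background cancels and only the area above that line contributes.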

Examples


# NOT RUN {
## Default columns.
data(dyemelt)
res1 <- meltcurve(dyemelt, window = c(75, 86))
res1

## Selected columns and normalized fluo values.
res2 <- meltcurve(dyemelt, temps = c(1, 3), fluos = c(2, 4), 
                  window = c(75, 86), norm = TRUE)  

## Removing peaks based on peak area
## => two peaks have smaller areas and are not included.
res3 <- meltcurve(dyemelt, temps = 1, fluos = 2, window = c(75, 86),  
                  cut.Area = 0.2) 
attr(res3[[1]], "quality")
                 
## If no peak area meets the cutoff value, the melting curve is
## flagged as 'bad'.
res4 <- meltcurve(dyemelt, temps = 1, fluos = 2, window = c(75, 86),  
                  cut.Area = 0.5) 
attr(res4[[1]], "quality")

## Optimizing span and peaks values.
# }
# NOT RUN {
res5 <- meltcurve(dyemelt[, 1:6], window = c(74, 88), 
                  Tm.opt = c(77.2, 80.1, 82.4, 84.8))
# }
