refmean: Averaging of multiple reference genes

Description

This function averages the expression of several reference genes before calculation of gene expression ratios by ratiocalc or ratiobatch. The method is similar to that described in Vandesompele et al. (2002), but uses arithmetic averaging of threshold cycles/efficiencies and not geometric averaging of relative expression values. This is equivalent, as discussed in 'Details' and as shown in 'Examples'. An essential extension of this method is, that if replicates for the reference genes are supplied, the threshold cycles and efficiencies are subjected to error propagation prior to ratio calculation. The propagated error is then included in the calculation of the gene expression ratios, as advocated in Nordgard et al. (2006).

Usage

refmean(data, group, which.eff = c("sig", "sli", "exp", "mak", "ext"),
        type.eff = c("individual", "mean.single"), 
        which.cp = c("cpD2", "cpD1", "cpE", "cpR", "cpT", "Cy0", "ext"),
        verbose = TRUE, ...)

Arguments

data

multiple qPCR data generated by pcrbatch.

group

a character vector defining the replicates (if any) of control/treatment samples and reference genes/genes-of-interest. See 'Details'

which.eff

efficiency calculated by which method. Defaults to sigmoidal fit. Can also be a value such as 1.8, as shown in 'Examples'. See ratiocalc.

type.eff

using individual or averaged efficiencies for the replicates of a reference gene. See 'Details'.

which.cp

type of threshold cycles to be used for the analysis. Defaults to cpD2. See ratiocalc.

verbose

logical. If TRUE, the steps of analysis are shown in the console window.

...

parameters to be supplied to propagate.

Value

The same dataset data which was supplied to the function, but with modified threshold cycle/efficiency values in which the values are created per sample in a way, that they have the mean of all averaged reference genes and the same standard deviation as obtained by error propagation. See 'Details' for a more thorough explanation. Furthermore, a modified label vector "NAME_mod" is written to the global environment (if "NAME" was supplied for group) in which the different reference gene labels are aggregated, i.e. c("r1c1", "r2c1", "r3c1") will become c("r1c1", "r1c1", "r1c1"). This new label vector is also attached as an attribute to the output data and can be obtained by attr(RES1, "group").

Details

As in ratiobatch, the samples are to be defined as a character vector in the style of "g1s1", "g1c1", "r1s1" and "r1c1" etc. If refmean is used as a standalone function or switched on in ratiobatch using refmean = TRUE, different reference genes per control/treatment samples are averaged when supplied either as single runs or as replicates.

Examples (omitting genes-of-interest in control/treatment samples): 2 reference genes, 2 replicates each: c("r1s1", "r1s1, "r2s1", "r2s1", "r1c1", "r1c1, "r2c1", "r2c1", ...). 3 reference genes, no replicates: c("r1s1", "r2s1, "r3s1", "r1c1", "r2c1, "r3c1", ...)

Averaging of multiple reference genes is accomplished the following way: Given $i$ reference genes with $j$ replicates in a sample, all replicates $r_{ij}$ are used for calculating mean $\mu_{r_i}$ and standard deviation $\sigma_{r_i}$ of the threshold cycles and efficiencies. The overall (grand) mean $\mu_r$ and propagated error $\sigma_r$ is calculated using propagate with first-order Taylor expansion including covariance: $\sigma_r = F_{r_i}C_{r_i}F_{r_i}^T$. Finally, a vector of length $L = n(r_{ij})$ containing equidistant numbers $X = (x_1, x_2, x_3, \ldots x_L)$ with mean $\mu_r$ and standard deviation $\sigma_r$ is generated for a new overall reference gene $r_1$. This is done using the internal function qpcR:::makeStat which calculates a shifted ($\mu_r$) and scaled ($\sigma_r$) Z-transformation on a vector $x_1 \ldots x_L$: $$Z_i = \mu_r + \frac{(x_i - \bar{X})}{\sigma_X} \cdot \sigma_r$$ The new $Z_i$ threshold cycle and efficiency values replace all values of $r_{ij}$ in data. When using ratiobatch, this modified data is then used for the ratio calculation, again using propagate to calculate errors for ratios using the $Z_i$ values as mentioned above. By using logarithmic identities (http://en.wikipedia.org/wiki/Logarithmic_identities), it can be shown that the geometric mean can be transformed to the arithmetic mean by logarithmation (assuming constant $E$): $$\left( \prod_{i=1}^n E^{x_i} \right)^{\frac{1}{n}} = \frac{1}{n} \cdot \log_E \left( \prod_{i=1}^n E^{x_i} \right) = \frac{1}{n} \sum_{i=1}^n x_i$$ Hence, arithmetic averaging of the threshold cycles BEFORE ratio calculation is the same as doing geometric averaging on relative quantities AFTER ratio calculation. This is demonstrated in 'Examples' and also mentioned in the geNorm manual (http://www.gene-quantification.com/geNorm_manual.pdf) in Q8 (page 12).

When setting type.eff = "individual" (default), all efficiencies from replicates of a reference gene in a control/treatment sample $E(r_{ij})$ are used for calculating mean $\mu_{E(r_i)}$ and standard deviation $\sigma_{E(r_i)}$, the latter being used for calculating a propagated error for all reference gene efficiencies $\sigma_{E(r)}$. If type.eff = "mean.single", all $E(r_{ij})$ values from the replicates are set to the same value $\mu_{E(r_i)}$, that is, there is no variation assumed between the different $E(r_{ij})$. In this case, $\sigma_{E(r_i)} = 0$, so that no error of the replicates is propagated to $\sigma_{E(r)}$. This results in smaller overall errors of the output, but it can be debated if this is a realistic approach, hence both settings were implemented.

which.eff can be supplied with an efficiency value such as 1.8, which is then used as the efficiency for all reference runs $E(r_{ij})$.

References

Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F. Genome Biol (2002), 3: research0034-research0034.11.

Error propagation in relative real-time reverse transcription polymerase chain reaction quantification models: the balance between accuracy and precision. Nordgard O, Kvaloy JT, Farmen RK, Heikkil? R. Anal Biochem (2006), 356: 182-193.

Examples

Run this code

# NOT RUN {
## Replacing the reference gene values by
## averaged ones in the original data.
## => RES1 is new dataset.
## => GROUP1_mod in global environment is
## new labeling vector.
DAT1 <- pcrbatch(reps, fluo = 2:19, model = l5)
GROUP1 <- c("r1c1", "r1c1", "r2c1", "r2c1", "g1c1", "g1c1",
           "r1s1", "r1s1", "r1s2", "r1s2", "r2s1", "r2s1",
           "r2s2", "r2s2", "g1s1", "g1s1", "g1s2", "g1s2") 
RES1 <- refmean(DAT1, GROUP1, which.eff = "sig", which.cp = "cpD2")

## Using three reference genes without replicates
## and then 'ratiobatch'.
## This can also be called in 'ratiobatch' directly
## with parameter 'refmean = TRUE'. See there.
## In this example, already averaged dataset and 
## new labeling vector are supplied to 'ratiobatch', 
## so one has to set 'refmean = FALSE'.
DAT2 <- pcrbatch(reps, fluo = 2:9, model = l5)
GROUP2 <- c("r1c1", "r2c1", "r3c1", "g1c1", "r1s1", "r2s1", "r3s1", "g1s1" ) 
RES2 <- refmean(DAT2, GROUP2, which.eff = "sig", which.cp = "cpD2")
ratiobatch(RES2, GROUP2_mod, refmean = FALSE)

## Comparison between 'refmean' ct-value arithmetic averaging
## and 'geNorm' relative quantities geometric averaging
## using data from the geNorm manual (2008), page 6.
## We will use HK1-HK3 as in the manual (no replicates).
## First we create a 'pcrbatch' dataset and then 
## override the ct values with those of the manual and all
## efficiencies with E = 2. Sample A is considered as control sample.
DAT3 <- pcrbatch(reps, fluo = 2:17, model = l5)
DAT3[8, -1] <- c(32.10, 27.00, 34.90, 23.00,
                 33.30, 28.40, 36.10, 24.20,
                 31.00, 27.50, 34.00, 26.35,
                 30.50, 28.20, 33.00, 25.45)
DAT3[1, -1] <- 2
GROUP3 <- c("r1c1", "r2c1", "r3c1", "g1c1",
            "r1s1", "r2s1", "r3s1", "g1s1",
            "r1s2", "r2s2", "r3s2", "g1s2",
            "r1s3", "r2s3", "r3s3", "g1s3")
RES3 <- refmean(DAT3, GROUP3, which.eff = "sig", which.cp = "cpD2")
ratiobatch(RES3, GROUP3_mod, which.cp = "cpD2",
           which.eff = "sig", refmean = FALSE)
## Results:
## r1c1:g1c1:r1s1:g1s1  refmean 1.0497
##                      geNorm 1.0472 (2.351/2.245)
## r1c1:g1c1:r1s2:g1s2  refmean 0.0693
##                      geNorm 0.0695 (0.156/2.245)
## r1c1:g1c1:r1s3:g1s3  refmean 0.1081
##                      geNorm 0.1074 (0.241/2.245)
## Slight differences are due to rounding.
# }

Run the code above in your browser using DataLab