summary.subsamples: calculate summary statistics for each subsampled depth in a subsamples object

Description

Given a subsamples object, calculate metrics for each depth that summarize the power, the specificity, and the accuracy of the effect size estimates at that depth.

Usage

## S3 method for class 'subsamples':
summary(object, oracle = NULL, FDR.level = 0.05,
  average = FALSE, p.adjust.method = "qvalue", ...)

Arguments

object

a subsamples object

oracle

a subsamples object containing a single depth, to which each depth is compared; if NULL, each depth is compared to the highest depth

FDR.level

A false discovery rate used to calculate the number of genes found significant at each depth

average

If TRUE, averages over replications at each method+depth combination before returning

p.adjust.method

Method used to correct p-values in order to determine significance. Defaults to "qvalue", but any method accepted by p.adjust can also be given.

...

further arguments passed to or from other methods.
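How p.adjust.method and FDR.level combine into a count of significant genes can be sketched roughly as follows. The helper count_significant is hypothetical and not part of the package; it uses the Benjamini-Hochberg ("BH") method from stats::p.adjust instead of the default "qvalue" (which relies on the separate qvalue package), and it assumes a gene counts as significant when its adjusted p-value is at most FDR.level.

```r
# Hypothetical helper (not part of the package): count genes whose
# adjusted p-value is at or below the chosen FDR level.
count_significant <- function(pvalues, FDR.level = 0.05, method = "BH") {
  # stats::p.adjust accepts "holm", "hochberg", "hommel", "bonferroni",
  # "BH", "BY", "fdr" and "none"
  adjusted <- stats::p.adjust(pvalues, method = method)
  sum(adjusted <= FDR.level, na.rm = TRUE)
}

p <- c(0.001, 0.005, 0.02, 0.2, 0.8)
count_significant(p)                        # 3
count_significant(p, method = "bonferroni") # 2
```

The stricter Bonferroni correction finds fewer genes at the same FDR.level, which is why the choice of p.adjust.method changes the significant count.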

Value

A summary object, which is a data.table with one row for each subsampling depth, containing the following metrics:

significant

number of genes found significant at the given FDR

pearson

Pearson correlation of the coefficient estimates with the oracle

spearman

Spearman correlation of the coefficient estimates with the oracle

concordance

Concordance correlation of the coefficient estimates with the oracle

MSE

mean squared error between the coefficient estimates and the oracle

estFDP

estimated FDP: the estimated false discovery proportion, calculated as the average oracle local FDR among genes found significant at this depth

rFDP

relative FDP: the proportion of genes found significant at this depth that were not found significant in the oracle

percent

the percentage of genes found significant in the oracle that were also found significant at this depth
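A minimal sketch of how these metrics could be computed for a single depth, assuming the subsample and oracle results are already matched gene by gene. The function depth_metrics and its arguments are hypothetical, not the package's internals; the concordance correlation is omitted for brevity, and percent is returned here as a proportion rather than a percentage.

```r
# Hypothetical sketch (not the package's implementation): compare one
# depth's results with the oracle's, following the definitions above.
depth_metrics <- function(coef, oracle_coef, sig, oracle_sig, oracle_lfdr) {
  # coef, oracle_coef: effect-size estimates for the same genes
  # sig, oracle_sig:   logical, TRUE if the gene is significant
  # oracle_lfdr:       oracle local false discovery rate for each gene
  data.frame(
    significant = sum(sig),
    pearson     = cor(coef, oracle_coef, method = "pearson"),
    spearman    = cor(coef, oracle_coef, method = "spearman"),
    MSE         = mean((coef - oracle_coef)^2),
    # average oracle local FDR among genes significant at this depth
    estFDP      = mean(oracle_lfdr[sig]),
    # share of this depth's discoveries not found by the oracle
    rFDP        = mean(!oracle_sig[sig]),
    # share of the oracle's discoveries recovered at this depth
    # (a proportion; multiply by 100 for a percentage)
    percent     = mean(sig[oracle_sig])
  )
}

depth_metrics(coef        = c(1, 2, 3, 4),
              oracle_coef = c(1, 2, 3, 5),
              sig         = c(TRUE, TRUE, FALSE, FALSE),
              oracle_sig  = c(TRUE, FALSE, TRUE, FALSE),
              oracle_lfdr = c(0.01, 0.2, 0.03, 0.9))
```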

Details

To perform these calculations, each depth must be compared to an "oracle" depth, which, if not given explicitly, is taken to be the highest subsampling depth. The metrics therefore summarize how closely each depth agrees with the full experiment: if even very low-depth subsamples still agree with the oracle, the experiment's depth is high enough that additional depth would not make a strong qualitative difference.

The concordance correlation coefficient is described in Lin (1989). Its advantage over the Pearson correlation is that it takes into account not only whether the coefficients, plotted against the oracle, fall close to a straight line, but also whether that line is close to the line x = y.
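Lin's coefficient can be written as rho_c = 2 s_xy / (s_x^2 + s_y^2 + (mean(x) - mean(y))^2); the extra term in the denominator penalises any shift away from the line x = y. The sketch below implements that formula with 1/n (population) moments, as in Lin's paper. The name concordance_cor is hypothetical, and the package may compute the coefficient slightly differently (for example with n - 1 denominators).

```r
# Lin's (1989) concordance correlation coefficient, using 1/n moments:
#   rho_c = 2 * s_xy / (s_x^2 + s_y^2 + (mean(x) - mean(y))^2)
concordance_cor <- function(x, y) {
  stopifnot(length(x) == length(y))
  mx <- mean(x)
  my <- mean(y)
  sxx <- mean((x - mx)^2)          # population variance of x
  syy <- mean((y - my)^2)          # population variance of y
  sxy <- mean((x - mx) * (y - my)) # population covariance
  2 * sxy / (sxx + syy + (mx - my)^2)
}

x <- 1:5
cor(x, x + 1)             # 1: the points lie on a straight line
concordance_cor(x, x + 1) # 0.8: penalised because that line is not x = y
```

A subsample whose estimates are all shifted by a constant therefore keeps a perfect Pearson correlation but loses concordance.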

Note that selecting average=TRUE averages the depths of the replicates (as two subsamplings with identical proportions may have different depths by chance). This may lead to depths that are not integers.
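The effect of averaging can be illustrated with a toy per-replicate table. The column names (method, proportion, replication, depth, pearson) and all values are invented for illustration and are not necessarily the package's own; the sketch uses base R's aggregate, whereas the package itself works with data.table.

```r
# Hypothetical per-replicate summary: two replicates per proportion whose
# realised depths differ slightly by chance (illustrative columns only).
per_replicate <- data.frame(
  method      = "methodA",
  proportion  = c(0.1, 0.1, 0.5, 0.5),
  replication = c(1, 2, 1, 2),
  depth       = c(1000, 1001, 5003, 4998),
  pearson     = c(0.80, 0.84, 0.95, 0.97)
)

# Average every metric, including depth, within each method + proportion
averaged <- aggregate(cbind(depth, pearson) ~ method + proportion,
                      data = per_replicate, FUN = mean)
averaged
```

The two replicates at proportion 0.1 have depths 1000 and 1001, so their averaged depth is 1000.5, which is not an integer.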

References

Lawrence I-Kuei Lin (March 1989). "A concordance correlation coefficient to evaluate reproducibility". Biometrics (International Biometric Society) 45 (1): 255-268.

Examples


# see the subsample function for how ss is generated
data(ss)
# summarise subsample object
ss.summary = summary(ss)
