Learn R Programming

MBASED (version 1.6.0)

runMBASED: Main function that implements MBASED.

Description

Main function that implements MBASED.

Usage

runMBASED(ASESummarizedExperiment, isPhased = FALSE, numSim = 0, BPPARAM = SerialParam())

Arguments

ASESummarizedExperiment
RangedSummarizedExperiment object containing information on read counts to be used for ASE detection. Rows represent individual heterozygous loci (SNVs), while columns represent individual samples. There should be either one or two columns, depending on whether one- or two-sample analysis is to be performed. Joint analysis of multiple samples or replicates is currently not supported, and one-sample analysis of multiple samples must be done through independent series of calls to runMBASED(). Note that for two-sample analysis, only loci which are heterozygous in both samples must be supplied (this excludes, e.g., tumor-specific mutations in cases of tumor/normal comparisons). For two-sample analysis, it is assumed that the first column corresponds to 'sample1' and the second column to 'sample2' in the sample1-vs-sample2 comparison. This is important, since differential ASE assessment is not symmetric and sample1-vs-sample2 comparison may yield different results from sample2-vs-sample1 comparison (the relationship is set up by assuming that only instances of ASE greater in sample1 than in sample2 are of interest). assays(ASESummarizedExperiment) must contain matrices lociAllele1Counts and lociAllele2Counts of non-negative integers, containing counts of allele1 (e.g. reference) and allele2 (e.g. alternative) at individual loci. All supplied loci must have total read count (across both alleles) greater than 0 (in each of the two samples, in the case of two-sample analysis). Allele counts are not necessarily phased (see 'isPhased' argument below), so allele1 counts may not represent the same haplotype. assays(ASESummarizedExperiment) may also contain matrix lociAllele1CountsNoASEProbs with entries >0 and <1, containing="" probabilities="" of="" observing="" allele1-supporting="" reads="" at="" individual="" loci="" under="" conditions="" no="" ase="" (which="" may="" differ="" for="" samples="" in="" the="" two-sample="" analysis).="" if="" this="" matrix="" is="" not="" provided,="" it="" constructed="" such="" that="" every="" entry="" set="" to="" 0.5="" (no="" pre-existing="" allelic="" bias="" any="" locus="" sample).="" assays(asesummarizedexperiment)="" also="" contain="" locicountsdispersions="" with="" entries="">=0 and
isPhased
specifies whether the true haplotypes are known, in which case the lociAllele1Counts are assumed to represent allelic counts along the same haplotype (and the same is true of lociAllele2Counts). Must be either TRUE or FALSE (DEFAULT).
numSim
number of simulations to perform to estimate statistical signficance of observed ASE. Must be a non-negative integer. If set to 0 (DEFAULT), no simulations are performed and nominal p-values are reported.
BPPARAM
argument to be passed to function bplapply(), when parallel achitecture is used to speed up simulations (parallelization is done over aseIDs). DEFAULT: SerialParam() (no parallelization).

Value

RangedSummarizedExperiment object with rows representing individual aseIDs (genes) and a single column. assays(returnObject) includes single-column matrices 'majorAlleleFrequency' (1-sample analysis only), 'majorAlleleFrequencyDifference' (2-sample analysis only), 'pValueASE' (unadjusted ASE p-value), 'pValueHeterogeneity' (unadjusted inter-loci variability p-value, set to NA for aseIDs with only 1 locus). Note that p-values are not adjusted for multiple hypothesis testing, and the users should carry out such an adjustment themselves, e.g. by employing the utilities in the multtest package. In addition, metadata(returnObject) is a list containing a RangedSummarizedExperiment object names 'locusSpecificResults', with rows corresponding to individual loci (SNVs) and a single column, that provides information on locus-level MBASED analysis results. assays(metadata(returnObject)$locusSpecificResults) contains single-column matrices 'MAF' (estimate of allele frequency for gene-wide major allele at the locus, 1-sample analysis only), 'MAFDifference' (estimate of allele frequency difference for gene-wide major allele at the locus, 2-sample analysis only), and 'allele1IsMajor' (whether allele1 is assigned to major haplotype by MBASED).

Examples

Run this code

mySNVs <- GRanges(
    seqnames=c('chr1', 'chr2', 'chr2', 'chr2'),
     ranges=IRanges(start=c(1000, 20020, 20285, 21114), width=1),
     aseID=c('gene1', rep('gene2', 3)),
    allele1=c('G', 'A', 'C', 'A'),
    allele2=c('T', 'C', 'T', 'G')
)
names(mySNVs) <- paste0('SNV', 1:4)
## RangedSummarizedExperiment object with data to run tumor vs. normal comparison
mySE_TumorVsNormal <- SummarizedExperiment(
     assays=list(
        lociAllele1Counts=matrix(
             c(
                c(25,10,22,14),
                c(18,17,14,28)
            ),
            ncol=2,
            dimnames=list(
                names(mySNVs),
                c('tumor', 'normal')
            )
         ),
        lociAllele2Counts=matrix(
             c(
                c(20,16,15,16),
                 c(23,9,24,17)
            ),
            ncol=2,
            dimnames=list(
                names(mySNVs),
                 c('tumor', 'normal')
            )
         ),
         lociAllele1CountsNoASEProbs=matrix(
             c(
                 c(0.48, 0.51, 0.55, 0.45),
                 c(0.52, 0.43, 0.52, 0.43)
             ),
             ncol=2,
             dimnames=list(
                 names(mySNVs),
                 c('tumor', 'normal')
             )
         ),
        lociCountsDispersions=matrix(
            c(
                 c(0.005, 0.007, 0.003, 0.01),
                 c(0.001, 0.004, 0.02, 0.006)
             ),
            ncol=2,
            dimnames=list(
                 names(mySNVs),
                 c('tumor', 'normal')
            )
         )
    ),
     rowRanges=mySNVs
)
twoSampleAnalysisTumorVsNormal <- runMBASED(
    ASESummarizedExperiment=mySE_TumorVsNormal,
     numSim=10^6,
     BPPARAM=SerialParam(),
     isPhased=FALSE
)
rowRanges(twoSampleAnalysisTumorVsNormal)
assays(twoSampleAnalysisTumorVsNormal)$majorAlleleFrequencyDifference
assays(twoSampleAnalysisTumorVsNormal)$pValueASE
assays(twoSampleAnalysisTumorVsNormal)$pValueHeterogeneity
assays(metadata(twoSampleAnalysisTumorVsNormal)$locusSpecificResults)$MAFDifference
assays(metadata(twoSampleAnalysisTumorVsNormal)$locusSpecificResults)$allele1IsMajor

## exchanging the order of the columns will allow us to run normal vs. tumor comparison
## Note that while results are the same for single-locus gene1, they differ for multi-locus gene2
mySE_NormalVsTumor <- SummarizedExperiment(
    assays=lapply(names(assays(mySE_TumorVsNormal)), function(matName) {
        curMat <- assays(mySE_TumorVsNormal)[[matName]]
        modifiedMat <- curMat[,c('normal','tumor')]
        return(modifiedMat)
    }),
    colData=colData(mySE_TumorVsNormal)[2:1,],
    rowRanges=rowRanges(mySE_TumorVsNormal)
)
names(assays(mySE_NormalVsTumor )) <- names(assays(mySE_TumorVsNormal))
twoSampleAnalysisNormalVsTumor <- runMBASED(
    ASESummarizedExperiment=mySE_NormalVsTumor,
     numSim=10^6,
     BPPARAM=SerialParam(),
     isPhased=FALSE
)
rowRanges(twoSampleAnalysisNormalVsTumor)
assays(twoSampleAnalysisNormalVsTumor)$majorAlleleFrequencyDifference
assays(twoSampleAnalysisNormalVsTumor)$pValueASE
assays(twoSampleAnalysisNormalVsTumor)$pValueHeterogeneity
assays(metadata(twoSampleAnalysisNormalVsTumor)$locusSpecificResults)$MAFDifference
assays(metadata(twoSampleAnalysisNormalVsTumor)$locusSpecificResults)$allele1IsMajor

## we can also do separate one-sample analysis on tumor and normal samples
mySE_Tumor <- SummarizedExperiment(
    assays=lapply(names(assays(mySE_TumorVsNormal)), function(matName) {
        curMat <- assays(mySE_TumorVsNormal)[[matName]]
        modifiedMat <- curMat[,'tumor',drop=FALSE]
        return(modifiedMat)
    }),
    colData=colData(mySE_TumorVsNormal)[1,],
    rowRanges=rowRanges(mySE_TumorVsNormal)
)
names(assays(mySE_Tumor)) <- names(assays(mySE_TumorVsNormal))
oneSampleAnalysisTumor <- runMBASED(
    ASESummarizedExperiment=mySE_Tumor,
     numSim=10^6,
     BPPARAM=SerialParam(),
     isPhased=FALSE
)
rowRanges(oneSampleAnalysisTumor)
assays(oneSampleAnalysisTumor)$majorAlleleFrequency
assays(oneSampleAnalysisTumor)$pValueASE
assays(oneSampleAnalysisTumor)$pValueHeterogeneity
assays(metadata(oneSampleAnalysisTumor)$locusSpecificResults)$MAF
assays(metadata(oneSampleAnalysisTumor)$locusSpecificResults)$allele1IsMajor

mySE_Normal <- SummarizedExperiment(
    assays=lapply(names(assays(mySE_TumorVsNormal)), function(matName) {
        curMat <- assays(mySE_TumorVsNormal)[[matName]]
        modifiedMat <- curMat[,'normal',drop=FALSE]
        return(modifiedMat)
    }),
    colData=colData(mySE_TumorVsNormal)[1,],
    rowRanges=rowRanges(mySE_TumorVsNormal)
)
names(assays(mySE_Normal)) <- names(assays(mySE_TumorVsNormal))
oneSampleAnalysisNormal <- runMBASED(
    ASESummarizedExperiment=mySE_Normal,
     numSim=10^6,
     BPPARAM=SerialParam(),
     isPhased=FALSE
)
rowRanges(oneSampleAnalysisNormal)
assays(oneSampleAnalysisNormal)$majorAlleleFrequency
assays(oneSampleAnalysisNormal)$pValueASE
assays(oneSampleAnalysisNormal)$pValueHeterogeneity
assays(metadata(oneSampleAnalysisNormal)$locusSpecificResults)$MAF
assays(metadata(oneSampleAnalysisNormal)$locusSpecificResults)$allele1IsMajor

Run the code above in your browser using DataLab