groupBaseline: Group BASELINe PDFs

Description

groupBaseline convolves groups of BASELINe posterior probability density functions (PDFs) to get combined PDFs for each group.

Usage

groupBaseline(baseline, groupBy, nproc = 1)

Arguments

baseline

Baseline object containing the db and the BASELINe posterior probability density functions (PDFs) for each of the sequences, as returned by calcBaseline.

groupBy

The columns in the db slot of the Baseline object by which to group the sequence PDFs.

nproc

Number of cores to distribute the operation over. If nproc = 0, the cluster is assumed to have already been set up and will not be reset.

Value

A Baseline object containing the modified db and the BASELINe posterior probability density functions (PDFs) for each of the groups.

Details

While the selection strengths predicted by BASELINe perform well on average, the estimates for individual sequences can be highly variable, especially when the number of mutations is small.

To overcome this, PDFs from sequences grouped by biological or experimental relevance are convolved to form a single PDF for the selection strength. For example, sequences from each sample may be combined, allowing you to compare selection across samples. This is accomplished through a fast numerical convolution technique.
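As a rough illustration of what numerically convolving two PDFs means, the sketch below discretizes two densities on a shared grid and combines them with a direct discrete convolution. It is not the routine used by groupBaseline: the grid, the two normal densities, and the helper convolve_pdfs are hypothetical and chosen only for illustration. The loop is the plain quadratic-time form rather than the fast technique mentioned above, and it shows only the generic convolution operation (the density of a sum of two independent quantities), not the full BASELINe grouping procedure described in the reference.

```r
# Hypothetical example densities on a common, evenly spaced grid.
grid  <- seq(-5, 5, length.out = 201)
step  <- grid[2] - grid[1]
pdf_a <- dnorm(grid, mean = -0.5, sd = 0.8)
pdf_b <- dnorm(grid, mean =  1.0, sd = 0.6)

# Direct discrete convolution of two gridded densities (illustrative helper,
# not part of any package). The result has length(f) + length(g) - 1 points.
convolve_pdfs <- function(f, g, step) {
  n <- length(f)
  m <- length(g)
  out <- numeric(n + m - 1)
  for (i in seq_len(n)) {
    idx <- i:(i + m - 1)
    out[idx] <- out[idx] + f[i] * g
  }
  # Scale by the grid spacing so the result is again a density.
  out * step
}

pdf_sum  <- convolve_pdfs(pdf_a, pdf_b, step)

# The support of the convolved density spans the sum of the two grid ranges.
grid_sum <- seq(2 * min(grid), 2 * max(grid), length.out = length(pdf_sum))

# Numerical area and mean of the combined density.
area_sum <- sum(pdf_sum) * step
mean_sum <- sum(grid_sum * pdf_sum) * step
```

In this toy case the combined density should integrate to approximately 1, and its mean should be close to the sum of the two input means.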

References

Yaari G, et al. Quantifying selection in high-throughput immunoglobulin sequencing data sets. Nucleic Acids Res. 2012 40(17):e134.

Examples

# Subset example data
db <- subset(InfluenzaDb, CPRIMER %in% c("IGHA","IGHM") & 
                          BARCODE %in% c("RL016","RL018","RL019","RL021"))

# Calculate BASELINe
# By default, calcBaseline collapses the sequences in the db by the column "CLONE"
# and calculates the numbers of observed mutations and the expected mutation
# frequencies for the regions defined in IMGT_V_NO_CDR3, using the HS5FModel
# targeting model. It then calculates the BASELINe posterior probability density
# functions (PDFs) for the sequences in the updated db, using the focused test statistic.
db_baseline <- calcBaseline(db, 
                            sequenceColumn="SEQUENCE_IMGT",
                            germlineColumn="GERMLINE_IMGT_D_MASK", 
                            testStatistic="focused",
                            regionDefinition=IMGT_V_NO_CDR3,
                            targetingModel = HS5FModel,
                            nproc = 1)

# Grouping the PDFs by the BARCODE column in the db, corresponding 
# to sample barcodes.
baseline_one <- groupBaseline(db_baseline, groupBy="BARCODE")
 
# Grouping the PDFs by the BARCODE and CPRIMER columns in the db, corresponding 
# respectively to sample barcodes and the constant region isotype primers.
baseline_two <- groupBaseline(db_baseline, groupBy=c("BARCODE", "CPRIMER"))
