Learn R Programming

karyotapR

karyotapR enables analysis of DNA copy number (aneuploidy) using custom genome-wide targeted DNA sequencing panels for the Mission Bio Tapestri system.

Installation

You can install the current stable version of karyotapR from CRAN with:

install.packages('karyotapR')

You can install the development version of karyotapR from GitHub with:

# install.packages("devtools")
devtools::install_github("joeymays/karyotapR")

Basic Usage

Data Import

In practice, the dataset is imported from the .h5 file that is generated by the Tapestri Pipeline.

library(karyotapR)
library(forcats)
set.seed(2023) #seed set to ensure example is reproducible 
## NOT RUN
example.exp <- createTapestriExperiment(h5.filename)

We’ll use a toy dataset for this example.

example.exp <- newTapestriExperimentExample()
#> ℹ Moving gRNA probe  to `altExp` slot "grnaCounts".
#> ℹ Moving barcode probe  to `altExp` slot "barcodeCounts".
#> ℹ Moving chrY probe(s) probe_231, probe_232, probe_233, probe_234, probe_235, probe_236, probe_237, probe_238, probe_239, and probe_240 to `altExp` slot "chrYCounts".

Calling the TapestriExperiment will print a summary of the contained data.

example.exp
#> class: TapestriExperiment 
#> dim: 230 300 
#> metadata(7): sample.name pipeline.panel.name ... date.h5.created
#>   mean.reads.per.cell.per.probe
#> assays(1): counts
#> rownames(230): probe_1 probe_2 ... probe_229 probe_230
#> rowData names(7): probe.id chr ... total.reads median.reads
#> colnames(300): cell_1 cell_2 ... cell_299 cell_300
#> colData names(3): cell.barcode test.cluster total.reads
#> reducedDimNames(0):
#> mainExpName: CNV
#> altExpNames(2): chrYCounts alleleFrequency
#> barcodeProbe: dummyBCprobe
#> grnaProbe: dummyGRNAprobe
#> gmmParams(0):

Clustering

We cluster on allele frequency to partition different cell lines represented in the experiment. First, run PCA and use the knee plot to identify the PCs accounting for the most variation in the dataset.

example.exp <- runPCA(example.exp)
PCAKneePlot(example.exp)

Run UMAP with the top PCs.

example.exp <- runUMAP(example.exp, pca.dims = 1:2)
#> ℹ Running UMAP on: alleleFrequency.
reducedDimPlot(example.exp, dim.reduction = "umap")

Run runClustering() to cluster with dbscan.

example.exp <- runClustering(example.exp, eps = 0.9)
#> ℹ Finding clusters in: alleleFrequency UMAP

Visualize UMAP, using “cluster” label to color.

reducedDimPlot(example.exp, dim.reduction = "umap", group.label = "cluster")

Rename cluster labels by renaming the factor levels of “cluster”. The forcats package makes this easy.

colData(example.exp)$cluster <- forcats::fct_recode(colData(example.exp)$cluster, cellline1 = "1", cellline2 = "2", cellline3 = "3")

Copy Number Calculation

Normalize counts and calculate copy number relative to cellline 3, which is diploid. control.copy.number gives the cluster label and copy number value to normalize each chromosome arm to.

example.exp <- calcNormCounts(example.exp)
control.copy.number <- generateControlCopyNumberTemplate(example.exp, sample.feature.label = "cellline3", copy.number = 2)
example.exp <- calcCopyNumber(example.exp, control.copy.number = control.copy.number, sample.feature = "cluster")
example.exp <- calcSmoothCopyNumber(example.exp)
#> ℹ Smoothing copy number by median...
#> ✔ Smoothing copy number by median... [1.3s]
#> 

Visualize copy number. Visualization reveals that cell line 1 has 1 copy of chromosome arm 1p, and cell line 2 has three copies of chromosome 7.

assayHeatmap(example.exp, assay = "copyNumber", split.col.by = "arm", split.row.by = "cluster", annotate.row.by = "cluster", color.preset = "copy.number")
assayHeatmap(example.exp, alt.exp = "smoothedCopyNumberByArm", assay = "discreteCopyNumber", split.row.by = "cluster", annotate.row.by = "cluster", color.preset = "copy.number")

Visualize chrY counts. Visualization reveals that cellline 2 has chr Y and is therefore male, while the other two cell lines are female.

assayBoxPlot(example.exp, alt.exp = "chrYCounts", split.features = TRUE, split.x.by = "cluster")

Copy Link

Version

Install

install.packages('karyotapR')

Monthly Downloads

196

Version

1.0.1

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Joseph Mays

Last Published

September 7th, 2023

Functions in karyotapR (1.0.1)

countBarcodedReads

Get read counts from barcoded reads
corner

Print the top-left corner of a matrix
callSampleLables

Call sample labels based on feature counts
getGMMBoundaries

Calculate decision boundaries between components of copy number GMMs
getCytobands

Add chromosome cytobands and chromosome arms to TapestriExperiment
karyotapR-package

karyotapR: DNA Copy Number Analysis for Genome-Wide Tapestri Panels
getChrOrder

Get chromosome order from a string of chromosome/contig names
createTapestriExperiment

Create TapestriExperiment object from Tapestri Pipeline output
runUMAP

Cluster matrix data by UMAP
runPCA

Cluster assay data by Principal Components Analysis
getTidyData

Get tidy-style data from TapestriExperiment objects
moveNonGenomeProbes

Move non-genome probes counts and metadata to altExp slots
newTapestriExperimentExample

Create Example TapestriExperiment
reducedDimPlot

Scatter plot for dimensional reduction results
runClustering

Cluster 2D data
Custom Slot Getters and Setters

Getter and Setter functions for TapestriExperiment slots
plotCopyNumberGMM

Plot copy number GMM components
calcNormCounts

Normalize raw counts
PCAKneePlot

Plot of PCA proportion of variance explained
calcCopyNumber

Calculate relative copy number value for each cell-probe unit using reference sample
calcSmoothCopyNumber

Smooth copy number values across chromosomes and chromosome arms
calcGMMCopyNumber

Call copy number for each cell-chromosome using Gaussian mixture models
assayBoxPlot

Generate a box plot from assay data
TapestriExperiment-class

TapestriExperiment Class Definition
assayHeatmap

Generate heatmap of assay data