MitoHEAR

MitoHEAR (Mitochondrial HEteroplasmy AnalyzeR) is an R package for estimating mtDNA heteroplasmy from single-cell datasets and for the downstream statistical analysis of the resulting values.

Installation

You can install the released version of MitoHEAR from CRAN with:

install.packages("MitoHEAR")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("https://github.com/ScialdoneLab/MitoHEAR/tree/master")
library(MitoHEAR)

Getting started

The package has two main functions: get_raw_counts_allele and get_heteroplasmy.

library(MitoHEAR)

load(system.file("extdata", "meta_data_antonio_final.Rda", package = "MitoHEAR"))
# The commented lines show how the allele counts are generated from BAM files;
# here the precomputed result shipped with the package is loaded instead.
# cell_names <- meta_data_antonio_final$antonio_array.Comment.ENA_RUN.
# path_to_bam <- "full_path_to_bam_files"
# bam_input <- paste(path_to_bam, cell_names, ".unique.bam", sep = "")
# path_fasta <- "full_path_to_fasta_file"
# output_SNP_antonio_mt <- get_raw_counts_allele(bam_input, path_fasta, cell_names, cores_number = 1)
load(system.file("extdata", "output_SNP_antonio_mt.Rda", package = "MitoHEAR"))
matrix_allele_counts <- output_SNP_antonio_mt[[1]]
# In this example we have 124 cells and 65196 columns (4 possible alleles for the 16299 bases in the mouse MT genome)
name_position_allele <- output_SNP_antonio_mt[[2]]
name_position <- output_SNP_antonio_mt[[3]]
row.names(meta_data_antonio_final) <- meta_data_antonio_final$antonio_array.Comment.ENA_RUN.
meta_data_antonio_final <- meta_data_antonio_final[row.names(matrix_allele_counts), ]
row.names(matrix_allele_counts) <- meta_data_antonio_final$antonio_array.Source.Name
row.names(meta_data_antonio_final) <- meta_data_antonio_final$antonio_array.Source.Name

We select only the cells at the 2-cell stage for downstream analysis.

stage_2_cells <- row.names(matrix_allele_counts)[grep("2cell_", row.names(matrix_allele_counts))]
stage_2_cells <- stage_2_cells[!grepl("32cell_", stage_2_cells)]
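
The second line is needed because the pattern "2cell_" is also a substring of "32cell_". A minimal sketch of this behaviour, using made-up cell names (hypothetical, not taken from the dataset):

```r
# Hypothetical cell names, for illustration only
cells <- c("2cell_1", "2cell_2", "32cell_1", "4cell_1")

# "2cell_" also matches inside "32cell_", so 32-cell samples are picked up
hits <- cells[grep("2cell_", cells)]

# Drop the 32-cell stage explicitly, as in the code above
stage_2 <- hits[!grepl("32cell_", hits)]

# Alternative: anchor the pattern at the start of the name.
# This assumes the stage label always begins the cell name.
stage_2_anchored <- cells[grepl("^2cell_", cells)]
```

Whether the anchored pattern is safe for a given dataset depends on how its cell names are built, so the two-step version is the more conservative choice.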

The next step is to obtain a matrix of allele frequencies and a matrix of heteroplasmy values for each cell-base pair, using the function get_heteroplasmy. This function applies a two-step filtering procedure, first on the cells and then on the bases. It keeps only the cells that have more than number_reads counts in more than number_positions bases, and then only the bases that are covered by more than number_reads counts in all the remaining cells (filtering=1) or in at least 50% of the cells in each cluster (filtering=2).


sc_data <- get_heteroplasmy(matrix_allele_counts[stage_2_cells, ], name_position_allele, name_position, number_reads = 50, number_positions = 2000, filtering = 1)
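
To make the two-step filter described above concrete, here is a minimal sketch on a toy coverage matrix for the filtering=1 case. It uses base R only and is not the package's implementation; the matrix, the cell and base names, and the thresholds are invented for illustration.

```r
# Toy coverage matrix: total read counts per cell (rows) and base (columns).
# All values and names are hypothetical.
coverage <- matrix(
  c(60, 70, 80, 10,
    55, 65,  5, 90,
     5,  3, 70,  2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cell_a", "cell_b", "cell_c"), paste0("base_", 1:4))
)
number_reads <- 50     # a base counts as covered above this many reads
number_positions <- 2  # a cell is kept if it covers more bases than this

covered <- coverage > number_reads

# Step 1: keep cells with more than number_positions covered bases
keep_cells <- rowSums(covered) > number_positions

# Step 2: keep bases that are covered in all remaining cells
keep_bases <- colSums(covered[keep_cells, , drop = FALSE]) == sum(keep_cells)

filtered <- coverage[keep_cells, keep_bases, drop = FALSE]
```

In this toy case cell_c is dropped in the first step, and base_3 and base_4 are dropped in the second because each is poorly covered in one of the two remaining cells.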

The output of get_heteroplasmy includes the matrix of heteroplasmy values and the matrix of allele frequencies for all the cells and bases that pass the two-step filtering procedure. The heteroplasmy is computed as 1-max(f), where f is the vector of frequencies of the four alleles for a given cell-base pair. For more information about the output see ?get_heteroplasmy.
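
As a worked illustration of the 1-max(f) definition, the sketch below computes allele frequencies and heteroplasmy for three bases of a single cell. The counts are invented, and the code is plain base R, not the package's own routine.

```r
# Toy allele counts for one cell: rows are bases, columns the four alleles.
# Values are hypothetical.
counts <- matrix(
  c( 90, 10, 0, 0,
     50, 50, 0, 0,
    100,  0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("base_", 1:3), c("A", "C", "G", "T"))
)

# Allele frequencies: each row divided by its total coverage
freq <- counts / rowSums(counts)

# Heteroplasmy: one minus the frequency of the most common allele
heteroplasmy <- 1 - apply(freq, 1, max)
```

A base where all reads carry the same allele (base_3) has heteroplasmy 0, and a base split evenly between two alleles (base_2) has heteroplasmy 0.5.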

sum_matrix <- sc_data[[1]]
sum_matrix_qc <- sc_data[[2]]
heteroplasmy_matrix_sc <- sc_data[[3]]
allele_matrix_sc <- sc_data[[4]]
cluster_sc <- as.character(meta_data_antonio_final[row.names(heteroplasmy_matrix_sc), ]$antonio_array.Characteristics.developmental.stage.)
index_sc <- sc_data[[5]]

Downstream analysis

MitoHEAR offers several ways to extract relevant information from heteroplasmy measurements. To identify the bases whose heteroplasmy differs most between two groups of cells (e.g. two clusters), an unpaired two-sample Wilcoxon test is performed with the function get_wilcox_test. The heteroplasmy and the corresponding allele frequencies for a specific base can be plotted with plot_heteroplasmy and plot_allele_frequency. If diffusion pseudo-time information is available for each sample, it is possible to detect the bases whose heteroplasmy changes significantly along pseudo-time with dpt_test and to plot the trend with plot_dpt. It is also possible to perform a cluster analysis on the samples, based on a distance matrix obtained from allele frequencies, with clustering_angular_distance, and to visualize a heatmap of the distance matrix, with samples sorted according to the clustering result, with plot_heatmap. This approach could be useful for lineage tracing analysis. For more detailed information about the functions offered by MitoHEAR, see the Vignettes section below and the help page of each function (?function_name).
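
The idea behind the per-base Wilcoxon comparison can be sketched with base R's wilcox.test on a toy heteroplasmy matrix. This is only an illustration under stated assumptions, not the implementation of get_wilcox_test: the data are simulated, the cluster labels are hypothetical, and the FDR adjustment is an added step that the text above does not describe.

```r
set.seed(1)

# Toy heteroplasmy matrix: 10 cells (rows) x 3 bases (columns).
# base_1 is simulated to differ between the two groups.
het <- cbind(
  base_1 = c(runif(5, 0.0, 0.1), runif(5, 0.3, 0.5)),
  base_2 = runif(10, 0.0, 0.2),
  base_3 = runif(10, 0.0, 0.2)
)
cluster <- rep(c("early", "late"), each = 5)  # hypothetical cluster labels

# Unpaired two-sample Wilcoxon test for each base
p_values <- apply(het, 2, function(x) {
  wilcox.test(x[cluster == "early"], x[cluster == "late"])$p.value
})

# Multiple-testing correction (an assumption of this sketch)
p_adjusted <- p.adjust(p_values, method = "fdr")

# Bases ranked from most to least different between the groups
ranked <- sort(p_adjusted)
```

Ranking bases by adjusted p-value gives a shortlist of candidates to inspect further, for example with plot_heteroplasmy.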

Vignettes

The following vignette is provided within the package MitoHEAR and is accessible within R:

#utils::vignette("MitoHEAR")

Version: 0.1.0
License: Artistic-2.0
Maintainer: Gabriele Lubatti
Last Published: March 1st, 2022

Functions in MitoHEAR (0.1.0)

dpt_test
get_heteroplasmy
plot_allele_frequency
get_distribution
filter_bases
detect_insertion
choose_features_clustering
get_raw_counts_allele
get_wilcox_test
clustering_angular_distance
plot_base_coverage
plot_spider_chart
vi_comparison: Computes the variation of information (VI) between the partitions provided by new_classification and old_classification. The VI between random partitions (obtained by re-shuffling the original labels in old_classification) and old_classification is also computed, and a distribution of VI values from the random partitions is built. Finally, from the comparison with this distribution, an empirical p value is assigned to the VI of the unsupervised cluster analysis.
plot_batch
plot_coordinate_cluster
plot_correlation_bases
plot_condition
plot_cells_coverage
plot_coordinate_heteroplasmy
plot_distance_matrix
plot_distribution
plot_heteroplasmy_variability
plot_heteroplasmy
plot_dpt
plot_heatmap
plot_genome_coverage