perfectphyloR

The R package perfectphyloR reconstructs the perfect phylogenies underlying a sample of DNA sequences at a focal single-nucleotide variant (SNV). A perfect phylogeny is a rooted binary tree that recursively partitions DNA sequences. The nested partition structure of such a tree provides insight into the pattern of ancestry in DNA sequence data. For example, disease sequences may cluster together in a local partition, indicating that they arise from a common ancestral haplotype. An R package that reconstructs perfect phylogenies should therefore be useful to researchers investigating the ancestral structure of their sequence data.

Installation

You can install perfectphyloR from GitHub with:


# install.packages("devtools")
devtools::install_github("cbhagya/perfectphyloR")

Example

To reconstruct a perfect phylogeny, you must first create an object of class hapMat with createHapMat(). To illustrate, we consider a toy example with 4 haplotypes and 4 SNVs.

library(perfectphyloR)
# Haplotype matrix
haplo_mat <- matrix(c(1,1,1,0,
                      0,0,0,0,
                      1,1,1,1,
                      1,0,0,0), byrow = TRUE, ncol = 4)
# SNV names
SNV_names <- paste0("SNV", 1:4)
# Haplotype names
hap_names <- c("h1", "h2", "h3", "h4")
# SNV positions in base pairs
SNV_posns <- c(1000, 2000, 3000, 4000)
ex_hapMat <- createHapMat(hapmat = haplo_mat,
                          snvNames = SNV_names,
                          hapNames = hap_names,
                          posns = SNV_posns)
ex_hapMat
#> $hapmat
#>    SNV1 SNV2 SNV3 SNV4
#> h1    1    1    1    0
#> h2    0    0    0    0
#> h3    1    1    1    1
#> h4    1    0    0    0
#> 
#> $posns
#> [1] 1000 2000 3000 4000
#> 
#> attr(,"class")
#> [1] "hapMat"
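
Before building the hapMat object, it can help to confirm that the inputs agree with one another. The following is a minimal sketch in base R. The helper check_hap_inputs() is hypothetical and is not part of perfectphyloR; it only encodes the assumption, suggested by the example above, that rows are haplotypes, columns are SNVs, and alleles are coded 0/1.

```r
# Hypothetical helper (not part of perfectphyloR): basic consistency checks
# on the inputs used in the toy example.
check_hap_inputs <- function(hapmat, snv_names, hap_names, posns) {
  stopifnot(
    is.matrix(hapmat),
    all(hapmat %in% c(0, 1)),           # assumed 0/1 allele coding
    length(snv_names) == ncol(hapmat),  # one name per SNV (column)
    length(hap_names) == nrow(hapmat),  # one name per haplotype (row)
    length(posns) == ncol(hapmat)       # one position per SNV
  )
  invisible(TRUE)
}

# Same toy inputs as in the example
haplo_mat <- matrix(c(1,1,1,0,
                      0,0,0,0,
                      1,1,1,1,
                      1,0,0,0), byrow = TRUE, ncol = 4)
SNV_names <- paste0("SNV", 1:4)
hap_names <- c("h1", "h2", "h3", "h4")
SNV_posns <- c(1000, 2000, 3000, 4000)

# Stops with an error if any check fails; otherwise returns TRUE invisibly
check_hap_inputs(haplo_mat, SNV_names, hap_names, SNV_posns)
```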

Once the hapMat object is created, you can reconstruct the perfect phylogeny partition with the function reconstructPP(). To illustrate, we show how to reconstruct the partition at the second SNV position of ex_hapMat.

# Reconstruct the partition at SNV position 2.
rdend <- reconstructPP(hapMat = ex_hapMat,
                       focalSNV = 2,
                       minWindow = 1,
                       sep = "-")

You can use plotDend() to plot the dendrogram structure of reconstructed partitions at a focal point. The following example shows how you can plot the reconstructed partition rdend.

plotDend(rdend, direction = "downwards")
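
For intuition about the partition at SNV 2, the sketch below uses base R only to split the toy haplotypes by their allele at the focal SNV. It illustrates a single split, under the assumption that 1 codes the derived allele. It does not reproduce how reconstructPP() chooses a window of SNVs or orders the remaining splits.

```r
# Toy haplotype matrix from the example, with row and column names attached
haplo_mat <- matrix(c(1,1,1,0,
                      0,0,0,0,
                      1,1,1,1,
                      1,0,0,0), byrow = TRUE, ncol = 4,
                    dimnames = list(c("h1", "h2", "h3", "h4"),
                                    paste0("SNV", 1:4)))

focal <- 2  # index of the focal SNV

# Haplotypes carrying allele 1 at the focal SNV, and those carrying allele 0
carriers     <- rownames(haplo_mat)[haplo_mat[, focal] == 1]
non_carriers <- rownames(haplo_mat)[haplo_mat[, focal] == 0]

carriers      # "h1" "h3"
non_carriers  # "h2" "h4"
```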

Install

install.packages('perfectphyloR')

Version

0.2.1

License

GNU General Public License

Maintainer

Charith Karunarathna

Last Published

March 8th, 2021

Functions in perfectphyloR (0.2.1)

createHapMat: Create an object of class hapMat
makePartition: Recursively partition haplotype matrix
makeNewickRec: Build the character string of nodes and haplotypes in Newick format
orderSNVs: Order SNVs
perfectphyloR-package: Reconstruct perfect phylogenies from DNA sequence data
extractHaplos: Extract haplotypes from the genotype matrix of a vcf file and return them as a list
findSNVs: Find the window of SNVs at a focal point
dendToNewick: Convert a list data structure to Newick format
makeDend: Recursive partitioning of sequences at a focal point
getNextFromFocal: Next SNV from the focal point
getnSNVs: Number of SNVs
selectWindow: Select a window of SNVs about the focal SNV
fourGamete: Four-Gamete Test
reconstructPP: Reconstruct the perfect phylogeny at a given focal SNV
distMatRcpp: Compute the pairwise distances of tips according to the branching order, using Rcpp
dCor: Compute the distance correlation (dCor) coefficient between two random vectors (distance matrices)
dCorTest: dCor test for similarity of two matrices
ex_hapMat_data: Example dataset
ex_hapMatSmall_data: Example small dataset
getNextLeftFocal: Expand the neighborhood to the left
getNextRightFocal: Expand the neighborhood to the right
performTest: Perform the user-provided association test
testAssoDist: Test the association between a comparator distance matrix and the reconstructed dendrograms across a genomic region
testDendAssoRI: Test the Rand Index between a comparator dendrogram and reconstructed dendrograms
reconstructPPregion: Reconstruct perfect phylogeny sequence across a region
phenoDist: Phenotypic distances
subsetHapMat: Subset hapMat
mantelStat: Perform the Mantel test for correlation between two matrices
newNode: Create a list of child nodes
tdend: True dendrogram object
splitTips: Separate haplotype names for haplotypes that cannot be distinguished in the window around the focal point
orderColsAncestry
noVariation: Check the variation in a SNV
rdistMatrix: Rank-based distances between haplotypes in a given partition
plotDend: Plot reconstructed dendrogram
vcftohapMat: Create a hapMat object from a variant call format (vcf) file
HHGtest: HHG test for association of two distance matrices
MantelTest: Mantel test for association of two distance matrices
checkCompatible: Apply Four-Gamete Test to check the compatibility of a pair of SNVs
buildDend: Build the tree for the window of SNVs
RVcoeff: Compute RV coefficient to measure association of two distance matrices
assoTest: Compute the user-provided association statistics
RandIndex: Rand Index
RandIndexTest: Rand Index Test
RVtest: RV test for association of two distance matrices
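
The index above lists a Four-Gamete Test for checking the compatibility of a pair of SNVs. As a rough illustration of the idea, the base-R sketch below treats two biallelic SNVs as compatible when fewer than four of the possible gametes (00, 01, 10, 11) are observed across the haplotypes. The helper four_gamete_compatible() is hypothetical. It is not the package's fourGamete() or checkCompatible(), whose arguments and return values are not shown here.

```r
# Hypothetical helper (not the package's implementation): returns TRUE when
# the two SNVs show fewer than four distinct gametes across haplotypes.
four_gamete_compatible <- function(snv_a, snv_b) {
  stopifnot(length(snv_a) == length(snv_b))
  gametes <- unique(paste0(snv_a, snv_b))  # e.g. "10", "00", "11"
  length(gametes) < 4
}

# Toy haplotype matrix from the example (rows: haplotypes, columns: SNVs)
haplo_mat <- matrix(c(1,1,1,0,
                      0,0,0,0,
                      1,1,1,1,
                      1,0,0,0), byrow = TRUE, ncol = 4)

# SNV1 and SNV4 show the gametes 10, 00 and 11 only
four_gamete_compatible(haplo_mat[, 1], haplo_mat[, 4])  # TRUE

# A made-up pair showing all four gametes
four_gamete_compatible(c(0, 0, 1, 1), c(0, 1, 0, 1))    # FALSE
```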