diemr: Genome polarisation in R
diemr implements the Diagnostic Index Expectation Maximization
(diem) algorithm for genome polarisation in R. It estimates
which alleles of single nucleotide variant (SNV) sites belong to either
side of a barrier to gene flow, co-estimates individual assignment, and
infers barrier strength and divergence. These tools are designed for
studies of hybridisation, speciation, and population divergence,
and extend the methods described in Baird et al. (2023), Genome
polarisation for detecting barriers to geneflow, Methods in Ecology and
Evolution 14: 512-528, doi:10.1111/2041-210X.14010. For the original
algorithm description and implementations in Python and Mathematica,
see the diem repository at https://github.com/StuartJEBaird/diem.
For a step-by-step explanation of the functions and their outputs, see
the documentation for the diemr package.
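To make the idea of polarisation concrete, the sketch below assumes diem's plain-text genotype encoding, in which each marker is one line starting with S followed by one character per individual: 0 and 2 for the two homozygotes, 1 for a heterozygote and _ for missing data. Under that assumption, polarising a marker is a choice of whether to swap the labels 0 and 2, so that state 0 points to the same side of the barrier across all markers. The helper flip_marker() and the marker strings are hypothetical illustrations, not diemr functions or data.

```r
# Hypothetical helper, not part of diemr: flip the polarity of one marker
# stored as "S" followed by one genotype character per individual.
flip_marker <- function(marker) {
  # chartr() swaps "0" and "2" in a single pass;
  # "S", heterozygotes ("1") and missing data ("_") are left unchanged
  chartr("02", "20", marker)
}

markers <- c("S0012_2", "S2210_0")  # made-up markers for 6 individuals
# keep marker 1 as is, flip marker 2
polarised <- c(markers[1], flip_marker(markers[2]))
polarised
# [1] "S0012_2" "S0012_2"
```

Flipping the second marker makes it identical to the first, so both markers now assign the same side of the barrier to each individual's genotype. diem estimates which markers to flip from the data; this sketch only shows what applying a polarity means.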
Installation
To start using diemr, load the package or install it from CRAN if it is not yet available:
if (!require("diemr", character.only = TRUE)) {
  install.packages("diemr", dependencies = TRUE)
  library("diemr", character.only = TRUE)
}
# Loading required package: diemr
The developer version can be installed directly from this repository
using the devtools package:
devtools::install_github("nmartinkova/diemr")
Set the working directory to a location with read and write privileges, because diem saves its key results there.
Check data format and polarise genotypes
Next, assemble the paths to all files containing the data to be analysed by diemr. Here, we use a tiny example dataset that is included in the package. It is good practice to check that all files contain data in the correct format for all individuals and markers.
filepaths <- system.file("extdata", "data7x3.txt",
                         package = "diemr")
CheckDiemFormat(filepaths, ploidy = list(rep(2, 6)), ChosenInds = 1:6)
# File check passed: TRUE
# Ploidy check passed: TRUE
If CheckDiemFormat() fails, work through the error
messages and fix the stored input files accordingly. The algorithm
repeatedly reads data from the hard disk, so it is critical to confirm
that the file check passes before the analysis.
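The kind of check that CheckDiemFormat() performs can be illustrated with a minimal base R sketch, again assuming the S-prefixed encoding with genotype characters 0, 1, 2 and _. The function check_diem_lines() is hypothetical and far simpler than CheckDiemFormat(): for instance, it ignores ploidy and only returns TRUE or FALSE, so use the package function for real data.

```r
# Hypothetical helper, not diemr's CheckDiemFormat(): TRUE if every line is
# "S" followed only by 0, 1, 2 or _, with one character per individual.
check_diem_lines <- function(lines, nInds) {
  # allowed characters: a leading "S", then genotypes 0, 1, 2 or missing "_"
  chars_ok <- grepl("^S[012_]*$", lines)
  # one genotype character per individual plus the leading "S"
  length_ok <- nchar(lines) == nInds + 1
  all(chars_ok & length_ok)
}

# Made-up lines; for a real file, pass readLines() of that file instead
check_diem_lines(c("S0012_2", "S2210_0"), nInds = 6)  # TRUE
check_diem_lines(c("S0012_2", "S22103"), nInds = 6)   # FALSE: "3" and too short
```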
diem.res <- diem(files = filepaths,
                 ploidy = list(rep(2, 6)),
                 ChosenInds = 1:6,
                 nCores = 1)
The results, including marker polarisation, the marker diagnostic index
and its support, are stored in the list element diem.res$DI.
Additional elements of the results list contain basic tracking
information about the expectation maximisation iterations. The key
results are also saved in the file MarkerDiagnosticsWithOptimalPolarities.txt
in the working directory. See the diemr documentation for
further information.
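Individual assignment can be illustrated with a simple hybrid index computed from markers that are already polarised. The sketch assumes diploid individuals and the S-prefixed encoding, and for each individual it takes the proportion of allele copies labelled 2 among that individual's non-missing genotypes. The helper simple_hybrid_index() and the marker strings are hypothetical; diemr's own individual assignment may be computed differently, so treat this only as an explanation of the idea.

```r
# Hypothetical helper, not a diemr function: unweighted hybrid index per
# individual from polarised diploid markers in diem-style encoding.
simple_hybrid_index <- function(markers) {
  # drop the leading "S" and split each marker into one character per individual
  geno <- do.call(rbind, strsplit(substring(markers, 2), ""))
  # "_" (missing) becomes NA; 0/1/2 count copies of the allele labelled 2
  counts <- suppressWarnings(matrix(as.numeric(geno), nrow = nrow(geno)))
  # copies of allele 2 divided by 2 copies per non-missing genotype
  colSums(counts, na.rm = TRUE) / (2 * colSums(!is.na(counts)))
}

polarised <- c("S0012_2", "S011222", "S002212")  # made-up, already polarised
round(simple_hybrid_index(polarised), 3)
# [1] 0.000 0.167 0.667 1.000 0.750 1.000
```

Missing genotypes (_) are excluded per individual, so the fifth individual is scored on two markers only.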