
poppr (version 2.1.1)

mlg.filter: Statistics on Clonal Filtering of Genotype Data

Description

Create a vector of multilocus genotype indices filtered by minimum distance.

Usage

mlg.filter(pop, threshold = 0, missing = "asis", memory = FALSE,
  algorithm = "farthest_neighbor", distance = "diss.dist", threads = 0,
  stats = "MLGs", ...)

mlg.filter(pop, missing = "asis", memory = FALSE,
  algorithm = "farthest_neighbor", distance = "diss.dist", threads = 0,
  ...) <- value

Arguments

pop
a genclone, snpclone, or genind object.
threshold
the desired minimum distance between distinct genotypes. Defaults to 0, which will only merge identical genotypes.
missing
any method to be used by missingno: "mean", "zero", "loci", "genotype", or "asis" (default).
memory
whether this function should remember the last distance matrix it generated. TRUE will attempt to reuse the last distance matrix if the other parameters are the same. (default) FALSE will ignore any stored matrices and not store any it generates.
algorithm
determines the type of clustering to be done. (default) "farthest_neighbor" merges clusters based on the maximum distance between points in either cluster. This is the strictest of the three. "nearest_neighbor" merges clusters based on the minimum dist
distance
a character or function defining the distance to be applied to pop. Defaults to diss.dist for genclone objects and bitwise.dist for snpclone objects.
threads
The maximum number of parallel threads to be used within this function. A value of 0 (default) will attempt to use as many threads as there are available cores/CPUs. In most cases this is ideal. A value of 1 will force the function to run serially, whi
stats
determines which statistics this function should return on cluster mergers. If (default) "MLGs", this function will return a vector of cluster assignments, similar to that of mlg.vector. If "thresholds
...
any parameters to be passed off to the distance method.
value
the threshold at which genotypes should be collapsed.
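
The difference between the farthest-neighbor and nearest-neighbor rules described under algorithm can be seen on a toy example. The sketch below uses base R only: it assumes that complete linkage in hclust() is a fair stand-in for "farthest_neighbor" and single linkage for "nearest_neighbor". It is not poppr's implementation, and the data and the cutoff of 1.5 are made up for illustration.

```r
# Four points on a line with unequal gaps (toy data, not genotypes).
pos <- c(a = 0, b = 1, c = 2.1, d = 3.3)
d <- dist(pos)

# Assumption: complete linkage mirrors the farthest-neighbor rule and
# single linkage mirrors the nearest-neighbor rule.
farthest <- cutree(hclust(d, method = "complete"), h = 1.5)
nearest  <- cutree(hclust(d, method = "single"),   h = 1.5)

farthest  # a and b share one cluster, c and d share another
nearest   # every gap is below 1.5, so all four points chain together
```

Under the farthest-neighbor rule two clusters merge only when every pair of members is within the cutoff, which is why it keeps more clusters apart at the same threshold.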

Value

By default, the collapsed multilocus genotypes. Otherwise, any combination of the following:

  • MLGs: a numeric vector naming the multilocus genotype of each individual in the dataset. Each genotype is at least the specified distance apart, as calculated by the selected algorithm. If stats is set to TRUE, this function will return the thresholds at which each cluster merger occurred instead of the new cluster assignments.
  • THRESHOLDS: a numeric vector representing the thresholds beyond which clusters of multilocus genotypes were collapsed.
  • DISTANCES: a square matrix representing the distances between each cluster.
  • SIZES: the sizes of the multilocus genotype clusters in order.

Details

This function will take in any distance matrix or function and collapse multilocus genotypes below a given threshold. If you use this function as the assignment method (mlg.filter(myData, distance = myDist) <- 0.5), the distance function or matrix will be remembered by the object. This means that if you define your own distance matrix or function, you must keep it in memory to further utilize mlg.filter.
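
As a rough illustration of collapsing below a threshold with either a distance function or a precomputed matrix, here is a minimal base R sketch. The helper collapse_at() is hypothetical and is not part of poppr. It assumes complete-linkage clustering cut at the threshold as an analogue of the default farthest-neighbor algorithm, and the toy allele-count table x is invented for the example.

```r
# Hypothetical helper: accepts a distance function or a ready-made
# distance matrix, then cuts a complete-linkage tree at `threshold`.
collapse_at <- function(x, distance, threshold) {
  d <- if (is.function(distance)) distance(x) else distance
  tree <- hclust(as.dist(d), method = "complete")
  cutree(tree, h = threshold)
}

# Toy allele counts: four samples (rows) by four alleles (columns).
x <- matrix(c(1, 0, 1, 0,
              1, 0, 1, 0,
              1, 0, 0, 1,
              0, 2, 0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("s", 1:4), NULL))

manhattan <- function(m) dist(m, method = "manhattan")

# Threshold 0 merges only the identical samples s1 and s2.
collapse_at(x, manhattan, threshold = 0)

# A larger threshold also pulls in s3; a precomputed matrix works too.
collapse_at(x, as.matrix(manhattan(x)), threshold = 2.5)
```

In this sketch the distance is passed in on every call. The assignment form of mlg.filter instead remembers the distance on the object, which is why a user-defined function or matrix has to stay available in the session.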

See Also

filter_stats, cutoff_predictor, mll, genclone, snpclone, diss.dist, bruvo.dist

Examples

data(partial_clone)
pc <- as.genclone(partial_clone) # convert to genclone object

# Get MLGs at threshold 0.05
mlg.filter(pc, threshold = 0.05, distance = "nei.dist")
pc # 26 mlgs

# Set MLGs at threshold 0.05
mlg.filter(pc, distance = "nei.dist") <- 0.05
pc # 25 mlgs

# The distance definition is persistent
mlg.filter(pc) <- 0.1
pc # 24 mlgs

# You can still change the definition
mlg.filter(pc, distance = diss.dist, percent = TRUE) <- 0.1
pc

# Even with custom definitions
data(Pinf)
Pinf
mlg.filter(Pinf, distance = function(x) dist(tab(x))) <- 3
Pinf
mlg.filter(Pinf) <- 4
Pinf

# on genlight/snpclone objects
set.seed(999)
gc <- as.snpclone(glSim(100, 0, n.snp.struc = 1e3, ploidy = 2))
gc # 100 mlgs
mlg.filter(gc) <- 0.25
gc # 82 mlgs
