SNP datasets generated by DArT have missing values that arise primarily from failure to call a SNP because of a mutation at one or both of the restriction enzyme recognition sites. This script filters out loci (or specimens) for which the call rate is lower than a specified value. It also filters out loci (or specimens) in SilicoDArT (presence/absence) datasets where the call rate is lower than the specified value. In that case, the data are missing owing to low coverage.
gl.filter.callrate(x, method = "loc", threshold = 0.95,
mono.rm = TRUE, recalc = FALSE, plot = FALSE, v = 2)
x -- name of the genlight object containing the SNP data, or the genind object containing the SilicoDArT data [required]
method -- "loc" to specify that loci are to be filtered, "ind" to specify that specimens are to be filtered [default "loc"]
threshold -- threshold value below which loci (or specimens, when method = "ind") will be removed [default 0.95]
mono.rm -- remove monomorphic loci [default TRUE]
recalc -- recalculate the locus metadata statistics if any individuals are deleted in the filtering [default FALSE]
plot -- specify if a histogram of call rate is to be produced [default FALSE]
v -- verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2]
The reduced genlight or genind object, plus a summary
Because this filter operates on call rate, and previously applied functions may not have recalculated the locus metrics, this function recalculates call rate before filtering. Recalculation after filtering remains optional, with no recalculation as the default.
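To make the call rate calculation concrete, the following is a minimal base R sketch that uses a plain matrix in place of a genlight object. It is not the dartR implementation. The object names (geno, call_rate_loc, kept) and the toy data are hypothetical, and it assumes that the call rate of a locus is the proportion of individuals with a non-missing call.

```r
# Toy genotype matrix: rows are individuals, columns are loci.
# Values 0/1/2 are genotype calls; NA marks a missing call.
geno <- matrix(
  c(0,  1, NA, 2,
    0, NA, NA, 2,
    1,  1,  2, 2,
    0,  2, NA, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("loc", 1:4))
)

# Call rate per locus: proportion of individuals with a non-missing call.
call_rate_loc <- colMeans(!is.na(geno))

# Keep only the loci whose call rate is not below the threshold.
threshold <- 0.7
kept <- geno[, call_rate_loc >= threshold, drop = FALSE]

print(call_rate_loc)
print(colnames(kept))
```

Filtering on individuals follows the same pattern, with rowMeans() in place of colMeans() and the subset applied to rows.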
Note that when filtering individuals on call rate, the initial call rate is calculated and compared against the threshold. After filtering, if mono.rm=TRUE, the removal of monomorphic loci will alter the call rates. Some individuals with a call rate initially greater than the nominated threshold, and so retained, may come to have a call rate lower than the threshold. If this is a problem, repeated iterations of this function will resolve the issue.
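The effect described in this note can be reproduced with a small base R sketch. It is an illustration under stated assumptions, not the dartR code: the helper names is_monomorphic and filter_ind_once are hypothetical, a plain matrix stands in for the genlight object, and a locus is treated as monomorphic when its non-missing calls take at most one distinct value.

```r
# Toy data: rows are individuals, columns are loci; NA is a missing call.
geno <- matrix(
  c(0, 0,  0,  0,  2,
    0, 0,  1,  2,  0,
    0, 0,  2, NA, NA,
    2, 2, NA, NA, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ind1", "ind2", "ind3", "indX"),
                  c("locA", "locB", "locC", "locD", "locE"))
)

# Assumption: monomorphic means at most one distinct non-missing value.
is_monomorphic <- function(locus) {
  length(unique(locus[!is.na(locus)])) <= 1
}

# One pass: drop individuals below the threshold, then monomorphic loci.
filter_ind_once <- function(geno, threshold) {
  keep_ind <- rowMeans(!is.na(geno)) >= threshold
  geno <- geno[keep_ind, , drop = FALSE]
  keep_loc <- !apply(geno, 2, is_monomorphic)
  geno[, keep_loc, drop = FALSE]
}

threshold <- 0.5

# A single pass removes indX (call rate 0.4); locA and locB then become
# monomorphic and are removed, which lowers the call rate of ind3.
one_pass <- filter_ind_once(geno, threshold)
print(rowMeans(!is.na(one_pass)))

# Repeat until a pass removes no further individuals.
filtered <- geno
repeat {
  n_before <- nrow(filtered)
  filtered <- filter_ind_once(filtered, threshold)
  if (nrow(filtered) == n_before) break
}
print(dimnames(filtered))
```

With gl.filter.callrate itself, the equivalent is to call the function again on its own result until no further individuals are removed.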
# NOT RUN {
result <- gl.filter.callrate(testset.gl, method="ind", threshold=0.8)
# }