missing_by_snp: Visualize missing data per SNP, remove SNPs above a missing data cutoff
Description
This function can be run in two ways:
1) Without 'cutoff' specified. This visualizes the amount of missing data in each sample
across a range of potential missing data cutoffs. It also shows dotplots of the total
number of SNPs retained and the total proportion of missing data at each filtering cutoff.
Based on these visualizations, you can make an informed decision about a cutoff that
minimizes the overall missingness of your dataset while still retaining enough SNPs for
the downstream inferences you hope to make.
2) With 'cutoff' specified. This shows the dotplots with the cutoff you set,
and then removes SNPs above the missing data cutoff.
Usage
missing_by_snp(vcfR, cutoff = NULL)
Value
If 'cutoff' is not specified, returns a data frame containing the proportion of
missing data and the total number of SNPs retained at each filtering level. If 'cutoff' is
specified, SNPs whose proportion of missing data exceeds the cutoff are removed,
and the filtered vcfR object is returned.
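The two return modes described above might be exercised as in the following sketch. It assumes the SNPfiltR package is installed and that it ships an example vcfR object named vcfR.example; neither is stated on this page, so substitute your own vcfR object if that name is not available. The cutoff of 0.6 is an arbitrary illustrative value.

```r
# Sketch only: vcfR.example is an assumed example dataset, and 0.6 is an
# arbitrary cutoff chosen for illustration.
if (requireNamespace("SNPfiltR", quietly = TRUE)) {
  vcf <- SNPfiltR::vcfR.example

  # Mode 1: no cutoff. Draws the diagnostic plots and returns a data frame
  # summarizing missingness and SNPs retained at each filtering level.
  stats <- SNPfiltR::missing_by_snp(vcf)

  # Mode 2: cutoff given. Draws the dotplots, removes SNPs above the cutoff,
  # and returns the filtered vcfR object.
  filtered <- SNPfiltR::missing_by_snp(vcf, cutoff = 0.6)
}
```

Running the function once without 'cutoff' and inspecting the returned data frame before choosing a value keeps the filtering decision tied to the data rather than to a default.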
Arguments
vcfR
a vcfR object
cutoff
a numeric value between 0 and 1 specifying the maximum proportion of missing data
allowed in a SNP for it to be retained for downstream analyses
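To make the meaning of 'cutoff' concrete, here is a minimal base R sketch of per-SNP missingness filtering on a toy genotype matrix. It is not the package's implementation: the matrix, the object names, and the choice to keep SNPs whose missingness equals the cutoff exactly are all assumptions made for illustration.

```r
# Toy genotype matrix (hypothetical data): rows are SNPs, columns are
# samples, and NA marks a missing genotype call.
gt <- matrix(
  c("0/0", "0/1", NA,    "1/1",
    NA,    NA,    NA,    "0/1",
    "0/1", "0/0", "1/1", "0/0",
    NA,    "0/1", NA,    "0/0"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("snp", 1:4), paste0("sample", 1:4))
)

# Proportion of samples with a missing genotype at each SNP.
miss <- rowMeans(is.na(gt))

# Keep SNPs whose missingness does not exceed the cutoff. Treating a SNP
# exactly at the cutoff as retained is an assumption of this sketch.
cutoff <- 0.5
keep <- miss <= cutoff
gt_filtered <- gt[keep, , drop = FALSE]
```

With a cutoff of 0.5, snp2 (missing in three of four samples) is dropped, while the other three SNPs are kept.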