Runs and evaluates results from plink --freq. It calculates the minor allele
frequencies for all variants in the individuals that passed the
perIndividualQC. The minor allele frequency distribution is plotted as a
histogram.
check_maf(indir, name, qcdir = indir, macTh = 20, mafTh = NULL,
verbose = FALSE, interactive = FALSE, path2plink = NULL,
showPlinkOutput = TRUE)
indir: [character] /path/to/directory containing the basic PLINK data files name.bim, name.bed, name.fam.
name: [character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam.
qcdir: [character] /path/to/directory where results will be written to.
If perIndividualQC was conducted, this directory should be the same as the
qcdir specified in perIndividualQC, i.e. it contains name.fail.IDs with the
IIDs of individuals that failed QC. The user needs write permission to qcdir.
By default, qcdir=indir.
macTh: [double] Threshold for minor allele count cut-off. If both mafTh and macTh are specified, macTh is used (macTh = mafTh*2*NrSamples).
mafTh: [double] Threshold for minor allele frequency cut-off.
verbose: [logical] If TRUE, progress info is printed to standard out; in particular, the plink log will be displayed.
interactive: [logical] Should plots be shown interactively? When choosing this option, make sure you have X-forwarding or a graphical interface available for interactive plotting. Alternatively, set interactive=FALSE and save the returned plot object (p_maf) via ggplot2::ggsave(plot=p_maf, other_arguments) or pdf(outfile); print(p_maf); dev.off().
path2plink: [character] Absolute path to the PLINK executable
(https://www.cog-genomics.org/plink/1.9/), i.e. plink should be accessible
as path2plink -h. The full name of the executable should be specified: for
Windows, this means path/plink.exe; for Unix platforms, this is path/plink.
If not provided, it is assumed that the PATH set-up works and PLINK will be
found by exec_wait('plink').
showPlinkOutput: [logical] If TRUE, plink log and error messages are printed to standard out.
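The relation between the two thresholds given for macTh (macTh = mafTh*2*NrSamples) can be checked with a few lines of base R. This is only an arithmetic sketch: the sample size and frequency threshold are invented illustration values, not package defaults, and the factor of two assumes diploid genotypes.

```r
# Hypothetical values for illustration only (not taken from plinkQC)
n_samples <- 200   # individuals remaining after per-individual QC
mafTh <- 0.05      # minor allele frequency threshold

# Each diploid individual contributes two alleles, hence the factor 2
macTh <- mafTh * 2 * n_samples
print(macTh)

# Reverse conversion: from a count threshold back to a frequency threshold
mafTh_from_mac <- macTh / (2 * n_samples)
print(mafTh_from_mac)
```

With a fixed macTh, the implied frequency cut-off shrinks as the number of samples grows, which is one reason to think about which of the two thresholds suits a given study size.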
Named list with i) fail_maf, a [data.frame] with the columns CHR (Chromosome code), SNP (Variant identifier), A1 (Allele 1; usually minor), A2 (Allele 2; usually major), MAF (Allele 1 frequency) and NCHROBS (Number of allele observations) for all SNPs that failed the mafTh/macTh threshold, and ii) p_maf, a ggplot2 object containing the MAF distribution histogram, which can be shown with print(p_maf).
check_maf uses plink --remove name.fail.IDs --freq to calculate the minor
allele frequencies for all variants in the individuals that passed the
perIndividualQC. It does so without generating a new dataset; the IDs are
simply removed when calculating the statistics.
For details on the output data.frame fail_maf, check the original description on the PLINK output format page: https://www.cog-genomics.org/plink/1.9/formats#frq.
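To make the layout of fail_maf concrete, the following base R sketch builds a toy data.frame with the same columns as a PLINK .frq file and keeps the variants below a frequency threshold. All values are invented, and the strict less-than comparison is an assumption made for illustration; this page does not state which comparison check_maf applies internally.

```r
# Toy data in the column layout of a PLINK .frq file (values are invented)
frq <- data.frame(
  CHR = c(1, 1, 2, 2),
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  A1 = c("A", "C", "G", "T"),
  A2 = c("G", "T", "A", "C"),
  MAF = c(0.25, 0.004, 0.012, 0.40),
  NCHROBS = c(400, 400, 398, 400),
  stringsAsFactors = FALSE
)

mafTh <- 0.01  # hypothetical threshold

# Assumption: a variant fails when its MAF is strictly below the threshold
fail_maf_sketch <- frq[frq$MAF < mafTh, ]
print(fail_maf_sketch)
```

The name fail_maf_sketch is hypothetical and chosen to avoid confusion with the fail_maf element returned by check_maf.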
# NOT RUN {
indir <- system.file("extdata", package="plinkQC")
qcdir <- tempdir()
name <- "data"
path2plink <- '/path/to/plink'
# the following code is not run on package build, as the path2plink on the
# user system is not known.
# }
# NOT RUN {
fail_maf <- check_maf(indir=indir, qcdir=qcdir, name=name, macTh=15,
                      interactive=FALSE, verbose=TRUE, path2plink=path2plink)
# }