sequoia (version 2.0.7)

CheckGeno: check GenoM

Description

Check that the provided genotype matrix is in the correct format, and check for samples and SNPs with a low call rate.

Usage

CheckGeno(GenoM, quiet = FALSE, Plot = FALSE)

Arguments

GenoM

the genotype matrix

quiet

suppress messages

Plot

display the plots of SnpStats
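As a rough illustration of what a format check might involve, the sketch below tests a few basic properties of a genotype matrix in base R. It assumes genotypes are coded 0/1/2 with -9 for missing values and individuals in rows; `check_format` is a hypothetical helper written for this example and is not part of sequoia, whose own checks may differ.

```r
# Hypothetical helper, not part of sequoia: basic format checks.
# Assumes 0/1/2 genotype coding, -9 for missing, individuals in rows.
check_format <- function(GenoM) {
  if (!is.matrix(GenoM)) {
    return("not a matrix")
  }
  problems <- character(0)
  # NA also fails this test, because NA is not in the allowed set
  if (!all(GenoM %in% c(0, 1, 2, -9))) {
    problems <- c(problems, "values other than 0, 1, 2, -9")
  }
  if (is.null(rownames(GenoM)) || anyDuplicated(rownames(GenoM)) > 0) {
    problems <- c(problems, "missing or duplicated row names")
  }
  problems
}

toy <- matrix(c(0,  1, 2, -9,
                2,  2, 1,  0,
                1, -9, 0,  1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("i01", "i02", "i03"), NULL))
check_format(toy)        # character(0): no problems found

toy_bad <- toy
toy_bad[1, 1] <- 3       # introduce an invalid genotype code
check_format(toy_bad)    # reports the invalid values
```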

Value

A list with the following elements, each included only if any such SNPs or individuals are found:

ExcludedSNPs

SNPs scored for <10% of individuals; excluded when running sequoia

ExcludedSnps-mono

monomorphic (fixed) SNPs; automatically excluded when running sequoia. Column numbers are *after* removal of ExcludedSNPs, if any.

ExcludedIndiv

Individuals scored for <5% of SNPs; these cannot be reliably included during pedigree reconstruction. Individual call rate is calculated after removal of 'Excluded SNPs'.

Snps-LowCallRate

SNPs scored for 10% to 50% of individuals; recommended to be filtered out

Indiv-LowCallRate

individuals scored for <50% of SNPs; recommended to be filtered out
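The flags listed above can be approximated with a few lines of base R. The sketch below is an illustration only, not sequoia's implementation: it assumes the thresholds are percentages (10% of individuals for excluded SNPs, 5% of SNPs for excluded individuals, 50% for low call rate), that missing genotypes are coded -9, and the element names in `flags` are chosen for this example.

```r
# Illustrative only; thresholds and names are assumptions, not sequoia's code.
set.seed(1)
nInd <- 50; nSnp <- 40
G <- matrix(sample(0:2, nInd * nSnp, replace = TRUE), nInd, nSnp,
            dimnames = list(paste0("i", seq_len(nInd)), NULL))
G[, 1]    <- -9    # SNP that was never scored
G[, 2]    <- 0     # monomorphic SNP
G[2:36, 3] <- -9   # SNP scored for fewer than half of the individuals
G[1, ]    <- -9    # individual that was never scored

snp_cr   <- colMeans(G != -9)                 # call rate per SNP
keep_snp <- snp_cr >= 0.10
G2       <- G[, keep_snp, drop = FALSE]       # after removing excluded SNPs

# Monomorphic: at most one distinct genotype among the scored values.
# Column numbers refer to G2, i.e. after removal of the excluded SNPs.
is_mono <- apply(G2, 2, function(x) length(unique(x[x != -9])) <= 1)

ind_cr <- rowMeans(G2 != -9)                  # call rate per individual

flags <- list(
  ExcludedSnps      = which(!keep_snp),
  ExcludedSnps_mono = which(is_mono),
  ExcludedIndiv     = names(which(ind_cr < 0.05)),
  Snps_LowCallRate  = which(snp_cr >= 0.10 & snp_cr < 0.50)  # columns of G
)
str(flags)
```

Note that the monomorphic SNP is column 2 of the original matrix but column 1 once the unscored SNP has been dropped, which is why the numbering convention matters when subsetting.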

Thresholds

Appropriate call rate thresholds for SNPs and individuals depend on the total number of SNPs, distribution of call rates, genotyping errors, and the proportion of candidate parents that are SNPd (sibship clustering is more prone to false positives). Note that filtering first on SNP call rate tends to keep more individuals in.
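The effect of filter order can be seen in a small constructed example. The sketch below uses base R only, with missing genotypes coded -9; the thresholds (0.8 for individuals, 0.5 for SNPs) and the helper `call_rate` are chosen purely for illustration. Half of the SNPs are deliberately scored in only 10 of 100 individuals, so filtering individuals first discards most samples, whereas filtering SNPs first keeps them all.

```r
# Constructed toy data: 100 individuals, 40 SNPs, -9 = missing.
nInd <- 100; nSnp <- 40
G <- matrix(rep(0:2, length.out = nInd * nSnp), nInd, nSnp,
            dimnames = list(paste0("i", seq_len(nInd)), NULL))
G[11:100, 21:40] <- -9   # SNPs 21-40 are scored in individuals 1-10 only

# Hypothetical helper: call rate per row (margin = 1) or column (margin = 2)
call_rate <- function(M, margin) apply(M != -9, margin, mean)

# Order 1: filter individuals on the unfiltered SNP set
n_ind_first <- sum(call_rate(G, 1) > 0.8)

# Order 2: drop low call rate SNPs first, then filter individuals
G_snp       <- G[, call_rate(G, 2) > 0.5, drop = FALSE]
n_snp_first <- sum(call_rate(G_snp, 1) > 0.8)

c(individuals_first = n_ind_first, snps_first = n_snp_first)
# individuals_first: 10, snps_first: 100
```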

See Also

SnpStats to calculate SNP call rates; CalcOHLLR to count the number of SNPs scored in both focal individual and parent

Examples

# NOT RUN {
GenoM <- SimGeno(Ped_HSg5, nSnp=400, CallRate = runif(400, 0.2, 0.8))
Excl <- CheckGeno(GenoM)
GenoM.orig <- GenoM   # make a 'backup' copy
if ("ExcludedSnps" %in% names(Excl))
  GenoM <- GenoM[, -Excl[["ExcludedSnps"]]]
if ("ExcludedInd" %in% names(Excl))
  GenoM <- GenoM[!rownames(GenoM) %in% Excl[["ExcludedInd"]], ]
if ("ExcludedIndiv" %in% names(Excl))
  GenoM <- GenoM[!rownames(GenoM) %in% Excl[["ExcludedIndiv"]], ]

# Did CheckGeno warn about SNPs scored for <50% of individuals?
# If so, inspect the SNP call rates and filter on them:
SnpCallRate <- apply(GenoM, MARGIN=2,
                     FUN = function(x) sum(x!=-9)) / nrow(GenoM)
hist(SnpCallRate, breaks=50, col="grey")
GenoM <- GenoM[, SnpCallRate > 0.6]

# to be on the safe side, filter out low call rate individuals
IndivCallRate <- apply(GenoM, MARGIN=1,
                       FUN = function(x) sum(x!=-9)) / ncol(GenoM)
hist(IndivCallRate, breaks=50, col="grey")
GoodSamples <- rownames(GenoM)[ IndivCallRate > 0.8]
# }