sequoia (version 1.3.1)

SnpStats: SNP summary statistics

Description

Estimate allele frequency (AF), missingness and Mendelian errors per SNP.

Usage

SnpStats(GenoM, Ped = NULL, Plot = TRUE)

Arguments

GenoM

Genotype matrix, in sequoia's format: 1 column per SNP, 1 row per individual, genotypes coded as 0/1/2/-9, and rownames giving individual IDs.

Ped

A data frame with 3 columns: ID, parent1, parent2. Additional columns and non-genotyped individuals are ignored. Only used to estimate the error rate.

Plot

Logical; show histograms of the results?

Value

A matrix with one row per SNP (i.e. one row per column of GenoM) and 2 or 3 columns:

AF

Allele frequency of the 'second allele' (the one for which the homozygote is coded 2)

Mis

Proportion of missing calls

ER

(only when Ped is provided) Number of Mendelian errors in parent-offspring pairs (i.e. the number of opposing homozygotes, 'OHdam' & 'OHsire' in pedigree) and parent-parent-offspring trios ('MEpairs' in pedigree).
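As an illustration of what the AF and Mis columns represent, the sketch below computes both by hand in base R from a small made-up genotype matrix in the 0/1/2/-9 coding. It does not call sequoia, and it assumes AF is simply the mean genotype among non-missing calls divided by 2; the toy matrix and object names are invented for this example.

```r
# Toy genotype matrix: 4 individuals (rows) x 3 SNPs (columns), -9 = missing.
# matrix() fills column by column, so each row of values below is one SNP.
GenoM <- matrix(c( 0,  1, 2, -9,   # SNP1
                   2,  2, 1,  0,   # SNP2
                  -9, -9, 1,  1),  # SNP3
                nrow = 4, ncol = 3,
                dimnames = list(paste0("ind", 1:4), paste0("SNP", 1:3)))

# Recode missing calls as NA so they can be skipped
G <- GenoM
G[G == -9] <- NA

# Frequency of the allele whose homozygote is coded 2:
# mean copy number per individual, divided by 2 copies per genotype
AF <- colMeans(G, na.rm = TRUE) / 2

# Proportion of missing calls per SNP
Mis <- colMeans(is.na(G))

cbind(AF, Mis)
```

In this toy data SNP3 has half its calls missing, so its AF rests on only two individuals.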

Details

Calculation of these summary statistics can be done in PLINK, and SNPs with low minor allele frequency or high missingness should be filtered out using PLINK prior to pedigree reconstruction. This function is merely provided as an aid to inspect the relationship between AF, missingness and error, to help find a suitable combination of thresholds to use.
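One way to explore candidate thresholds is to see which SNPs would pass a given combination. The sketch below uses a made-up matrix with the same AF and Mis columns described under Value; the values, the object names, and the thresholds `min_MAF` and `max_Mis` are illustrative assumptions, not recommendations.

```r
# Hypothetical SnpStats-like output for 5 SNPs (values invented for illustration)
Stats <- cbind(AF  = c(0.02, 0.35, 0.50, 0.91, 0.48),
               Mis = c(0.01, 0.05, 0.30, 0.02, 0.00))
rownames(Stats) <- paste0("SNP", 1:5)

# Minor allele frequency: frequency of whichever allele is rarer
MAF <- pmin(Stats[, "AF"], 1 - Stats[, "AF"])

# Illustrative thresholds only
min_MAF <- 0.05
max_Mis <- 0.10

# SNPs passing both filters
keep <- MAF >= min_MAF & Stats[, "Mis"] <= max_Mis
kept <- rownames(Stats)[keep]
kept
```

Varying the two thresholds and checking how many SNPs remain gives a quick feel for the trade-off before applying the actual filter in PLINK.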

The error count includes both the number of parent-offspring pairs that are opposing homozygotes (parent is AA and offspring is aa) and the number of Mendelian errors in parent-parent-offspring trios (e.g. parents AA and aa, but offspring not Aa).
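The two kinds of error can be sketched in base R for a single SNP. This is an illustration of the definitions only, not sequoia's implementation: the helper names `is_OH` and `is_trio_error` are hypothetical, the genotypes are made up, and missing calls are assumed to have been excluded beforehand.

```r
# Opposing homozygotes: parent coded 0 and offspring coded 2, or vice versa
is_OH <- function(parent, offspring) {
  (parent == 0 & offspring == 2) | (parent == 2 & offspring == 0)
}

# Trio Mendelian error: offspring genotype cannot arise from the two parents
is_trio_error <- function(dam, sire, offspring) {
  # Alleles a parent can transmit: genotype 0 -> {0}, 1 -> {0, 1}, 2 -> {1}
  gametes <- list(`0` = 0, `1` = c(0, 1), `2` = 1)
  possible <- function(d, s, o) {
    o %in% outer(gametes[[as.character(d)]], gametes[[as.character(s)]], `+`)
  }
  !mapply(possible, dam, sire, offspring)
}

# Five toy trios at one SNP (no missing genotypes)
dam  <- c(0, 0, 2, 1, 0)
sire <- c(2, 2, 2, 1, 0)
off  <- c(1, 2, 0, 2, 1)

n_OHdam  <- sum(is_OH(dam, off))
n_OHsire <- sum(is_OH(sire, off))
n_trio   <- sum(is_trio_error(dam, sire, off))
c(OHdam = n_OHdam, OHsire = n_OHsire, trio = n_trio)
```

The last trio (both parents coded 0, offspring coded 1) is a Mendelian error that neither pairwise check detects, which is why trios are examined in addition to pairs.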

The underlying genotyping error rate cannot easily be estimated from the number of Mendelian errors, as many errors may go undetected and a single error in a prolific individual can result in a high number of Mendelian errors. Moreover, a high error rate may interfere with pedigree reconstruction, and successful assignment will be biased towards parents with a lower error count.

See Also

GenoConvert