BIGr (version 0.6.2)

check_ped: Evaluate Pedigree File for Accuracy

Description

Check a pedigree file for accuracy and report suspected errors.

Usage

check_ped(ped.file, seed = NULL, verbose = TRUE)

Value

A list of data.frames, one per error type. When verbose = TRUE, the errors are also printed to the console.

Arguments

ped.file

Path to the pedigree text file. The file must be tab-separated and have three columns labeled id, sire, and dam, in any order.

seed

Optional seed for reproducibility.

verbose

Logical. If TRUE, print the errors to the console.
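
As a minimal sketch of the file layout that ped.file points to, the base R code below writes a small tab-separated pedigree with the id, sire, and dam header and reads it back. The toy individuals and the names ped, ped_path, and ped_check are hypothetical, and whether check_ped requires any formatting beyond what is described on this page is an assumption.

```r
# Hypothetical toy pedigree; 0 marks an unknown parent.
ped <- data.frame(
  id   = c("A", "B", "C"),
  sire = c("0", "0", "A"),
  dam  = c("0", "0", "B"),
  stringsAsFactors = FALSE
)

# Write a tab-separated text file with a header row, no quotes, no row names.
ped_path <- tempfile(fileext = ".txt")
write.table(ped, file = ped_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back as text to confirm the three labeled columns.
ped_check <- read.delim(ped_path, colClasses = "character")
```

A file written this way could then be supplied as the ped.file argument.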

Details

check_ped takes a tab-separated pedigree file with three columns labeled id, sire, and dam, in any order, and checks for:

  • Ids that appear more than once in the id column

  • Ids that appear in both sire and dam columns

  • Direct (e.g. a parent is an offspring of its own daughter) and indirect (e.g. a great-grandparent is the son of its own grandchild) dependencies within the pedigree.

  • Individuals that appear in the pedigree as sire or dam but not in the id column. These are reported back with unknown parents (0).
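
To make the simpler checks in the list above concrete, here is a rough base R sketch of the repeated-id, sire-and-dam, and missing-parent checks on an in-memory data frame. It only illustrates the ideas and is not the code check_ped runs; the toy data and the names repeated_ids, both_roles, and missing_parents are hypothetical.

```r
# Hypothetical toy pedigree; 0 marks an unknown parent.
ped <- data.frame(
  id   = c("A", "B", "C", "D", "D", "E", "G"),
  sire = c("0", "0", "A", "A", "A", "C", "C"),
  dam  = c("0", "0", "B", "B", "B", "A", "F"),
  stringsAsFactors = FALSE
)

# Ids that appear more than once in the id column.
repeated_ids <- unique(ped$id[duplicated(ped$id)])

# Ids that appear in both the sire and the dam column (ignoring unknowns).
both_roles <- setdiff(intersect(ped$sire, ped$dam), "0")

# Parents that never appear in the id column.
parents <- setdiff(union(ped$sire, ped$dam), "0")
missing_parents <- setdiff(parents, ped$id)
```

In this toy data, D is repeated, A is used as both sire and dam, and F is a dam with no row of its own.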

When using check_ped, do a first run to check for repeated ids and for parents that appear as both sire and dam. Once these errors are cleaned, run the function again to check for dependencies, as this will provide the most accurate report.
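
One way to picture the dependency check is as a walk up the ancestry of each individual: anyone who turns out to be their own ancestor sits inside a loop. The base R sketch below assumes a pedigree already cleaned of repeated ids. It is not how check_ped is implemented, and the helper ancestors_of and the toy data are hypothetical.

```r
# Hypothetical toy pedigree with a loop: Z is the sire of X, X is the sire
# of Y, and Y is the dam of Z. W is unrelated. 0 marks an unknown parent.
ped <- data.frame(
  id   = c("X", "Y", "Z", "W"),
  sire = c("Z", "X", "0", "0"),
  dam  = c("0", "0", "Y", "0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect all ancestors of one individual by following
# the sire and dam columns upward until no new ids turn up.
ancestors_of <- function(ped, id) {
  seen <- character(0)
  frontier <- id
  while (length(frontier) > 0) {
    rows <- ped[ped$id %in% frontier, ]
    parents <- setdiff(unique(c(rows$sire, rows$dam)), "0")
    frontier <- setdiff(parents, seen)  # only ids not visited yet
    seen <- c(seen, frontier)
  }
  seen
}

# Individuals that appear among their own ancestors.
is_own_ancestor <- vapply(
  ped$id,
  function(i) i %in% ancestors_of(ped, i),
  logical(1)
)
in_loop <- ped$id[is_own_ancestor]
```

Here X, Y, and Z are flagged and W is not. Repeated ids would blur this walk, because a single id would then map to several sire and dam pairs.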

Note: This function does not change the input file but prints any errors found to the console.

Examples

## Get a list with one data frame per error type
ped_file <- system.file("check_ped_test.txt", package = "BIGr")
ped_errors <- check_ped(ped.file = ped_file,
                        seed = 101919)

## Access the "messy parents" data frame in the result
ped_errors$messy_parents

## Get the sample IDs flagged with the messy parents error
messy_parent_ids <- ped_errors$messy_parents$id
print(messy_parent_ids)