README

R. Mark Sharp, Ph.D. 2025-07-25

nprcgenekeepr

Version 1.0.8 (2025-07-25)

Introduction

The goal of nprcgenekeepr is to implement Genetic Tools for Colony Management. It was initially conceived and developed as a Shiny web application at the Oregon National Primate Research Center (ONPRC) to facilitate some of the analyses they perform regularly. It has been enhanced to have more capability as a Shiny application and to expose the functions so they can be used either interactively or in R scripts.

This work has been supported in part by NIH grants P51 RR13986 to the Southwest National Primate Research Center and P51 OD011092 to the Oregon National Primate Research Center.

At present, the application supports 5 functions:

  1. Quality control of studbooks contained in text files or Excel workbooks and of pedigrees within LabKey Electronic Health Records (EHR)
  2. Creation of pedigrees from a list of animals using the LabKey EHR integration
  3. Creation and display of an age by sex pyramid plot of the living animals within the designated pedigree
  4. Generation of Genetic Value Analysis Reports
  5. Creation of potential breeding groups with and without prescribed sex ratios and defined maximum kinships

For more information see:
Vinson, A.; Raboin, M. J. (2015). A Practical Approach for Designing Breeding Groups to Maximize Genetic Diversity in a Large Colony of Captive Rhesus Macaques (Macaca mulatta). Journal of the American Association for Laboratory Animal Science, 54(6), 700-707.

Installation

You can install the CRAN version of nprcgenekeepr from the R console prompt with:

install.packages("nprcgenekeepr")

You can install the development version of nprcgenekeepr from GitHub from the R console prompt with:

install.packages("devtools")
devtools::install_github(file.path("rmsharp", "nprcgenekeepr"))

All missing dependencies should be automatically installed.

Online Documentation

You can find the complete online documentation at https://rmsharp.github.io/nprcgenekeepr/.

At the top of the page are three menus to the right of the Home icon: Reference, Articles, and Changelog.

The Reference menu at the top of the page brings up the list of documentation for Data objects, Major Features and Functions, Primary interactive functions and All exposed functions.

The Articles menu brings up the list of vignettes, which are, except for Development Plans, tutorials for using the package.

The Changelog brings up a copy of the NEWS file of the package, which records the major changes made for each version.

Running Shiny Application

The toolset available within nprcgenekeepr can be used inside standard R scripts. However, it was originally designed to be used within a Shiny application that can be started with:

library(nprcgenekeepr) # nolint: undesirable_function_linter
runGeneKeepR()

Summary of Major Functions

Quality Control

Studbooks maintained by breeding colonies generally contain information of varying quality. The quality control functions of the toolkit check to ensure all animals listed as parents have their own line entries, all parents have the appropriate sex listed, no animals are listed as both a sire and a dam, duplicate entries are removed, pedigree generation numbers are added, and all dates are valid dates. In addition, exit dates are added if possible and are consistent with other information such as departure dates and death dates. Current ages of animals that are still alive are added if a database connection is provided via a configuration file and the user has read permission on a LabKey server with the demographic data in an EHR (Electronic Health Record) module. See LabKey documentation.
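As a rough illustration of two of the checks described above, the following base R sketch flags parents that lack their own record and IDs that appear as both a sire and a dam. The data frame and its column names (id, sire, dam) are made up for this example and are not the package's internal representation; the package's own quality control is exposed through functions such as qcStudbook.

```r
# Hypothetical studbook fragment; column names are assumptions for this sketch.
ped <- data.frame(
  id   = c("A", "B", "C", "D"),
  sire = c(NA, NA, "A", "A"),
  dam  = c(NA, NA, "B", "E"),   # "E" is named as a dam but has no record
  stringsAsFactors = FALSE
)

# Parents that are named but have no row of their own
parents <- unique(c(ped$sire, ped$dam))
parents <- parents[!is.na(parents)]
missingParents <- setdiff(parents, ped$id)

# IDs listed as both a sire and a dam
sireAndDam <- intersect(ped$sire[!is.na(ped$sire)], ped$dam[!is.na(ped$dam)])

missingParents
sireAndDam
```

Here missingParents contains "E" and sireAndDam is empty.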

Parents with ages below a user selected threshold are identified. A minimum parent age in years is set by the user and is used to ensure each parent is at least that age on the birth date of an offspring. The minimum parent age defaults to 2 years. This check is not performed for animals with missing birth dates.
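The parent age check can be sketched in base R as follows. This is only an illustration of the idea, not the package's checkParentAge implementation; the data frame, its column names, and the use of 365.25 days per year are assumptions made for the example.

```r
# Hypothetical pedigree; column names and dates are assumptions for this sketch.
ped <- data.frame(
  id    = c("S1", "D1", "O1", "O2"),
  sire  = c(NA, NA, "S1", "S1"),
  dam   = c(NA, NA, "D1", "D1"),
  birth = as.Date(c("2010-01-15", "2012-06-01", "2013-03-01", "2016-09-10")),
  stringsAsFactors = FALSE
)
minParentAge <- 2  # years; the default mentioned above

# Age of a parent (in years) on the birth date of each offspring
parentAgeAtBirth <- function(parentId) {
  parentBirth <- ped$birth[match(parentId, ped$id)]
  as.numeric(ped$birth - parentBirth) / 365.25
}

sireAge <- parentAgeAtBirth(ped$sire)
damAge <- parentAgeAtBirth(ped$dam)

# Comparisons involving a missing parent or birth date yield NA;
# which() drops rows where no violation can be established
tooYoung <- ped[which(sireAge < minParentAge | damAge < minParentAge), ]
tooYoung$id
```

In this made-up data, offspring O1 is flagged because its dam D1 was less than one year old when O1 was born.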

Creation of Pedigree From a List of Potential Breeders and LabKey

The user can enter a list of focal animals in a CSV file that will be used to create a pedigree containing all direct relatives (ancestors and descendants). The pedigree is retrieved via the labkey.selectRows function within the Rlabkey package, provided a database connection is configured via a configuration file and the user has read permission on a LabKey server with the demographic data in an EHR (Electronic Health Record) module.

Two configuration files are needed to use the database features of nprcgenekeepr with LabKey. The first file, named _netrc on Microsoft Windows operating systems and .netrc otherwise, allows the user to authenticate with LabKey through the LabKey API and is fully described by the LabKey documentation.

The second file, named _nprcgenekeepr_config on Microsoft Windows operating systems and .nprcgenekeepr_config otherwise, is the nprcgenekeepr configuration file. A copy of an example configuration file is included as a data object and can be loaded and viewed with the following lines of R code in the R console.

data("exampleNprcgenekeeprConfig")
View(exampleNprcgenekeeprConfig)

Display of an age by sex pyramid plot

Adapted from https://www.thoughtco.com/age-sex-pyramids-and-population-pyramids-1435272 on 2019-06-03. Written by Matt Rosenberg. Updated May 07, 2019.

The most important demographic characteristic of a population is its age-sex structure. Age-sex pyramids (also known as population pyramids) graphically display this information to improve understanding and make comparison easy. The population pyramid sometimes has a distinctive pyramid-like shape when displaying a growing population.

How to Read the Age-Sex Graph

An age-sex pyramid breaks down a population into male and female genders and age ranges. Usually, you’ll find the left side of the pyramid graphing the male population and the right side of the pyramid displaying the female population.

Along the horizontal axis (x-axis) of a population pyramid, the graph displays the population either as a total population of that age or as a percentage of the population at that age. The center of the pyramid starts at zero population and extends out to the left for males and right for females in increasing size, or proportion of the population.

Along the vertical axis (y-axis), age-sex pyramids display two-year age increments, from birth at the bottom to old age at the top.
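The layout described above can be sketched with base R graphics. This is not the package's getPyramidPlot function; the data, the column names, and the plotting choices below are all assumptions made for illustration.

```r
# Hypothetical living animals; ages in years and sex codes are made up.
animals <- data.frame(
  age = c(0.5, 1.2, 3.7, 4.1, 4.9, 7.0, 2.3, 5.5),
  sex = c("M", "F", "F", "M", "F", "F", "M", "F"),
  stringsAsFactors = FALSE
)

# Two-year age bins: [0,2), [2,4), ... with one spare bin above the oldest age
breaks <- seq(0, 2 * ceiling(max(animals$age) / 2) + 2, by = 2)
animals$bin <- cut(animals$age, breaks = breaks, right = FALSE)

# Counts per age bin (rows) and sex (columns)
counts <- table(animals$bin, animals$sex)

# Males extend to the left (negative values), females to the right
barplot(-counts[, "M"], horiz = TRUE, las = 1,
        xlim = c(-max(counts), max(counts)),
        xlab = "Number of animals (males left, females right)")
barplot(counts[, "F"], horiz = TRUE, add = TRUE,
        axes = FALSE, axisnames = FALSE)
```

The youngest bin is drawn at the bottom because barplot places the first row of counts lowest when horiz = TRUE.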

Genetic Value Analysis Reports

The Genetic Value Analysis is a ranking scheme developed at ONPRC to indicate the relative breeding value of animals in the colony. The scheme uses the mean kinship for each animal to indicate how inter-related it is with the rest of the current breeding colony members. Genome uniqueness is used to provide an indication of whether or not an animal is likely to possess alleles at risk of being lost from the colony. Under the scheme, animals with low mean kinship or high genome uniqueness are ranked more highly.
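To make the idea of mean kinship concrete, here is a minimal base R sketch that builds a kinship matrix for a tiny pedigree with the standard recursive definition and then averages each row. It is not the package's kinship or meanKinship code. The pedigree is hypothetical, it must be ordered so parents precede their offspring, and including self-kinship in the mean is a choice made for this example.

```r
# Hypothetical pedigree, ordered so that parents appear before offspring.
ped <- data.frame(
  id   = c("A", "B", "C", "D", "E"),
  sire = c(NA, NA, "A", "A", "C"),
  dam  = c(NA, NA, "B", "B", "D"),
  stringsAsFactors = FALSE
)

n <- nrow(ped)
k <- matrix(0, n, n, dimnames = list(ped$id, ped$id))

# Kinship involving an unknown (NA) parent is treated as 0
kin <- function(a, b) if (is.na(a) || is.na(b)) 0 else k[a, b]

for (i in seq_len(n)) {
  s <- ped$sire[i]
  d <- ped$dam[i]
  # Kinship with each earlier individual: average over the two parents
  for (j in seq_len(i - 1)) {
    k[i, j] <- k[j, i] <- 0.5 * (kin(s, ped$id[j]) + kin(d, ped$id[j]))
  }
  # Self-kinship: 0.5 * (1 + kinship between the parents)
  k[i, i] <- 0.5 * (1 + kin(s, d))
}

# Mean kinship per animal; lower values indicate less related animals
meanKin <- rowMeans(k)
sort(meanKin)
```

In this example the founders A and B have the lowest mean kinship, while E, the offspring of full siblings C and D, has the highest.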

Breeding Group Formation

One of the goals in breeding group formation is to avoid the potential for mating of closely related animals. Since behavioral concerns and housing constraints will also be taken into account in the group formation process, it is our goal to provide the largest number of animals possible from a list of candidates that can be housed together without risk of consanguineous mating. To that end, this function uses information from the Genetic Value Analysis to search for the largest combinations of animals that can be produced from a list of candidates.

The default options do not consider the sex of individuals when forming the groups, though this has likely been a consideration by the user in selecting the candidate group members. Optionally the user may select to form harem groups, which considers the sex of individuals when forming groups and restricts the number of males to one per group.
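The search for large groups of mutually compatible animals can be illustrated with a simple randomized greedy sketch in base R. This is not the package's algorithm (see the reference below for the published approach); the kinship values, the threshold, and the helper formGroup are hypothetical and exist only for this example.

```r
# Hypothetical kinship matrix for five candidates (values are made up).
ids <- c("A", "B", "C", "D", "E")
k <- matrix(0, 5, 5, dimnames = list(ids, ids))
k["A", "B"] <- k["B", "A"] <- 0.25
k["C", "D"] <- k["D", "C"] <- 0.125
k["D", "E"] <- k["E", "D"] <- 0.0625
maxKinship <- 0.1  # hypothetical threshold: pairs at or above it are kept apart

# Greedy pass: admit a candidate only if it is compatible with every member
formGroup <- function(candidates, k, maxKinship) {
  group <- character(0)
  for (id in candidates) {
    if (all(k[id, group] < maxKinship)) {
      group <- c(group, id)
    }
  }
  group
}

set.seed(1)
# Try several random orderings and keep the largest group found
groups <- replicate(20, formGroup(sample(ids), k, maxKinship), simplify = FALSE)
best <- groups[[which.max(lengths(groups))]]
sort(best)
```

With these values the largest possible group has three members: one of A or B, one of C or D, and E.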

For more information see:
Vinson, A.; Raboin, M. J. (2015). A Practical Approach for Designing Breeding Groups to Maximize Genetic Diversity in a Large Colony of Captive Rhesus Macaques (Macaca mulatta). Journal of the American Association for Laboratory Animal Science, 54(6), 700-707.

Version: 1.0.8
License: MIT + file LICENSE
Maintainer: R. Mark Sharp
Last Published: July 26th, 2025
Monthly Downloads: 518

Functions in nprcgenekeepr (1.0.8)

calcRetention

Calculates Allelic Retention
checkChangedColsLst

checkChangedColsLst examines list for non-empty fields
calculateSexRatio

Calculates the sex ratio (number of non-males / number of males) given animal Ids and their pedigree
checkErrorLst

checkErrorLst examines list for non-empty fields
checkGenotypeFile

Check genotype file
convertFromCenter

Converts the fromCenter information to a standardized code
checkParentAge

Check parent ages to be at least minParentAge
convertSexCodes

Converts sex indicator for an individual to a standardized code.
createSimKinships

Makes a list object of kinship matrices from simulated pedigrees of possible parents for animals with unknown parents
convertStatusCodes

Converts status indicators to a standardized code
create_wkbk

Creates an Excel workbook with worksheets.
convertRelationships

Converts pairwise kinship values to a relationship category descriptor.
createPedTree

Create a pedigree tree (PedTree).
findGeneration

Determines the generation number for each id.
createExampleFiles

Creates a folder with CSV files containing example pedigrees and ID lists used to demonstrate the package.
filterThreshold

Filters kinship to remove rows with kinship values less than the specified threshold
findLoops

Find loops in a pedigree tree
getChangedColsTab

getChangedColsTab skeleton of list of errors
chooseAlleles

Combines two vectors of alleles by randomly selecting one allele or the other at each position.
finalRpt

finalRpt is a list object created from the list object rpt prepared by reportGV. It is created inside orderReport. This version is at the state just prior to calling rankSubjects inside orderReport.
getAnimalsWithHighKinship

Forms a list of animal Ids and animals related to them
checkRequiredCols

Examines column names, cols for required column names
countKinshipValues

Counts the number of occurrences of each kinship value seen for a pair of individuals in a series of simulated pedigrees.
countLoops

Count the number of loops in a pedigree tree.
getLkDirectRelatives

Get the direct ancestors of selected animals
calcFE

Calculates Founder Equivalents
getDemographics

Get demographic data
getAncestors

Recursively create a character vector of ancestors for an individual ID.
examplePedigree

examplePedigree is a pedigree object created by qcStudbook
exampleNprcgenekeeprConfig

exampleNprcgenekeeprConfig is a loadable version of the example configuration file example_nprcgenekeepr_config
getEmptyErrorLst

Creates an empty errorLst object
getFocalAnimalPed

Get pedigree based on list of focal animals
geneDrop

Gene drop simulation based on the provided pedigree information
getErrorTab

getErrorTab skeleton of list of errors
getPedigree

Get pedigree from file
chooseDate

Choose date based on earlier flag.
getPedDirectRelatives

Get the direct ancestors of selected animals from supplied pedigree.
getPossibleCols

Get possible column names for a studbook.
countFirstOrder

Count first-order relatives.
convertDate

Converts date columns formatted as characters to be of type datetime
correctParentSex

Sets sex for animals listed as either a sire or dam.
fillGroupMembersWithSexRatio

Forms breeding group(s) with an effort to match a specified sex ratio
makeExamplePedigreeFile

Write copy of nprcgenekeepr::examplePedigree into a file
getPotentialParents

Get the lists of potential parents for all individuals born in the colony with one or two unknown parents.
convertAncestry

Converts the ancestry information to a standardized code
getPedMaxAge

Get the maximum age of live animals in the pedigree.
makeCEPH

Make a CEPH-style pedigree for each id
findOffspring

Finds the number of total offspring for each animal in the provided pedigree.
getPotentialSires

Provides list of potential sires
findPedigreeNumber

Determines the generation number for each id.
dataframe2string

dataframe2string converts a data.frame object to a character vector
cumulateSimKinships

Makes a list object containing kinship summary statistics using the list object from createSimKinships.
filterKinMatrix

Filters a kinship matrix to include only the egos listed in 'ids'
getGenotypes

Get genotypes from file
filterPairs

Filters kinship values from a long-format kinship table based on the sexes of the two animals involved.
getGVGenotype

Get Genetic Value Genotype data structure for reportGV function.
getConfigFileName

getConfigFileName returns the configuration file name appropriate for the system.
getCurrentAge

Age in years using the provided birthdate.
getLogo

Get Logo file name
getGVPopulation

Get the population of interest for the Genetic Value analysis.
filterReport

Filters a genetic value report down to only the specified animals
getProbandPedigree

Gets pedigree to ancestors of provided group leaving uninformative ancestors.
kinshipMatricesToKValues

Forms kValue matrix from list of kinship matrices
kinshipMatrixToKValues

Extracts a dataframe with a row for each kinship coefficient in the kinship matrix
get_and_or_list

Returns a one element character string with correct punctuation for a list made up of the elements of the character vector argument.
focalAnimals

focalAnimals is a dataframe with one column (id) containing the animal Ids from the examplePedigree pedigree.
fixColumnNames

fixColumnNames changes original column names into standardized names.
makeSimPed

Makes a simulated pedigree using representative sires and dams
getVersion

getVersion Get the version number of nprcgenekeepr
makeRelationClassesTable

Make relation classes table from kin dataframe.
pedFemaleSireMaleDam

pedFemaleSireMaleDam is a dataframe with 8 rows and 5 columns (ego_id, sire, dam_id, sex, birth_date) representing a full pedigree with the errors of having a sire labeled as female and a dam labeled as male.
getIncludeColumns

Get the superset of columns that can be in a pedigree file.
getLkDirectAncestors

Get the direct ancestors of selected animals
getIdsWithOneParent

getIdsWithOneParent extracts IDs of animals in the pedigree without either a sire or a dam
getTokenList

Gets tokens from character vector of lines
kinMatrix2LongForm

Reformats a kinship matrix into a long-format table.
getSiteInfo

Get site information
get_elapsed_time_str

Returns the elapsed time since start_time.
groupAddAssign

Adds animals to an existing breeding group or forms groups
offspringCounts

Finds the total number of offspring for each animal in the pedigree
obfuscatePed

obfuscatePed takes a pedigree object and creates aliases for all IDs and adjusts all dates within a specified amount.
getDateErrorsAndConvertDatesInPed

Converts columns of dates in text form to Date object columns
getDatedFilename

Returns a character vector with a file name having the date prepended.
kinship

Generates a kinship matrix.
getOffspring

Get offspring to corresponding animal IDs provided
getParents

Get parents to corresponding animal IDs provided
getPyramidPlot

Creates a pyramid plot of the pedigree provided.
lacy1989Ped

lacy1989Ped small hypothetical pedigree
pedSix

pedSix is a loadable version of a pedigree file fragment used for testing and demonstration
pedWithGenotype

pedWithGenotype is a dataframe produced from qcPed by adding made up genotypes.
pedGood

pedGood is a dataframe with 8 rows and 5 columns (ego_id, sire, dam_id, sex, birth_date) representing a full pedigree with no errors.
makeGrpNum

Convenience function to make the initial grpNum list
makeGroupMembers

Convenience function to make the initial groupMembers animal list
getRequiredCols

Get required column names for a studbook.
removeDuplicates

Remove duplicate records from pedigree
hasGenotype

Check for genotype data in dataframe
lacy1989PedAlleles

lacy1989PedAlleles is a dataframe produced by geneDrop on lacy1989Ped with 5000 iterations.
meanKinship

Calculates the mean kinship for each animal in a kinship matrix
hasBothParents

hasBothParents checks to see if both parents are identified.
mapIdsToObfuscated

Map IDs to Obfuscated IDs
qcPed

qcPed is a dataframe with 277 rows and 6 columns
qcBreeders

qcBreeders is a list of 29 baboon IDs that are potential breeders
obfuscateId

obfuscateId creates a vector of ID aliases of specified length
obfuscateDate

obfuscateDate adds a random number of days bounded by plus and minus max delta
pedOne

pedOne is a loadable version of a pedigree file fragment used for testing and demonstration
pedSameMaleIsSireAndDam

pedSameMaleIsSireAndDam is a dataframe with 8 rows and 5 columns (ego_id, sire, dam_id, sex, birth_date) representing a full pedigree in which the same male is listed as both a sire and a dam.
nprcgenekeepr-package

Genetic Management Functions
removeEarlyDates

removeEarlyDates removes dates before a specified year
qcPedGvReport

qcPedGvReport is a genetic value report
qcStudbook

Quality Control for the Studbook or pedigree
setExit

Sets the exit date, if there is no exit column in the table
setPopulation

Population designation function
runGeneKeepR

Allows running shiny application with nprcgenekeepr::runGeneKeepR()
saveDataframesAsFiles

Write copy of dataframes to either CSV, TXT, or Excel file.
print.summary.nprcgenekeeprErr

print.summary.nprcgenekeepr print.summary.nprcgenekeeprGV
getPyramidAgeDist

Get the age distribution for the pedigree
pedWithGenotypeReport

pedWithGenotypeReport is a list containing the output of reportGV.
headerDisplayNames

Convert internal column names to display or header names.
nprcgenekeepr

Genetic Tools for Colony Management
removeUnknownAnimals

removeUnknownAnimals Removes unknown animals added to pedigree that serve as placeholders for unknown parents.
reportGV

Generates a genetic value report for a provided pedigree.
withinIntegerRange

Get integer within a range
trimPedigree

Trim pedigree to ancestors of provided group by removing uninformative individuals
is_valid_date_str

Returns TRUE if the string is a valid date.
pedDuplicateIds

pedDuplicateIds is a dataframe with 9 rows and 5 columns (ego_id, sire, dam_id, sex, birth_date) representing a full pedigree with a duplicated record.
pedMissingBirth

pedMissingBirth is a dataframe with 8 rows and 5 columns (ego_id, sire, dam_id, sex, birth_date) representing a full pedigree with no errors.
pedInvalidDates

pedInvalidDates is a dataframe with 8 rows and 5 columns (ego_id, sire, dam_id, sex, birth_date) representing a full pedigree with values in the birth_date column that are not valid dates.
ped1Alleles

ped1Alleles is a dataframe created by the geneDrop function
removePotentialSires

Removes potential sires from list of Ids
rhesusGenotypes

rhesusGenotypes is a dataframe with two haplotypes per animal
removeUninformativeFounders

Remove uninformative founders.
set_seed

Work around for unit tests using sample() among various versions of R
smallPed

smallPed is a hypothetical pedigree
removeAutoGenIds

Remove automatically generated IDs from pedigree
rankSubjects

Ranks animals based on genetic value.
smallPedTree

smallPedTree is a pedigree tree made from smallPed
rhesusPedigree

rhesusPedigree is a pedigree object
summarizeKinshipValues

Summary statistics for imputed kinship values
summary.nprcgenekeeprErr

summary.nprcgenekeeprErr Summary function for class nprcgenekeeprErr
toCharacter

Force dataframe columns to character
alleleFreq

Calculates the count of each allele in the provided vector.
addSexAndAgeToGroup

Forms a dataframe with Id, Sex, and current Age given a list of Ids and a pedigree
assignAlleles

Assign parent alleles randomly
addUIds

Eliminates partial parentage situations by adding unique placeholder IDs for the unknown parent.
addGenotype

Add genotype data to pedigree file
addBackSecondParents

Add back single parents to trimmed pedigree
addParents

Add parents
calcA

Calculates a, the number of an individual's alleles that are rare in each simulation.
addAnimalsWithNoRelative

Adds an NA value for all animals without a relative
addIdRecords

addIdRecords adds Ego records having NAs for parent IDs
calcFEFG

Calculates Founder Equivalents and Founder Genome Equivalents
calcFG

Calculates Founder Genome Equivalents
calcAge

Calculate animal ages.
calcGU

Calculates genome uniqueness for each ID that is part of the population.