growthcleanr

R package for cleaning data from Electronic Health Record systems, focused on cleaning height and weight measurements.

This package implements the Daymont et al. algorithm, as specified in Supplemental File 3 within the Supplementary Material published with that paper.

Carrie Daymont, Michelle E Ross, A Russell Localio, Alexander G Fiks, Richard C Wasserman, Robert W Grundmeier, Automated identification of implausible values in growth data from pediatric electronic health records, Journal of the American Medical Informatics Association, Volume 24, Issue 6, November 2017, Pages 1080–1087, https://doi.org/10.1093/jamia/ocx037

This package also includes an R version of the SAS macro published by the CDC for calculating percentiles and Z-scores of pediatric growth observations, along with utilities for working with both functions. As of summer 2021, it also supports cleaning anthropometric measurements for adults up to age 65. The adult algorithm has not yet appeared in a peer-reviewed publication, but is described in detail at Adult algorithm.

Bug and current fix

There is an error in the recentering tables in the current CRAN version that causes the algorithm to inappropriately exclude birth observations in datasets with fewer than 5,000 observations. Until we are able to make changes on CRAN, we recommend using devtools to install growthcleanr from GitHub. We are working to release an update as soon as possible.

Installation

To install the stable version from CRAN:

install.packages("growthcleanr")

Or from GitHub directly:

# install.packages("devtools")  # run first if devtools is not yet installed
library(devtools)
install_github("carriedaymont/growthcleanr")

Summary

The growthcleanr package processes data prepared in a specific format to identify biologically implausible height and weight measurements. It bases these evaluations on techniques which use patient-specific longitudinal analysis and variations from published growth trajectory charts for pediatric subjects. These techniques are performed in a specific order which refines and improves results throughout the process.

Results from growthcleanr include a flag for each measurement indicating whether it is to be included or excluded based on plausibility, with a variety of specific types of exclusions identified distinctly. These flags can be analyzed further by researchers studying anthropometric EHR data to determine which measurements to include or exclude in their own studies. No values are deleted or otherwise removed; each is only flagged in a new column.
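As a minimal sketch of how the flag column might be used downstream, the base R snippet below tabulates the flags and keeps only the included measurements. The data frame, the column name `gcr_result`, and the label `Exclude-Carried-Forward` are illustrative assumptions; the exact exclusion labels depend on the installed version of growthcleanr.

```r
# Hypothetical, hand-written example of already-flagged data.
# The column name gcr_result and the exclusion label are assumptions
# used for illustration only.
flagged <- data.frame(
  subjid = c("p1", "p1", "p1"),
  param = "HEIGHTCM",
  agedays = c(730, 760, 1095),
  measurement = c(86.5, 86.5, 95.1),
  gcr_result = c("Include", "Exclude-Carried-Forward", "Include"),
  stringsAsFactors = FALSE
)

# Count how many measurements fall under each flag.
counts <- table(flagged$gcr_result)

# Keep only measurements flagged for inclusion; the original data frame
# is left untouched, so no values are lost.
included <- flagged[flagged$gcr_result == "Include", ]
```

Because the flags live in their own column, a study can apply a stricter or looser rule later (for example, retaining some exclusion types) without rerunning the cleaning step.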

To start running growthcleanr, an R installation with a variety of additional packages is required, as is a growth measurement dataset prepared for use in growthcleanr.
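The prepared dataset is a long table with one row per measurement. The sketch below builds a tiny hand-made example and wraps the call to `cleangrowth()`. The column names, the `param` codes (`HEIGHTCM`, `WEIGHTKG`), and the sex coding are assumptions based on the package's documented conventions rather than on this page, and `check_input()` and `flag_measurements()` are hypothetical helpers, not part of growthcleanr.

```r
# Hypothetical long-format input: one row per measurement.
# Column names, param codes, and sex coding (0 = male, 1 = female) are
# assumptions drawn from the package documentation, not from this page.
measurements <- data.frame(
  subjid = c("p1", "p1", "p1", "p1"),
  param = c("HEIGHTCM", "WEIGHTKG", "HEIGHTCM", "WEIGHTKG"),
  agedays = c(730, 730, 1095, 1095),
  sex = c(0, 0, 0, 0),
  measurement = c(86.5, 12.3, 95.1, 14.2),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of growthcleanr): basic structural checks.
check_input <- function(df) {
  required <- c("subjid", "param", "agedays", "sex", "measurement")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(is.numeric(df$agedays), is.numeric(df$measurement))
  invisible(TRUE)
}

# Illustrative wrapper: adds the flags as a new column and leaves the
# measured values unchanged. Requires growthcleanr to be installed.
flag_measurements <- function(df) {
  check_input(df)
  df$gcr_result <- growthcleanr::cleangrowth(
    subjid = df$subjid,
    param = df$param,
    agedays = df$agedays,
    sex = df$sex,
    measurement = df$measurement
  )
  df
}

# The structural check runs without growthcleanr installed.
check_input(measurements)
```

With the package installed, `flag_measurements(measurements)` would return the same rows with an added `gcr_result` column. A realistic run needs far more data than this toy table.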

The rest of this documentation includes:

Getting started:

  • Quickstart, a brief tour of using growthcleanr, including data preparation
  • Installation, options for installing growthcleanr, with notes on specific platforms and source-level installation for developers
  • Usage, examples of cleaning data, multiple options, example data

Changes

For a detailed history of released versions, see the Changelog or NEWS.md. Tagged releases, starting with 1.2.3 in January 2021, are listed on GitHub.

Version: 2.2.1

License: MIT + file LICENSE

Maintainer: Carrie Daymont

Last published: February 18th, 2026

Monthly downloads: 363

Functions in growthcleanr (2.2.1)

  • tanner_ht_vel: Tanner Growth Velocity Table
  • splitinput: Split input data into multiple files
  • weianthro: Weight Anthro Table
  • recode_sex: Recode binary sex variable for compatibility
  • read_anthro: Function to calculate z-scores and csd-scores based on anthro tables
  • who_ht_maxvel: WHO Maximum Height Velocity for 3σ
  • who_hc_maxvel: WHO Maximum Head Circumference Velocity for 3σ
  • who_hc_vel_3sd: WHO Head Circumference Velocity for 3σ
  • tanner_ht_vel_rev: Tanner Growth Velocity Table for Infants
  • who_ht_vel_3sd: WHO Height Velocity for 3σ
  • testacf: Function to test adjust carried forward
  • test_syngrowth_wide: CDC SAS BMI Input
  • who_ht_vel_2sd: WHO Height Velocity for 2σ
  • test_syngrowth_sas_output_compare: CDC SAS BMI Output
  • tanner_ht_vel_with_2sd: Tanner Growth Velocity Table with 2σ
  • who_ht_maxvel_2sd: WHO Maximum Height Velocity for 2σ
  • syngrowth: syngrowth
  • growth_cdc_ext: CDC Growth Percentile Table
  • fentlms_foraga: Fenton Growth Curves
  • fentlms_forz: Fenton Growth Curve Z-Scores
  • CDCref_d: CDC BMI reference data
  • bmianthro: BMI Anthro
  • ext_bmiz: Calculate extended BMI measures
  • cleangrowth: Clean growth measurements
  • ewma: Exponentially Weighted Moving Average (EWMA)
  • adjustcarryforward: Uses absolute height velocity to identify values excluded as carried forward values for reinclusion
  • acf_answers: Answers for adjustcarryforward
  • longwide: Transform data in growthcleanr format into wide structure for BMI calculation
  • lenanthro: Length to Age Table
  • nhanes-reference-medians: NHANES reference medians
  • rc-reference-medians: Infants reference medians
  • growth_cdc_ext_infants: CDC Growth Percentile Table for Infants
  • sd_median: Calculate median SD score by age for each parameter
  • growth_who_ext: WHO Growth Percentile Table
  • simple_bmi: Compute BMI using standard formula