lab2clean (version 2.0.0)

standardize_lab_unit: Clean and Standardize Formats of Laboratory Units of Measurement

Description

This function cleans and standardizes the format of laboratory units of measurement according to the Unified Code for Units of Measure (UCUM, https://ucum.org/ucum).

Usage

standardize_lab_unit(lab_data, raw_unit, report = TRUE, n_records = NA)

Value

A modified `lab_data` data frame with additional columns:

* `ucum_code`: Cleaned and standardized units according to UCUM syntax.
* `cleaning_comments`: Comments about the cleaning process for each unit.
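
A minimal call might look like the sketch below. The data frame `lab_units` and its contents are hypothetical, and passing the column as a quoted name is an assumption rather than something this page confirms. The call is guarded so the snippet still runs when lab2clean is not installed.

```r
# Hypothetical input: a data frame with one column of raw unit strings.
lab_units <- data.frame(
  raw_unit = c("mg/dl", "mmol/L", "10^9/L", "g per liter"),
  stringsAsFactors = FALSE
)

# Run the cleaning step only if lab2clean is available.
# Assumption: `raw_unit` accepts the column name as a string.
if (requireNamespace("lab2clean", quietly = TRUE)) {
  cleaned <- lab2clean::standardize_lab_unit(
    lab_units,
    raw_unit = "raw_unit",
    report = FALSE
  )
  # Per the Value section, the result should carry `ucum_code` and
  # `cleaning_comments` alongside the original columns.
  str(cleaned)
}
```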

Arguments

lab_data

A data frame containing laboratory data.

raw_unit

The column in `lab_data` that contains raw units to be cleaned.

report

Logical. If `TRUE`, a report is written to the console. Defaults to `TRUE`.

n_records

If `lab_data` is a grouped list of distinct results rather than one row per record, set `n_records` to the column that contains the frequency of each distinct result. Defaults to `NA`.
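
As an illustration of the grouped input that `n_records` is meant for, the sketch below collapses a hypothetical record-level extract into distinct units with a frequency column using base R. The names `records`, `counts`, and `frequency` are made up for this example, and passing column names as strings is an assumption.

```r
# Hypothetical raw extract: one row per lab record, with repeated units.
records <- data.frame(
  raw_unit = c("mg/dl", "mg/dl", "mmol/L", "mg/dl", "mmol/L", "g/L"),
  stringsAsFactors = FALSE
)

# Collapse to one row per distinct unit, with its frequency.
counts <- as.data.frame(
  table(raw_unit = records$raw_unit),
  responseName = "frequency",
  stringsAsFactors = FALSE
)

# Pass the frequency column through `n_records` (guarded call;
# quoted column names are an assumption).
if (requireNamespace("lab2clean", quietly = TRUE)) {
  cleaned <- lab2clean::standardize_lab_unit(
    counts,
    raw_unit = "raw_unit",
    report = FALSE,
    n_records = "frequency"
  )
}
```

Working on the distinct list keeps the input small while the frequency column preserves how many records each unit represents.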

Author

Ahmed Zayed <ahmed.zayed@kuleuven.be>, Ilias Sarikakis <sarikakisilias@gmail.com>

Details

The function applies the following methodology:

1. Pre-processing of unit strings.
2. Lookup in the common units database.
3. Syntax integrity check of units with no UCUM match.
4. Parsing of units that passed the checks (tokenize and classify).
5. Restructuring of parsed units (apply correction rules and final validation).

Internal datasets: the function uses one internal dataset, `RWD_units_to_UCUM_V2`, which contains 3739 synonyms of 1448 UCUM units.
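
To make the first two steps concrete, here is a toy sketch of pre-processing followed by a synonym lookup. It is not the package's implementation: `toy_lookup`, `preprocess_unit`, and `lookup_unit` are hypothetical names, the three-entry table merely stands in for `RWD_units_to_UCUM_V2`, and the normalization rules are assumptions chosen for illustration.

```r
# Toy synonym table: hypothetical stand-in, NOT the package's internal dataset.
toy_lookup <- c(
  "mg/dl"  = "mg/dL",
  "mmol/l" = "mmol/L",
  "g/l"    = "g/L"
)

# Step 1 (sketch): normalize whitespace and case before matching.
preprocess_unit <- function(x) {
  x <- trimws(x)
  x <- gsub("\\s+", "", x)  # drop internal whitespace
  tolower(x)
}

# Step 2 (sketch): look up the normalized string. NA means "no match";
# in the real function such units would proceed to the syntax checks
# and parsing described in steps 3 to 5.
lookup_unit <- function(x) {
  unname(toy_lookup[preprocess_unit(x)])
}

lookup_unit(c(" MG / dL ", "mmol/L", "furlongs"))
```

Resolving known synonyms through a lookup first means only the unmatched remainder needs the more involved parsing and correction steps.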

See Also

Function 1 for result value cleaning, Function 2 for result validation, Function 3 for unit format cleaning, Function 4 for unit conversion.