This function cleans and standardizes the format of laboratory units of measurement according to the Unified Code for Units of Measure (UCUM), https://ucum.org/ucum.
standardize_lab_unit(lab_data, raw_unit, report = TRUE, n_records = NA)
A modified `lab_data` data frame with additional columns:
* `ucum_code`: Cleaned and standardized units according to UCUM syntax.
* `cleaning_comments`: Comments about the cleaning process for each unit.
`lab_data`: A data frame containing laboratory data.
`raw_unit`: The column in `lab_data` that contains the raw units to be cleaned.
`report`: If `TRUE`, a report is written to the console. Defaults to `TRUE`.
`n_records`: If `lab_data` is a grouped list of distinct units rather than one row per record, set `n_records` to the column that contains the frequency of each distinct unit. Defaults to `NA`.
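As an illustration of the `n_records` argument, here is a minimal base-R sketch of how a record-level table might be collapsed into a grouped list of distinct units with a frequency column. The example data are invented, and the final call assumes that column names are passed as character strings, which this page does not state explicitly; the call is guarded so the snippet still runs when the function is not loaded.

```r
# Hypothetical record-level data: one row per laboratory record
lab_data <- data.frame(
  raw_unit = c("mg/dl", "mg/dl", "mmol/L", "mg/dl", "g/L", "mmol/L"),
  stringsAsFactors = FALSE
)

# Collapse to one row per distinct unit, with its frequency in `n_records`
unit_counts <- as.data.frame(
  table(raw_unit = lab_data$raw_unit),
  responseName = "n_records",
  stringsAsFactors = FALSE
)

# Assumption: the column arguments take column names as strings.
# Guarded so that the sketch runs even without the package attached.
if (exists("standardize_lab_unit")) {
  cleaned <- standardize_lab_unit(
    unit_counts,
    raw_unit  = "raw_unit",
    report    = TRUE,
    n_records = "n_records"
  )
}
```

Passing the grouped table together with its frequency column should let the console report reflect record counts rather than counts of distinct strings, while the function only has to process each distinct unit once.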
Ahmed Zayed <ahmed.zayed@kuleuven.be>, Ilias Sarikakis <sarikakisilias@gmail.com>
The function applies the following methodology:
1. Pre-processing of unit strings.
2. Lookup in the common units database.
3. Syntax integrity check of units with no UCUM match.
4. Parsing of units that passed the checks (tokenize and classify).
5. Restructuring of parsed units (apply correction rules and final validation).
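The first four steps can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: the `synonyms` table, the helper names, the regular expressions, and the comment strings are all hypothetical, and step 5 (correction rules and final validation) is left out.

```r
# Hypothetical stand-in for the common-units lookup (NOT the package's dataset)
synonyms <- data.frame(
  synonym   = c("mg/dl", "mmol/l", "g/l"),
  ucum_code = c("mg/dL", "mmol/L", "g/L"),
  stringsAsFactors = FALSE
)

# Step 1: pre-process. Remove whitespace and lower-case for matching.
# (Lower-casing is a simplification; UCUM codes themselves are case sensitive.)
preprocess_unit <- function(x) tolower(gsub("\\s+", "", x))

# Step 2: look the cleaned string up among known synonyms (NA if absent)
lookup_unit <- function(x) synonyms$ucum_code[match(x, synonyms$synonym)]

# Step 3: crude syntax check. Parentheses must be balanced, and operators
# must not lead, trail, or appear twice in a row.
has_valid_syntax <- function(x) {
  balanced <- nchar(gsub("[^(]", "", x)) == nchar(gsub("[^)]", "", x))
  bad_ops  <- grepl("^[/.*]|[/.*]$|[/.*]{2,}", x)
  balanced & !bad_ops
}

# Step 4: tokenize one unit string and classify each token
tokenize_unit <- function(x) {
  tokens <- regmatches(x, gregexpr("[/.*()]|[^/.*()]+", x))[[1]]
  class <- ifelse(tokens %in% c("/", ".", "*"), "operator",
           ifelse(tokens %in% c("(", ")"), "bracket",
           ifelse(grepl("^[0-9]+(\\^[0-9]+)?$", tokens), "number", "unit")))
  data.frame(token = tokens, class = class, stringsAsFactors = FALSE)
}

# Steps 1 to 3 combined over a vector of raw units
sketch_standardize <- function(raw) {
  cleaned <- preprocess_unit(raw)
  code    <- lookup_unit(cleaned)
  comment <- ifelse(is.na(code), NA_character_, "matched in lookup")
  valid   <- has_valid_syntax(cleaned)
  comment[is.na(code) & !valid] <- "failed syntax check"
  comment[is.na(code) & valid]  <- "passed syntax check; needs parsing"
  data.frame(raw_unit = raw, ucum_code = code,
             cleaning_comments = comment, stringsAsFactors = FALSE)
}

res    <- sketch_standardize(c(" MG / dl ", "10^3/ul", "mg//l"))
tokens <- tokenize_unit("10^3/ul")
```

Here `" MG / dl "` is resolved directly by the lookup, `"10^3/ul"` has no match but passes the syntax check and would move on to parsing, and `"mg//l"` is rejected because of the doubled operator.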
Internal Datasets: The function uses an internal dataset, `RWD_units_to_UCUM_V2`, which contains 3739 synonyms of 1448 UCUM units.
Function 1 for result value cleaning, Function 2 for result validation, Function 3 for unit format cleaning, Function 4 for unit conversion.