This function cleans and standardizes laboratory result values. It adds the columns "clean_result" and "scale_type", together with a "cleaning_comments" column, without altering the original result values. The function is part of a comprehensive R package designed for cleaning laboratory datasets.
clean_lab_result(
  lab_data,
  raw_result,
  locale = "NO",
  report = TRUE,
  n_records = NA
)

A modified `lab_data` data frame with additional columns:

* `clean_result`: Cleaned and standardized result values.
* `scale_type`: The scale type of result values (Quantitative, Ordinal, Nominal).
* `cleaning_comments`: Comments about the cleaning process for each record.

* `lab_data`: A data frame containing laboratory data.
* `raw_result`: The column in `lab_data` that contains the raw result values to be cleaned.
* `locale`: A string representing the locale of the laboratory data. Defaults to "NO".
* `report`: If `TRUE`, a report is written to the console. Defaults to `TRUE`.
* `n_records`: If you are loading a grouped list of distinct results, set `n_records` to the column that contains the frequency of each distinct result. Defaults to `NA`.

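The arguments above can be illustrated with a small, hedged example. It builds a grouped table of distinct results with base R and passes the frequency column through `n_records`. The package name `lab2clean`, the column names `raw_result` and `frequency`, and the convention of passing column names as strings are assumptions made for illustration; the sample values are invented.

```r
# Invented sample values, used only to illustrate the call.
raw_values <- c("5,6", "5,6", "<0.5", "Positive", "++", "Positive", "Positive")

# Collapse the raw values into one row per distinct result plus its frequency.
counts <- table(raw_values)
lab_data <- data.frame(
  raw_result = names(counts),
  frequency = as.integer(counts),
  stringsAsFactors = FALSE
)

# Assumption: the package is installed under the name "lab2clean" and
# column names are passed as strings. The call is skipped otherwise.
if (requireNamespace("lab2clean", quietly = TRUE)) {
  cleaned <- lab2clean::clean_lab_result(
    lab_data,
    raw_result = "raw_result",
    locale = "NO",
    report = FALSE,
    n_records = "frequency"
  )
  print(head(cleaned))
}
```

Grouping first keeps the input small when the same result string occurs many times, while `n_records` lets the function account for how many records each distinct value represents.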
Ahmed Zayed <ahmed.zayed@kuleuven.be>
The function applies the following steps:

1. Clear Typos: Removes typographical errors and extraneous characters.
2. Handle Extra Variables: Identifies and separates extra variables from result values.
3. Detect and Assign Scale Types: Identifies and assigns the scale type using regular expressions.
4. Number Formatting: Standardizes number formats based on predefined rules and locale.
5. Mining Text Results: Identifies common words and patterns in text results.

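As a rough illustration of steps 1, 3 and 4, the following base R sketch trims stray whitespace, assigns a scale type with regular expressions, and converts a decimal comma to a decimal point. The helper names (`sketch_clear_typos`, `sketch_scale_type`, `sketch_format_number`) and the patterns are hypothetical and far simpler than whatever rules the package applies internally; steps 2 and 5 are not covered.

```r
# Hypothetical helpers for illustration only; not the package's internal code.

# Step 1 (simplified): trim whitespace and collapse repeated spaces.
sketch_clear_typos <- function(x) {
  gsub("\\s+", " ", trimws(x), perl = TRUE)
}

# Step 3 (simplified): assign a scale type with regular expressions.
# Numbers may use either a decimal comma or a decimal point at this stage.
sketch_scale_type <- function(x) {
  quantitative <- grepl("^[<>]?=?\\s*-?\\d+([.,]\\d+)?$", x, perl = TRUE)
  ordinal <- grepl("^(\\d\\+|\\+{1,4})$", x, perl = TRUE)
  ifelse(quantitative, "Quantitative", ifelse(ordinal, "Ordinal", "Nominal"))
}

# Step 4 (simplified): rewrite a decimal comma as a decimal point.
sketch_format_number <- function(x) {
  sub("^([<>]?=?\\s*-?\\d+),(\\d+)$", "\\1.\\2", x, perl = TRUE)
}

raw <- c(" 5,6 ", "<0.5", "++", "Positive")
trimmed <- sketch_clear_typos(raw)
scale_type <- sketch_scale_type(trimmed)
clean_result <- sketch_format_number(trimmed)

print(data.frame(raw, clean_result, scale_type))
```

Detecting the scale type before reformatting mirrors the order of the steps listed above; the original `raw` values are left untouched and the derived values are stored in separate vectors.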
Internal Datasets: The function uses an internal dataset, `common_words_languages.csv`, which contains common words in various languages and is used to identify patterns in text result values.
Function 2 for result validation,