Pre-process data for a `pk` object
# S3 method for pk
do_preprocess(obj, ...)
The same `pk` object, with added elements `data` (containing the cleaned, gap-filled data) and `data_info` (containing summary information about the data, e.g. number of observations by route, media, and detect/nondetect; empirical tmax (time of peak concentration) for oral data; number of observations before and after empirical tmax).
A `pk` object
Additional arguments. Not in use currently.
John Wambaugh, Caroline Ring, Christopher Cook, Gilberto Padilla Mercado
Data pre-processing for an object `obj` includes the following steps, in order:
Coerce data to class `data.frame` (if it is not already)
Rename variables to harmonized "`invivopkfit` aesthetic" variable names, using `obj$mapping`
Check that the data includes only routes in `obj$pk_settings$preprocess$routes_keep` and media in `obj$pk_settings$preprocess$media_keep`
Check that the data includes only one unit for concentration, one unit for time, and one unit for dose.
Coerce `Value`, `Value_SD`, `LOQ`, `Dose`, and `Time` to numeric, if they are not already.
Coerce `Species`, `Route`, and `Media` to lowercase.
Replace any negative `Value`, `Value_SD`, `Dose`, or `Time` with `NA`
If any non-NA `Value` is less than its non-NA `LOQ`, replace that `Value` with `NA`
Impute any NA `LOQ` as `calc_loq_factor` * minimum non-NA `Value` in each `loq_group`
For any cases where `N_Subjects` is NA, impute `N_Subjects` = 1
For anything with `N_Subjects` == 1, set `Value_SD` to 0
Impute missing `Value_SD` as follows: For observations with `N_Subjects` > 1, take the minimum non-missing `Value_SD` for each `sd_group`. If all SDs are missing in an `sd_group`, then `Value_SD` for each observation in that group will be imputed as 0.
Mark data for exclusion according to the following criteria:
Exclude any remaining observations where both `Value` and `LOQ` are NA
For any cases where `N_Subjects` is NA, impute `N_Subjects` = 1
Exclude any remaining observations with `N_Subjects` > 1 and `Value_SD` still NA. (This should never occur if SD imputation is performed; the check is a safeguard.)
Exclude any observations with `N_Subjects` > 1 where reported `Value` is NA, because log-likelihood for non-detect multi-subject observations has not been implemented.
Exclude any observations with NA `Time` values
Exclude any observations with `Dose` = 0
Apply any time transformations specified by user
Scale concentration by `ratio_conc_dose`
Apply any concentration transformations specified by the user.
If `Series_ID` is not included, then assign it as NA
Create variable `pLOQ` and set it equal to `LOQ`
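The imputation and exclusion steps above can be illustrated on a toy data frame. The following is a minimal base-R sketch, not the package's implementation. It makes several assumptions for illustration: `loq_group` and `sd_group` are each reduced to a single grouping column, `calc_loq_factor` is set to 0.9, the minimum `Value_SD` is taken over multi-subject observations only, and the helper `min_or_na()` and the logical flag column `exclude` are hypothetical names.

```r
# Toy data with harmonized variable names (all values are made up).
dat <- data.frame(
  Value      = c(5, 0.2, NA, 3, -1, 4, 2),
  Value_SD   = c(NA, 0.5, NA, NA, 0.1, NA, NA),
  LOQ        = c(NA, 0.5, NA, 1, NA, NA, 1),
  N_Subjects = c(3, 3, NA, 1, 2, 2, 4),
  Dose       = c(1, 1, 1, 1, 1, 0, 1),
  Time       = c(0.5, 1, 2, NA, 4, 8, 1),
  loq_group  = c("a", "a", "a", "b", "b", "b", "c"),  # assumed single column
  sd_group   = c("a", "a", "a", "b", "b", "b", "c")   # assumed single column
)
calc_loq_factor <- 0.9  # assumed value, for illustration only

# Hypothetical helper: minimum of the non-NA values, or NA if there are none.
min_or_na <- function(x) if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)

# Replace negative Value, Value_SD, Dose, or Time with NA.
for (col in c("Value", "Value_SD", "Dose", "Time")) {
  neg <- !is.na(dat[[col]]) & dat[[col]] < 0
  dat[[col]][neg] <- NA
}

# Replace any non-NA Value below its non-NA LOQ with NA.
below <- !is.na(dat$Value) & !is.na(dat$LOQ) & dat$Value < dat$LOQ
dat$Value[below] <- NA

# Impute NA LOQ as calc_loq_factor * minimum non-NA Value in each loq_group.
group_min <- ave(dat$Value, dat$loq_group, FUN = min_or_na)
loq_na <- is.na(dat$LOQ)
dat$LOQ[loq_na] <- calc_loq_factor * group_min[loq_na]

# Impute NA N_Subjects as 1, then set Value_SD to 0 for single-subject rows.
dat$N_Subjects[is.na(dat$N_Subjects)] <- 1
dat$Value_SD[dat$N_Subjects == 1] <- 0

# Impute missing Value_SD for multi-subject rows as the group minimum,
# falling back to 0 when every SD in the group is missing.
multi <- dat$N_Subjects > 1
sd_multi <- ifelse(multi, dat$Value_SD, NA_real_)
sd_min <- ave(sd_multi, dat$sd_group, FUN = min_or_na)
sd_min[is.na(sd_min)] <- 0
fill <- multi & is.na(dat$Value_SD)
dat$Value_SD[fill] <- sd_min[fill]

# Mark rows for exclusion (flag only; no rows are dropped).
dat$exclude <-
  (is.na(dat$Value) & is.na(dat$LOQ)) |   # neither Value nor LOQ available
  (multi & is.na(dat$Value_SD)) |         # multi-subject, SD still missing
  (multi & is.na(dat$Value)) |            # multi-subject non-detect
  is.na(dat$Time) |                       # missing time
  dat$Dose %in% 0                         # zero dose

# Create pLOQ as a copy of LOQ.
dat$pLOQ <- dat$LOQ
```

Because rows are flagged rather than removed in this sketch, the cleaned data frame keeps every original observation, and the reason a row was set aside can be traced back to the individual conditions.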