For each categorical variable, value lists should be defined in the metadata. This implementation checks whether all observed levels in the study data are valid.
con_inadmissible_categorical(
  resp_vars = NULL,
  study_data,
  meta_data,
  label_col,
  threshold = NULL
)
resp_vars
: variable list, the names of the measurement variables
study_data
: data.frame, the data frame that contains the measurements
meta_data
: data.frame, the data frame that contains metadata attributes of the study data
label_col
: variable attribute, the name of the column in the metadata with labels of variables
threshold
: numeric, a value ranging from 0 to 100. Not yet implemented.
a list with:
SummaryTable
: data frame summarizing inadmissible categories with the columns:
Variables
: variable name/label
OBSERVED_CATEGORIES
: the categories observed in the study data
DEFINED_CATEGORIES
: the categories defined in the metadata
NON_MATCHING
: the categories observed but not defined
NON_MATCHING_N
: the number of observations with categories not defined
GRADING
: indicator, TRUE if inadmissible categorical values were observed, FALSE otherwise
ModifiedStudyData
: study data having inadmissible categories removed
FlaggedStudyData
: study data with cases having inadmissible categories flagged
Remove missing codes from the study data (if defined in the metadata).
Interpret the variable-specific VALUE_LABELS as supplied in the metadata.
Identify measurements that do not correspond to the expected categories. For this purpose, two output data frames are generated:
one at the level of observations, flagging each undefined category, and
a summary table for each variable.
Remove values that do not correspond to defined categories, yielding a data frame of modified study data.
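
The steps above can be sketched in base R for a single variable. This is only an illustrative sketch, not the package's implementation: the helpers `parse_value_labels()` and `check_categories()` are hypothetical names, and the `"code = label | code = label"` layout of the VALUE_LABELS string is an assumption made for this example.

```r
# Hypothetical helper: extract the codes from a VALUE_LABELS string.
# Assumption: entries look like "0 = no | 1 = yes", separated by "|".
parse_value_labels <- function(x) {
  parts <- strsplit(x, "|", fixed = TRUE)[[1]]
  trimws(sub("=.*$", "", parts))
}

# Hypothetical helper: run the four steps on one vector of observed values.
check_categories <- function(values, value_labels, missing_codes = NULL) {
  # Step 1: replace missing codes with NA before checking admissibility.
  values[values %in% missing_codes] <- NA
  # Step 2: interpret the categories defined in the metadata string.
  defined <- parse_value_labels(value_labels)
  # Step 3: flag observations whose category is not defined.
  flagged <- !is.na(values) & !(as.character(values) %in% defined)
  non_matching <- sort(unique(values[flagged]))
  # Step 4: remove inadmissible values from a modified copy.
  modified <- values
  modified[flagged] <- NA
  list(
    summary = data.frame(
      OBSERVED_CATEGORIES = paste(sort(unique(values[!is.na(values)])),
                                  collapse = ", "),
      DEFINED_CATEGORIES = paste(defined, collapse = ", "),
      NON_MATCHING = paste(non_matching, collapse = ", "),
      NON_MATCHING_N = sum(flagged),
      GRADING = any(flagged)
    ),
    flagged = flagged,
    modified = modified
  )
}

# Example: 99 is treated as a missing code; 2 and 7 are not defined.
res <- check_categories(
  values = c(0, 1, 1, 2, 99, 7, NA),
  value_labels = "0 = no | 1 = yes",
  missing_codes = 99
)
print(res$summary)
```

In this sketch, `res$flagged` plays the role of the observation-level flags, `res$summary` mirrors the columns of the summary table for one variable, and `res$modified` corresponds to the modified study data with inadmissible values set to NA.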