Note: a newer version (2.1.0) of this package is available.

dataquieR

The goal of dataquieR is to provide functions for assessing data quality issues in studies. These functions can be used alone or in a data quality pipeline. dataquieR also implements one generic pipeline that produces flexdashboard-based HTML5 reports.

See also

https://dataquality.ship-med.uni-greifswald.de


Installation

You can install the released version of dataquieR from CRAN with:

install.packages("dataquieR")

The development version can be installed from GitLab.com using:

if (!requireNamespace("devtools")) {
  install.packages("devtools")
}
devtools::install_gitlab("libreumg/dataquier")

For examples and additional documentation, please refer to our website.

Funding

  • German Research Foundation (DFG: SCHM 2744/3–1)
  • European Union’s Horizon 2020 research and innovation program (grant agreement No 825903)

Monthly Downloads

708

Version

1.0.8

License

BSD_2_clause + file LICENSE

Maintainer

Stephan Struckmann

Last Published

August 12th, 2021

Functions in dataquieR (1.0.8)

as.data.frame.dataquieR_resultset

Convert a full dataquieR report to a data.frame
as.list.dataquieR_resultset

Convert a full dataquieR report to a list
VARIABLE_ROLES

Variable roles can be one of the following:
com_item_missingness

Summarize missingness columnwise (in variable)
acc_margins

com_segment_missingness

Summarizes missingness for individuals in specific segments
acc_loess

Smooths and plots adjusted longitudinal measurements
acc_varcomp

Estimates variance components
acc_univariate_outlier

Function to identify univariate outliers by four different approaches
dataquieR_resultset

dataquieR_resultset_verify

Verify an object of class dataquieR_resultset
acc_robust_univariate_outlier

Function to identify univariate outliers by four different approaches
con_limit_deviations

Detects variable values exceeding limits defined in metadata
acc_multivariate_outlier

Function to calculate and plot Mahalanobis distances
contradiction_functions

contradiction_functions
com_unit_missingness

Counts all individuals with no measurements at all
acc_shape_or_scale

Function to compare observed versus expected distributions
pipeline_vectorized

Call (nearly) one "Accuracy" function with many parameterizations at once automatically
dataquieR

The dataquieR package about Data Quality in Epidemiological Research
contradiction_functions_descriptions

Description of the contradiction functions
prep_add_to_meta

Support function to augment metadata during data quality reporting
prep_min_obs_level

Support function to identify the levels of a process variable with minimum number of observations
prep_map_labels

Support function to allocate labels to variables
util_app_dc

utility function for the applicability of distribution plots
dq_report_by

Generate a stratified full DQ report
prep_study2meta

Guess a meta data frame from study data.
dq_report

Generate a full DQ report
util_app_dl

utility function to test for applicability of detection limits checks
util_count_NA

Support function to count number of NAs
con_contradictions

Checks user-defined contradictions in study data
con_detection_limits

con_detection_limits
prep_valuelabels_from_data

Get value labels from data
util_count_code_classes

count distinct realizations of missing codes of a specific class
util_heatmap_1th

Utility Function Heatmap with 1 Threshold
util_looks_like_missing

Check for repetitive values using the digits 8 or 9 only
util_get_var_att_names_of_level

Get variable attributes of a certain provision level
util_is_integer

Check for integer values
util_set_size

Attaches attributes about the recommended minimum absolute sizes to the plot p
util_sigmagap

Utility function outliers according to the rule of Huber et al.
prep_datatype_from_data

Get data types from data
prep_create_meta

util_app_iac

utility function for the applicability of categorical admissibility
util_app_im

utility function applicability of item missingness
util_app_ed

utility function for the applicability of end digit preference checks
util_app_iav

utility function for the applicability of numeric admissibility
util_anytime_installed

Test whether the package anytime is installed
util_app_cd

utility function for the applicability of contradiction checks
dimensions

Names of DQ dimensions
util_assign_levlabs

utility function to assign labels to levels
util_app_mar

utility function to test for applicability of marginal means plots
util_app_loess

utility function for applicability of LOESS smoothed time course plots
util_correct_variable_use

Check referred variables
util_hubert

utility function for the outlier rule of Hubert et al.
util_compare_meta_with_study

Compares study data data types with the ones expected according to the metadata
util_interpret_limits

Utility function to interpret mathematical interval notation
util_set_dQuoteString

Utility function to put strings in quotes
.variable_arg_roles

Variable-argument roles
util_app_vc

utility applicability variance components
util_backtickQuote

utility function to set string in backticks
util_as_numeric

Convert factors to label-corresponding numeric values
con_inadmissible_categorical

Detects variable levels not specified in metadata
util_ensure_suggested

Support function to stop if an optional package is not installed
pipeline_recursive_result

Function to convert a pipeline result data frame to named encapsulated lists
int_datatype_matrix

Function to check declared data types of metadata in study data
prep_pmap

Support function for a parallel pmap
util_set_sQuoteString

Utility function single quote string
util_error

Produce an error message with a useful short stack trace. Then it stops the execution.
prep_check_meta_names

Checks the validity of meta data w.r.t. the provided column names
prep_clean_labels

Support function to scan variable labels for applicability
util_check_data_type

Support function to verify the data type of a value
prep_prepare_dataframes

Prepare and verify study data with metadata
util_map_labels

Support function to allocate labels to variables
print.ReportSummaryTable

print implementation for the class ReportSummaryTable
util_no_value_labels

util_check_one_unique_value

Check for one value only
rbind.ReportSummaryTable

Combine ReportSummaryTable outputs
summary.dataquieR_resultset

util_app_sos

utility function applicability of distribution function's shape or scale check
util_app_sm

utility function applicability of segment missingness
util_count_codes

count realizations of missing codes of any class
pro_applicability_matrix

Function to check applicability of DQ functions on study data
print.dataquieR_resultset

print.dataquieR_result

Print a dataquieR result returned by pipeline_vectorized
util_find_external_functions_in_stacktrace

Find externally called function in the stack trace
util_find_first_externally_called_functions_in_stacktrace

Find first externally called function in the stack trace
util_make_function

Make a function capturing errors and other conditions for parallelization
util_par_pmap

Utility function parallel version of purrr::pmap
util_warning

Produce a warning message with a useful short stack trace.
util_map_all

Maps label column meta data on study data variable names
util_parse_assignments

Utility function to parse assignments
util_dichotomize

utility function to dichotomize variables
util_fix_rstudio_bugs

RStudio crashes on parallel calls in some versions on Darwin based operating systems with R 4
util_app_mol

utility function applicability of multivariate outlier detection
util_app_ol

utility function for the applicability of outlier detection
util_get_code_list

Fetch a missing code list from the metadata
util_prepare_dataframes

util_prepare_dataframes
util_validate_known_meta

Utility function verifying syntax of known metadata columns
util_replace_codes_by_NA

Utility function to replace missing codes by NAs
util_warn_unordered

Warn about a problem in varname, if x has no natural order
util_dist_selection

Utility function distribution-selection
util_empty

Test whether values of x are empty, i.e. NA or whitespace characters
util_tukey

Utility function Tukey outlier rule
util_only_NAs

identify NA-only variables
util_sixsigma

Utility function for six sigma deviations rule
util_observations_in_subgroups

Utility function observations in subgroups
acc_end_digits

Extension of acc_shape_or_scale to examine uniform distributions of end digits
VARATT_REQUIRE_LEVELS

Requirement levels of certain metadata columns
DISTRIBUTIONS

acc_distributions

Function to plot histograms supplemented by empirical cumulative distributions for subgroups
SPLIT_CHAR

Character used by default as a separator in meta data such as missing codes
DATA_TYPES_OF_R_TYPE

All available data types, mapped from their respective R types
DATA_TYPES

Data Types
WELL_KNOWN_META_VARIABLE_NAMES

Well-known metadata column names, names of metadata columns
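
The index above names several univariate outlier rules, among them the Tukey rule (util_tukey) and the six sigma deviations rule (util_sixsigma). As a rough illustration of what such rules compute, here is a minimal base-R sketch. It is not dataquieR's implementation: the helper names tukey_flags and sixsigma_flags are hypothetical, and the six sigma rule is assumed to mean "more than three standard deviations from the mean".

```r
# Illustrative base-R sketch only; NOT the code used inside dataquieR.
# tukey_flags() and sixsigma_flags() are hypothetical helper names.

# Tukey rule: flag values beyond k * IQR outside the first/third quartile.
tukey_flags <- function(x, k = 1.5) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  !is.na(x) & (x < q[1] - k * iqr | x > q[2] + k * iqr)
}

# Six sigma rule (assumed reading): flag values more than 3 SD from the mean.
sixsigma_flags <- function(x) {
  m <- mean(x, na.rm = TRUE)
  s <- stats::sd(x, na.rm = TRUE)
  !is.na(x) & abs(x - m) > 3 * s
}

# Toy data with one extreme value and one missing value.
x <- c(10, 11, 12, 13, 14, 15, 100, NA)

tukey_result <- tukey_flags(x)
sixsigma_result <- sixsigma_flags(x)

print(which(tukey_result))
print(which(sixsigma_result))
```

On this toy vector the two rules disagree: the extreme value inflates the standard deviation enough to escape the three-SD band, while the quartile-based fences still catch it. This sensitivity difference is one reason for comparing several rules side by side rather than relying on a single one.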