dccvalidator

dccvalidator is a package and Shiny app to perform data validation and QA/QC. It’s used in the AMP-AD and PsychENCODE consortia to validate data prior to data releases.

Installation

You can install dccvalidator from CRAN:

install.packages("dccvalidator")

To install the development version from GitHub, run:

devtools::install_github("Sage-Bionetworks/dccvalidator")

Many functions in dccvalidator use reticulate and the Synapse Python client. See the reticulate documentation for information on how to set R to use a specific version of Python if you don’t want to use the default Python installation on your machine. Whichever Python installation you choose should have synapseclient installed.
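As a minimal sketch of the setup described above, the helper below points reticulate at a chosen Python installation and reports whether synapseclient can be imported from it. The helper name `configure_python` and its `python_path` argument are hypothetical and not part of dccvalidator; `use_python()` and `py_module_available()` are reticulate functions. Note that `use_python()` only takes effect if it is called before Python is initialized in the R session.

```r
# Hypothetical helper (not part of dccvalidator): select a Python
# installation for reticulate and check that synapseclient is importable.
configure_python <- function(python_path = NULL) {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    stop("The reticulate package is required.")
  }
  if (!is.null(python_path)) {
    # Must run before Python is initialized in this R session
    reticulate::use_python(python_path, required = TRUE)
  }
  # TRUE if the selected Python can import the Synapse client
  reticulate::py_module_available("synapseclient")
}
```

Calling `configure_python()` with no argument checks the default Python installation; passing a path (for example, one to a virtual environment's interpreter) checks that installation instead.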

Because dccvalidator uses reticulate, it is not compatible with the synapser package.

Check data

dccvalidator provides functions for checking the following common data quality issues:

  • Annotation keys and values conform to a controlled vocabulary
  • Column names match those of an associated metadata template
  • Certain columns are not empty
  • Certain columns are complete
  • Identifiers match between two metadata files (e.g. all individuals in one file are also present in another)
  • Identifiers are unique within a file
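To make the checks listed above concrete, here is a rough base R sketch of the logic behind each one. It does not call dccvalidator and is not the package's implementation; the data frames, column names, template, and allowed values are all made up for illustration.

```r
# Toy metadata; every name and value here is invented for illustration.
individual <- data.frame(
  individualID = c("A1", "A2", "A3", "A3"),
  species = c("Human", "human", NA, "Human"),
  sex = NA_character_,
  stringsAsFactors = FALSE
)
biospecimen <- data.frame(
  individualID = c("A1", "A2", "A4"),
  specimenID = c("S1", "S2", "S3"),
  stringsAsFactors = FALSE
)
template_cols <- c("individualID", "species", "sex", "diagnosis")
allowed_species <- c("Human", "Mouse")

# Values conform to a controlled vocabulary (missing values ignored)
invalid_species <- setdiff(na.omit(individual$species), allowed_species)

# Column names match the template
missing_cols <- setdiff(template_cols, names(individual))
extra_cols <- setdiff(names(individual), template_cols)

# Columns that are entirely empty
is_empty <- vapply(individual, function(x) all(is.na(x)), logical(1))
empty_cols <- names(individual)[is_empty]

# Columns that are incomplete (at least one missing value)
has_missing <- vapply(individual, anyNA, logical(1))
incomplete_cols <- names(individual)[has_missing]

# Identifiers match between the two files, in both directions
not_in_biospecimen <- setdiff(individual$individualID, biospecimen$individualID)
not_in_individual <- setdiff(biospecimen$individualID, individual$individualID)

# Identifiers are unique within a file
dup_ids <- unique(individual$individualID[duplicated(individual$individualID)])
```

In this toy data, "human" falls outside the allowed vocabulary, the template column `diagnosis` is missing, `sex` is empty, `A3` is duplicated, and `A3` and `A4` each appear in only one of the two files.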

Data submission validation

This package contains a Shiny app to validate manifests and metadata for AMP-AD studies. It uses the dccvalidator package to check for common data quality issues and gives real-time feedback to the data contributor on errors that need to be fixed. The reporting UI is heavily inspired by the MetaDIG project’s metadata quality reports.

The application also allows users to submit documentation of their study, a description of the methods used, etc.

See the customizing dccvalidator vignette for information on how to spin up a customized version of the application.

Version: 0.3.0

License: MIT + file LICENSE

Maintainer: Nicole Kauer

Last Published: June 19th, 2020
Functions in dccvalidator (0.3.0)

  • check_pass: Create custom conditions for reporting
  • check_values: Check a set of keys and their values
  • with_busy_indicator_ui: Show busy indicator
  • check_ages_over_90: Check for ages over 90
  • dccvalidator-package: dccvalidator: Metadata Validation for Data Coordinating Centers
  • get_synapse_table: Get Synapse table
  • get_synapse_annotations: Get Synapse annotations
  • check_duplicate_paths: Check for duplicated file paths
  • check_files_manifest: Check that files are present in manifest
  • check_ids_match: Check ids
  • check_keys: Check that a given set of keys are all present in an annotations dictionary
  • check_cols_empty: Check for empty columns
  • get_template: Get a template
  • report_unsatisfied_requirements: Create a modal dialog if user is not in required team(s) or certified
  • check_condition: Create a condition of the given type
  • check_schema_df: Check a data frame of data against a JSON Schema
  • check_team_membership: Check team membership
  • check_schema_json: Check data against a JSON Schema
  • check_indiv_ids_dup: Check uniqueness of individual and specimen IDs
  • check_parent_syn: Check synID of parent in manifest
  • df_to_json_list: Convert data frame to JSON
  • file_summary_ui: UI for the file summary module
  • results_boxes_ui: UI function for results boxes module
  • run_app: Run the Shiny application
  • valid_annotation_keys: Valid annotation keys
  • valid_annotation_values: Valid annotation values
  • check_certified_user: Check if user is certified
  • can_coerce: Check coercibility
  • check_cols_complete: Check for complete columns
  • check_col_names: Check column names against their corresponding template
  • check_annotation_values: Check annotation values
  • check_annotation_keys: Check annotation keys
  • check_all: Run all validation checks
  • app_server: App server
  • app_ui: App UI