eHDPrep (version 1.3.3)

Quality Control and Semantic Enrichment of Datasets

Description

A tool for the preparation and enrichment of health datasets for analysis (Toner et al. (2023)). Provides functionality for assessing data quality and for improving the reliability and machine interpretability of a dataset. 'eHDPrep' also enables semantic enrichment of a dataset where metavariables are discovered from the relationships between input variables determined from user-provided ontologies.
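To make the idea of assessing data quality concrete, the base R sketch below computes completeness per variable and per row on a small made-up data frame. It does not call eHDPrep or reproduce its implementation: the `toy` data and the names `variable_pct_complete` and `row_pct_complete` are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data; this is not the package's bundled example data.
toy <- data.frame(
  id     = c("p1", "p2", "p3", "p4"),
  age    = c(54, NA, 61, 47),
  smoker = c("yes", "no", NA, NA),
  stringsAsFactors = FALSE
)

# Percentage of non-missing values in each variable (column).
variable_pct_complete <- colMeans(!is.na(toy)) * 100

# Percentage of non-missing values in each row, leaving out the identifier
# column so that it does not inflate the score.
measured <- setdiff(names(toy), "id")
row_pct_complete <- rowMeans(!is.na(toy[, measured])) * 100
names(row_pct_complete) <- toy$id

print(variable_pct_complete)
print(row_pct_complete)
```

Variables or rows with low completeness are natural candidates for review before any encoding or enrichment step.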

Install

install.packages('eHDPrep')

Monthly Downloads

247

Version

1.3.3

License

GPL-3

Maintainer

Ian Overton

Last Published

June 5th, 2023

Functions in eHDPrep (1.3.3)

edge_tbl_to_graph

Convert edge table to tidygraph graph
encode_genotype_vec

Encode a genotype/SNP vector
entropy

Calculate Entropy of a Vector
distant_neg_val

Find highly distant value for data frame
encode_ordinals

Encode ordinal variables
discrete.mi

Calculate mutual information of a matrix of discrete values
example_edge_tbl

Example ontology as an edge table for semantic enrichment
encode_genotypes

Encode genotype/SNP variables in data frame
example_mapping_file

Example mapping file for semantic enrichment
encode_cats

Encode categorical variables using one-hot encoding
export_dataset

Export data to delimited file
example_ontology

Example ontology as a network graph for semantic enrichment
extract_freetext

Extract information from free text
geometric.mean

Geometric mean
exact.kde

Exact kernel density estimation
import_var_classes

Import corrected variable classes
information_content_contin

Calculate Information Content (Continuous Variable)
example_data

Example data for eHDPrep
join_vars_to_ontol

Join Mapping Table to Ontology Network Graph
normalize

Min max normalization
nums_to_NA

Replace numeric values in numeric columns with NA
identify_inconsistency

Identify inconsistencies in a dataset
metavariable_agg

Aggregate Data by Metavariable
import_dataset

Import data into 'R'
mi_content_discrete

Calculate Mutual Information Content
information_content_discrete

Calculate Information Content (Discrete Variable)
merge_cols

Merge columns in data frame
min_catchNAs

Find minimum of vector safely
metavariable_variable_descendants

Extract metavariables' descendant variables
metavariable_info

Compute Metavariable Information
plot_completeness

Plot Completeness of a Dataset
onehot_vec

One hot encode a vector
max_catchNAs

Find maximum of vector safely
prod_catchNAs

Find product of vector safely
mean_catchNAs

Find mean of vector safely
ordinal_label_levels

Extract labels and levels of ordinal variables in a dataset
variable_entropy

Calculate Entropy of Each Variable in Data Frame
strings_to_NA

Replace values in non-numeric columns with NA
warn_missing_dots

Missing dots warning
skipgram_identify

Identify Neighbouring Words (Skipgrams) in a free-text vector
mod_track

Data modification tracking
row_completeness

Calculate Row Completeness in a Data Frame
node_IC_zhou

Calculate Node Information Content (Zhou et al 2008 method)
zero_entropy_variables

Identify variables with zero entropy
validate_mapping_tbl

Validate mapping table for semantic enrichment
validate_ontol_nw

Validate ontology network for semantic enrichment
sum_catchNAs

Sum vector safely for semantic enrichment
semantic_enrichment

Semantic enrichment
validate_consistency_tbl

Validate internal consistency table
skipgram_freq

Report Skipgram Frequency
skipgram_append

Append Skipgram Presence Variables to Dataset
variable.bw.kde

Variable bandwidth Kernel Density Estimation
variable_completeness

Calculate Variable Completeness in a Data Frame
review_quality_ctrl

Review Quality Control
report_var_mods

Track changes to dataset variables
assess_quality

Assess quality of a dataset
assume_var_classes

Assume variable classes in data
apply_quality_ctrl

Apply quality control measures to a dataset
compare_info_content_plt

Information Content Comparison Plot
encode_bin_cat_vec

Encode a categorical vector with binary categories
compare_info_content

Information Content Comparison Table
encode_as_num_mat

Convert data frame to numeric matrix
compare_completeness

Compare Completeness between Datasets
cellspec_lgl

Kable logical data highlighting
encode_binary_cats

Encode categorical variables as binary factors
assess_completeness

Assess completeness of a dataset
count_compare

Compare unique values before and after data modification
completeness_heatmap

Completeness Heatmap
eHDPrep-package

'eHDPrep': Quality Control and Semantic Enrichment of Datasets
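
Several entries in the list above name standard calculations: entropy of a vector, variables with zero entropy, and min-max normalization. The base R sketch below shows one common way to compute each, assuming Shannon entropy in bits and scaling to the range [0, 1]. The helpers `shannon_entropy` and `min_max` and the `toy` data are hypothetical. They are not the package's `entropy()` or `normalize()` functions, whose arguments and edge-case handling may differ.

```r
# Shannon entropy (in bits) of a discrete vector; missing values are dropped.
# Hypothetical helper for illustration only.
shannon_entropy <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  p <- as.numeric(table(x)) / length(x)
  p <- p[p > 0]  # unused factor levels would otherwise give 0 * log2(0) = NaN
  -sum(p * log2(p))
}

# Min-max scaling to [0, 1]. A constant vector has zero range, so NA is
# returned for it rather than dividing by zero.
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(NA_real_, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical toy data.
toy <- data.frame(
  site  = c("A", "A", "A", "A"),
  stage = c("I", "II", "I", "II"),
  dose  = c(10, 20, 30, 50)
)

# Entropy of each discrete variable; a variable with a single repeated value
# has zero entropy and carries no information for analysis.
entropies <- vapply(toy[c("site", "stage")], shannon_entropy, numeric(1))
zero_entropy <- names(entropies)[entropies == 0]

scaled_dose <- min_max(toy$dose)

print(entropies)
print(zero_entropy)
print(scaled_dose)
```

In this toy data, `site` takes a single value and is flagged as zero entropy, while `stage` is evenly split between two values and has an entropy of one bit.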