
eHDPrep

The goal of ‘eHDPrep’ is to provide robust quality control and semantic enrichment tools for preparation of health datasets. High-level and low-level functionality are included for general and specialist R users, respectively. A detailed vignette can be accessed using: vignette("Introduction_to_eHDPrep", package = "eHDPrep").

Installation

You can install the released version of ‘eHDPrep’ from CRAN with:

install.packages("eHDPrep")

Or from GitHub with:

install.packages("devtools")
devtools::install_github("overton-group/eHDPrep")

Example

‘eHDPrep’ offers several approaches to preparing health data for analysis. For example, strings_to_NA() standardises strings that represent missing values by replacing them with NA:

library(eHDPrep)
data("example_data")

# original values in t_stage variable
unique(example_data$t_stage)
#> [1] "T3a"       "T3b"       "T1"        "T4"        "T2"        "equivocal"

# the predefined value "equivocal" is replaced with NA
unique(eHDPrep::strings_to_NA(example_data, strings_to_replace = "equivocal")$t_stage)
#> [1] "T3a" "T3b" "T1"  "T4"  "T2"  NA
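The idea behind this step can be sketched in base R. The helper replace_strings_with_na() below is hypothetical and is not how eHDPrep implements strings_to_NA(); it simply assumes that "missing" strings should become NA in character and factor columns while numeric columns are left untouched.

```r
# Hypothetical base-R sketch (not eHDPrep's implementation): replace the
# listed strings with NA in character and factor columns of a data frame.
replace_strings_with_na <- function(df, strings) {
  # Identify text-like columns; numeric columns are never modified
  is_txt <- vapply(df, function(col) is.character(col) || is.factor(col),
                   logical(1))
  df[is_txt] <- lapply(df[is_txt], function(col) {
    col[col %in% strings] <- NA          # %in% compares factors as character
    if (is.factor(col)) droplevels(col) else col  # drop now-unused levels
  })
  df
}
```

Applied to the example above, replace_strings_with_na(example_data, "equivocal") should behave similarly for the character column t_stage, but the package function is the one to rely on in practice.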

System Requirements

‘eHDPrep’ requires some additional system dependencies. The commands needed on each platform, as reported by remotes::system_requirements(), are listed below:

Ubuntu

remotes::system_requirements(".", os = "ubuntu", os_release = "18.04")
#>  [1] "apt-get install -y libicu-dev"          
#>  [2] "apt-get install -y pandoc"              
#>  [3] "apt-get install -y make"                
#>  [4] "apt-get install -y libcurl4-openssl-dev"
#>  [5] "apt-get install -y libssl-dev"          
#>  [6] "apt-get install -y libxml2-dev"         
#>  [7] "apt-get install -y libfontconfig1-dev"  
#>  [8] "apt-get install -y libfreetype6-dev"    
#>  [9] "apt-get install -y libpng-dev"          
#> [10] "apt-get install -y imagemagick"         
#> [11] "apt-get install -y libmagick++-dev"     
#> [12] "apt-get install -y gsfonts"             
#> [13] "apt-get install -y libglpk-dev"         
#> [14] "apt-get install -y libgmp3-dev"
remotes::system_requirements(".", os = "ubuntu", os_release = "20.04")
#>  [1] "apt-get install -y libicu-dev"          
#>  [2] "apt-get install -y pandoc"              
#>  [3] "apt-get install -y make"                
#>  [4] "apt-get install -y libcurl4-openssl-dev"
#>  [5] "apt-get install -y libssl-dev"          
#>  [6] "apt-get install -y libxml2-dev"         
#>  [7] "apt-get install -y libfontconfig1-dev"  
#>  [8] "apt-get install -y libfreetype6-dev"    
#>  [9] "apt-get install -y libpng-dev"          
#> [10] "apt-get install -y imagemagick"         
#> [11] "apt-get install -y libmagick++-dev"     
#> [12] "apt-get install -y gsfonts"             
#> [13] "apt-get install -y libglpk-dev"         
#> [14] "apt-get install -y libgmp3-dev"

openSUSE

remotes::system_requirements(".", os = "opensuse", os_release = "42.3")
#>  [1] "zypper install -y libicu-devel"         
#>  [2] "zypper install -y pandoc"               
#>  [3] "zypper install -y make"                 
#>  [4] "zypper install -y libcurl-devel"        
#>  [5] "zypper install -y libopenssl-devel"     
#>  [6] "zypper install -y libxml2-devel"        
#>  [7] "zypper install -y fontconfig-devel"     
#>  [8] "zypper install -y freetype2-devel"      
#>  [9] "zypper install -y libpng16-compat-devel"
#> [10] "zypper install -y ImageMagick"          
#> [11] "zypper install -y ImageMagick-devel"    
#> [12] "zypper install -y libMagick++-devel"    
#> [13] "zypper install -y gmp-devel"

CentOS

remotes::system_requirements(".", os = "centos", os_release = "7")
#>  [1] "yum install -y epel-release"         
#>  [2] "yum install -y libicu-devel"         
#>  [3] "yum install -y pandoc"               
#>  [4] "yum install -y make"                 
#>  [5] "yum install -y libcurl-devel"        
#>  [6] "yum install -y openssl-devel"        
#>  [7] "yum install -y libxml2-devel"        
#>  [8] "yum install -y fontconfig-devel"     
#>  [9] "yum install -y freetype-devel"       
#> [10] "yum install -y libpng-devel"         
#> [11] "yum install -y ImageMagick"          
#> [12] "yum install -y ImageMagick-c++-devel"
#> [13] "yum install -y glpk-devel"           
#> [14] "yum install -y gmp-devel"
remotes::system_requirements(".", os = "centos", os_release = "8")
#>  [1] "dnf install -y dnf-plugins-core"            
#>  [2] "dnf config-manager --set-enabled powertools"
#>  [3] "dnf install -y epel-release"                
#>  [4] "dnf install -y libicu-devel"                
#>  [5] "dnf install -y pandoc"                      
#>  [6] "dnf install -y make"                        
#>  [7] "dnf install -y libcurl-devel"               
#>  [8] "dnf install -y openssl-devel"               
#>  [9] "dnf install -y libxml2-devel"               
#> [10] "dnf install -y fontconfig-devel"            
#> [11] "dnf install -y freetype-devel"              
#> [12] "dnf install -y libpng-devel"                
#> [13] "dnf install -y ImageMagick"                 
#> [14] "dnf install -y ImageMagick-c++-devel"       
#> [15] "dnf install -y glpk-devel"                  
#> [16] "dnf install -y gmp-devel"

Red Hat

remotes::system_requirements(".", os = "redhat", os_release = "7")
#>  [1] "rpm -q epel-release || yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm"
#>  [2] "yum install -y libicu-devel"                                                                                 
#>  [3] "yum install -y pandoc"                                                                                       
#>  [4] "yum install -y make"                                                                                         
#>  [5] "yum install -y libcurl-devel"                                                                                
#>  [6] "yum install -y openssl-devel"                                                                                
#>  [7] "yum install -y libxml2-devel"                                                                                
#>  [8] "yum install -y fontconfig-devel"                                                                             
#>  [9] "yum install -y freetype-devel"                                                                               
#> [10] "yum install -y libpng-devel"                                                                                 
#> [11] "yum install -y ImageMagick"                                                                                  
#> [12] "yum install -y ImageMagick-c++"                                                                              
#> [13] "yum install -y glpk-devel"                                                                                   
#> [14] "yum install -y gmp-devel"
remotes::system_requirements(".", os = "redhat", os_release = "8")
#>  [1] "dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm"
#>  [2] "dnf install -y libicu-devel"                                                          
#>  [3] "dnf install -y make"                                                                  
#>  [4] "dnf install -y libcurl-devel"                                                         
#>  [5] "dnf install -y openssl-devel"                                                         
#>  [6] "dnf install -y libxml2-devel"                                                         
#>  [7] "dnf install -y fontconfig-devel"                                                      
#>  [8] "dnf install -y freetype-devel"                                                        
#>  [9] "dnf install -y libpng-devel"                                                          
#> [10] "dnf install -y ImageMagick"                                                           
#> [11] "dnf install -y ImageMagick-c++"                                                       
#> [12] "dnf install -y glpk-devel"                                                            
#> [13] "dnf install -y gmp-devel"

Monthly Downloads: 205
Version: 1.3.4
License: GPL-3
Maintainer: Ian Overton
Last Published: September 3rd, 2025

Functions in eHDPrep (1.3.4)

encode_cats: Encode categorical variables using one-hot encoding.
discrete.mi: Calculate mutual information of a matrix of discrete values
encode_as_num_mat: Convert data frame to numeric matrix
edge_tbl_to_graph: Convert edge table to tidygraph graph
encode_genotype_vec: Encode a genotype/SNP vector
extract_freetext: Extract information from free text
example_mapping_file: Example mapping file for semantic enrichment
entropy: Calculate Entropy of a Vector
encode_ordinals: Encode ordinal variables
encode_genotypes: Encode genotype/SNP variables in data frame
example_edge_tbl: Example ontology as an edge table for semantic enrichment
metavariable_agg: Aggregate Data by Metavariable
export_dataset: Export data to delimited file
merge_cols: Merge columns in data frame
example_ontology: Example ontology as a network graph for semantic enrichment
information_content_discrete: Calculate Information Content (Discrete Variable)
join_vars_to_ontol: Join Mapping Table to Ontology Network Graph
geometric.mean: Geometric mean
identify_inconsistency: Identify inconsistencies in a dataset
mean_catchNAs: Find mean of vector safely
max_catchNAs: Find maximum of vector safely
exact.kde: Exact kernel density estimation
example_data: Example data for eHDPrep
mi_content_discrete: Calculate Mutual Information Content
skipgram_freq: Report Skipgram Frequency
skipgram_append: Append Skipgram Presence Variables to Dataset
min_catchNAs: Find minimum of vector safely
strings_to_NA: Replace values in non-numeric columns with NA
skipgram_identify: Identify Neighbouring Words (Skipgrams) in a free-text vector
sum_catchNAs: Sum vector safely for semantic enrichment
validate_consistency_tbl: Validate internal consistency table
mod_track: Data modification tracking
zero_entropy_variables: Identify variables with zero entropy
onehot_vec: One hot encode a vector
ordinal_label_levels: Extract labels and levels of ordinal variables in a dataset
row_completeness: Calculate Row Completeness in a Data Frame
semantic_enrichment: Semantic enrichment
nums_to_NA: Replace numeric values in numeric columns with NA
normalize: Min max normalization
import_dataset: Import data into 'R'
import_var_classes: Import corrected variable classes
node_IC_zhou: Calculate Node Information Content (Zhou et al. 2008 method)
information_content_contin: Calculate Information Content (Continuous Variable)
plot_completeness: Plot Completeness of a Dataset
metavariable_info: Compute Metavariable Information
metavariable_variable_descendants: Extract metavariables' descendant variables
prod_catchNAs: Find product of vector safely
validate_mapping_tbl: Validate mapping table for semantic enrichment
variable.bw.kde: Variable bandwidth Kernel Density Estimation
variable_completeness: Calculate Variable Completeness in a Data Frame
validate_ontol_nw: Validate ontology network for semantic enrichment
report_var_mods: Track changes to dataset variables
review_quality_ctrl: Review Quality Control
warn_missing_dots: Missing dots warning
variable_entropy: Calculate Entropy of Each Variable in Data Frame
cellspec_lgl: Kable logical data highlighting
compare_info_content: Information Content Comparison Table
compare_info_content_plt: Information Content Comparison Plot
assess_quality: Assess quality of a dataset
completeness_heatmap: Completeness Heatmap
apply_quality_ctrl: Apply quality control measures to a dataset
assume_var_classes: Assume variable classes in data
compare_completeness: Compare Completeness between Datasets
assess_completeness: Assess completeness of a dataset
count_compare: Compare unique values before and after data modification
encode_binary_cats: Encode categorical variables as binary factors
encode_bin_cat_vec: Encode a categorical vector with binary categories
distant_neg_val: Find highly distant value for data frame
eHDPrep-package: 'eHDPrep': Quality Control and Semantic Enrichment of Datasets
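Several entries above name standard measures: min-max normalisation, entropy, and row and variable completeness. The base-R sketches below illustrate those measures under stated assumptions; they are not eHDPrep's implementations, and the package functions may differ, for example in log base, NA handling, or output format. All function names here are hypothetical.

```r
# Min-max normalisation: rescale x to [0, 1]; NAs are ignored when finding
# the range and stay NA. A constant vector yields NaN (zero range).
minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Shannon entropy in bits (log base 2 is an assumption) of the observed
# values; table() drops NAs, and zero-count factor levels are removed.
shannon_entropy <- function(x) {
  p <- table(x) / sum(!is.na(x))
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Completeness as the proportion of non-missing values,
# per column (variable) and per row (record).
var_completeness <- function(df) colMeans(!is.na(df))
row_completeness_prop <- function(df) rowMeans(!is.na(df))
```

For example, shannon_entropy() of a vector that takes a single value is 0, which is the condition a check such as zero_entropy_variables is described as looking for.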