EDCimport

EDCimport is a package designed to easily import data from the EDC software TrialMaster.

Installation

# Install the latest version available on CRAN (once published)
install.packages("EDCimport")

# Install the development version from GitHub
devtools::install_github("DanChaltiel/EDCimport")

You will also need 7-zip installed, and preferably added to the PATH.
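One quick way to check whether 7-zip is reachable from R is to look it up on the PATH with base R. This is only a sketch: it assumes the executable goes by its usual command-line name `7z`, and it does not reflect how EDCimport itself locates 7-zip.

```r
# Look up the 7-zip executable on the PATH.
# Sys.which() returns an empty string when the program is not found.
# The name "7z" is an assumption (the usual command-line name of 7-zip).
sevenzip_path <- unname(Sys.which("7z"))

if (nzchar(sevenzip_path)) {
  message("7-zip found at: ", sevenzip_path)
} else {
  message("7-zip was not found on the PATH")
}
```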

Warning: This package was developed to work on Windows and is unlikely to work on any other OS. You are very welcome to submit a PR if you manage to get it to work on Mac or Linux.

Load the data

Inside TrialMaster, you should request an export of type SAS Xport, with the checkbox "Include Codelists" ticked. This export should generate a .zip archive.

Then, simply use read_trialmaster() with the archive password (if any) to retrieve the data from the archive:

library(EDCimport)
tm = read_trialmaster("path/to/my/archive.zip", pw="foobar")

The resulting object tm is a list containing all the datasets, plus metadata.

You can now use load_list() to import the list in the global environment and use your tables:

load_list(tm) #this also removes `tm` to save memory
mean(dataset1$column5)
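To illustrate what importing a list into an environment means, here is a minimal base R sketch using `list2env()` on a toy list. The list `tm_toy` and its contents are hypothetical, and this is not the implementation of load_list(); it only shows the general idea of turning list elements into named objects.

```r
# Toy stand-in for the list of datasets (hypothetical data, not a real export)
tm_toy <- list(
  dataset1 = data.frame(subjid = 1:3, column5 = c(10, 20, 30)),
  dataset2 = data.frame(subjid = 1:3, arm = c("A", "B", "A"))
)

# Copy each list element into an environment as a separate object.
# A fresh environment is used here to avoid cluttering the global one.
env <- new.env()
list2env(tm_toy, envir = env)

# The tables are now accessible by name inside the environment
ls(env)
mean(env$dataset1$column5)
```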

There are many other options available (e.g. column name cleaning and table splitting); see ?read_trialmaster for more details.

Database management tools

EDCimport includes a set of useful tools that help with using the imported database. See References for a complete list.

Database summary

Reading a database using read_trialmaster() generates the .lookup dataframe, which contains, for each dataset, the number of rows, columns, and patients, as well as the CRF name.

.lookup is used by many other tools inside EDCimport, so be careful not to modify or delete it.

Search the whole database

Using find_keyword(), you can run a global search of the database.

For instance, say you do not remember which dataset and column hold the "date of ECG". find_keyword() will search every column name and label and will give you the answer:

find_keyword("date")
#> # A tibble: 10 x 3
#>    dataset names   labels                      
#>    <chr>   <chr>   <chr>                       
#>  1 pat     PTRNDT  Randomization Date          
#>  2 pat     RGSTDT  Registration Date           
#>  3 site    INVDAT  Deactivation date           
#>  4 site    TRGTDT  Target Enroll Date          
#>  5 trial   TRSPDT  End Date                    
#>  6 trial   TRSTDT  Start Date                  
#>  7 visit   VISIT2  Visit Date                  
#>  8 visit   EEXPVDT Earliest Expected Visit Date
#>  9 vs      ECGDAT  Date of ECG                 
#> 10 vs      VISITDT Visit Date
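The idea behind such a search can be sketched in base R on a toy database. Everything below is hypothetical: the helper `search_columns()`, the toy data, and the assumption that labels are stored in a "label" attribute on each column. It is not the code of find_keyword(), only an illustration of matching a keyword against column names and labels.

```r
# Toy database: a named list of data frames (hypothetical data)
db <- list(
  pat = data.frame(SUBJID = 1:2, PTRNDT = as.Date(c("2024-01-10", "2024-01-12"))),
  vs  = data.frame(SUBJID = 1:2, ECGDAT = as.Date(c("2024-02-01", "2024-02-03")))
)
# Column labels stored as a "label" attribute (an assumption of this sketch)
attr(db$pat$PTRNDT, "label") <- "Randomization Date"
attr(db$vs$ECGDAT, "label") <- "Date of ECG"

# Hypothetical helper: list every column of every dataset with its label,
# then keep the rows whose name or label matches the keyword (case-insensitive)
search_columns <- function(db, keyword) {
  per_dataset <- lapply(names(db), function(ds) {
    df <- db[[ds]]
    labels <- vapply(df, function(col) {
      lab <- attr(col, "label")
      if (is.null(lab)) NA_character_ else as.character(lab)
    }, character(1))
    data.frame(dataset = ds, names = names(df), labels = unname(labels),
               stringsAsFactors = FALSE)
  })
  all_columns <- do.call(rbind, per_dataset)
  hit <- grepl(keyword, all_columns$names, ignore.case = TRUE) |
    grepl(keyword, all_columns$labels, ignore.case = TRUE)
  all_columns[hit, , drop = FALSE]
}

search_columns(db, "ecg")
```

Unlabelled columns get a missing label, which never matches, so they can only be found through their name.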

Swimmer Plot

The edc_swimmerplot() function will create a swimmer plot of all date variables in the whole database.

There are 2 arguments of interest:

  • group, a grouping variable (e.g. the treatment arm)

  • origin, a date variable acting as the time zero (e.g. the date of enrollment)

edc_swimmerplot()
edc_swimmerplot(group="enrolres$arm")
edc_swimmerplot(origin="enrolres$enroldt")

This outputs a plotly interactive graph where you can select the dates of interest and zoom in with your mouse.

Note that any modification made after running read_trialmaster() is taken into account. For instance, mutating a column with as.Date() in one of the tables will add a new group in the plot.

Version: 0.5.2
License: GPL-3
Maintainer: Dan Chaltiel
Last Published: November 14th, 2024
Monthly Downloads: 359

Functions in EDCimport (0.5.2)

  • extend_lookup: Extend the lookup table
  • manual_correction: Manual correction
  • edc_peek_options: See which EDCimport option is currently set.
  • edc_warn_extraction_date: Warn if extraction is too old
  • edc_warn_patient_diffs: Check the validity of the subject ID column
  • read_all_csv: Read all .csv files in a directory
  • read_trialmaster: Read the .zip archive of a TrialMaster export
  • find_keyword: Find a keyword in the whole database
  • reexports: Objects exported from other packages
  • load_list: Load a list in an environment
  • get_common_cols: Get columns that are common to multiple datasets
  • load_as_list: Load a .RData file as a list
  • split_mixed_datasets: Split mixed datasets
  • select_distinct: Select only distinct columns
  • get_meta_cols: Get columns shared by most datasets
  • search_for_newer_data: Search for newer data
  • get_subjid_cols: Get key column names
  • save_list: Save a list as .RData file
  • table_format: Identify if a dataframe has a long or a wide format
  • save_sessioninfo: Save sessionInfo() output
  • read_all_xpt: Read all .xpt files in a directory
  • save_plotly: Save a plotly to an HTML file
  • harmonize_subjid: Harmonize the subject ID of the database
  • get_datasets: Retrieve the datasets as a list of data.frames
  • get_key_cols: Important column names
  • lastnews_table: Get a table with the latest date for each patient
  • unify: Unify a vector
  • read_all_sas: Read all .sas7bdat files in a directory
  • edc_options: Set global options for EDCimport
  • edc_reset_options: Reset all EDCimport options.
  • edc_swimmerplot: Swimmer plot of all date columns
  • fct_yesno: Format factor levels as Yes/No
  • assert_no_duplicate: Assert that a dataframe has one row per patient
  • data_example: Example databases
  • edc_data_warn: Standardized warning system
  • edc_db_to_excel: Save the database as an Excel file
  • build_lookup: Generate a lookup table
  • edc_inform_code: Shows how much code you wrote
  • EDCimport-package: EDCimport: Import Data from EDC Software
  • crf_status_plot: Show the current CRF status distribution
  • edc_lookup: Retrieve the lookup table from options
  • edc_population_plot: Plot the populations