CDMConnector

Are you using the tidyverse with an OMOP Common Data Model?

Interact with your CDM in a pipe-friendly way with CDMConnector.

  • Quickly connect to your CDM and start exploring.
  • Build data analysis pipelines using familiar dplyr verbs.
  • Easily extract subsets of CDM data from a database.

Overview

CDMConnector introduces a single R object that represents an OMOP CDM relational database. Its design is inspired by the dm, DatabaseConnector, and Andromeda packages. A cdm object encapsulates references to OMOP CDM tables in a remote RDBMS, together with the metadata needed to interact with a CDM. This allows dplyr-style data analysis pipelines and interactive data exploration.

Features

CDMConnector is meant to be the entry point for composable, tidyverse-style data analysis operations on an OMOP CDM. A cdm_reference object behaves like a named list of tables.

  • Quickly create a list of references to a subset of CDM tables
  • Store connection information for later use inside functions
  • Use any DBI driver back-end with the OMOP CDM

See Getting started for more details.
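The "named list of tables" idea can be illustrated without a database. The sketch below is not a real cdm_reference: toy_cdm is a hypothetical plain R list of small in-memory tables whose names and columns follow the OMOP CDM, with made-up rows. It only shows the list-like access pattern (names(), `$`, subsetting by name) combined with dplyr verbs.

```r
# Stand-in for a cdm reference: a plain named list of small tables.
# Assumption: toy_cdm and its rows are invented for illustration only.
library(dplyr)

toy_cdm <- list(
  person = tibble(person_id = 1:3,
                  year_of_birth = c(1950L, 1975L, 1990L)),
  condition_era = tibble(person_id = c(1L, 1L, 3L),
                         condition_concept_id = c(10L, 20L, 10L)),
  concept = tibble(concept_id = c(10L, 20L),
                   concept_name = c("Condition A", "Condition B"))
)

# Table names are the list names.
names(toy_cdm)

# `$` pulls out a single table, which then works with dplyr verbs.
toy_cdm$person %>% tally()

# Keep only some of the tables by subsetting the list by name.
subset_cdm <- toy_cdm[c("person", "concept")]

# Join two tables from the list and count records per condition.
per_condition <- toy_cdm$condition_era %>%
  left_join(toy_cdm$concept,
            by = c("condition_concept_id" = "concept_id")) %>%
  count(concept_name, sort = TRUE)
```

With a real cdm object the tables are remote references rather than local data, and the package's own helpers (for example cdmSelectTbl in the function index) are the intended way to select tables; the plain-list subsetting here is only an analogy.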

Installation

CDMConnector can be installed from CRAN:

install.packages("CDMConnector")

The development version can be installed from GitHub:

# install.packages("devtools")
devtools::install_github("darwin-eu/CDMConnector")

Usage

Create a cdm reference from any DBI connection to a database containing OMOP CDM tables. Three arguments describe the database: cdm_schema names the schema that contains your OMOP CDM tables, write_schema names the schema where results tables can be created, and cdm_name gives the database a name.

library(CDMConnector)

con <- DBI::dbConnect(duckdb::duckdb(dbdir = eunomia_dir()))

cdm <- cdm_from_con(con = con, 
                    cdm_schema = "main", 
                    write_schema = "main", 
                    cdm_name = "my_duckdb_database")
## Note: method with signature 'DBIConnection#Id' chosen for function 'dbExistsTable',
##  target signature 'duckdb_connection#Id'.
##  "duckdb_connection#ANY" would also be valid

A cdm_reference is a named list of table references:

library(dplyr)
names(cdm)
##  [1] "person"                "observation_period"    "visit_occurrence"     
##  [4] "visit_detail"          "condition_occurrence"  "drug_exposure"        
##  [7] "procedure_occurrence"  "device_exposure"       "measurement"          
## [10] "observation"           "death"                 "note"                 
## [13] "note_nlp"              "specimen"              "fact_relationship"    
## [16] "location"              "care_site"             "provider"             
## [19] "payer_plan_period"     "cost"                  "drug_era"             
## [22] "dose_era"              "condition_era"         "metadata"             
## [25] "cdm_source"            "concept"               "vocabulary"           
## [28] "domain"                "concept_class"         "concept_relationship" 
## [31] "relationship"          "concept_synonym"       "concept_ancestor"     
## [34] "source_to_concept_map" "drug_strength"

Use dplyr verbs with the table references.

cdm$person %>% 
  tally()
## # Source:   SQL [1 x 1]
## # Database: DuckDB v1.1.2 [root@Darwin 23.1.0:R 4.3.3//private/var/folders/2j/8z0yfn1j69q8sxjc7vj9yhz40000gp/T/RtmpDw9JTb/fileeea2255bd10b.duckdb]
##       n
##   <dbl>
## 1  2694

Compose operations with the pipe.

cdm$condition_era %>%
  left_join(cdm$concept, by = c("condition_concept_id" = "concept_id")) %>% 
  count(top_conditions = concept_name, sort = TRUE)
## # Source:     SQL [?? x 2]
## # Database:   DuckDB v1.1.2 [root@Darwin 23.1.0:R 4.3.3//private/var/folders/2j/8z0yfn1j69q8sxjc7vj9yhz40000gp/T/RtmpDw9JTb/fileeea2255bd10b.duckdb]
## # Ordered by: desc(n)
##    top_conditions                               n
##    <chr>                                    <dbl>
##  1 Viral sinusitis                          17268
##  2 Acute viral pharyngitis                  10217
##  3 Acute bronchitis                          8184
##  4 Otitis media                              3561
##  5 Osteoarthritis                            2694
##  6 Streptococcal sore throat                 2656
##  7 Sprain of ankle                           1915
##  8 Concussion with no loss of consciousness  1013
##  9 Sinusitis                                 1001
## 10 Acute bacterial sinusitis                  939
## # ℹ more rows
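The "Source: SQL [?? x 2]" header in the output above indicates that the pipeline is a query against the database rather than a local data frame. The sketch below illustrates that behaviour with plain DBI, dbplyr, and an in-memory DuckDB database. It does not use CDMConnector: the person table here is a hypothetical three-row stand-in, and the general dplyr functions show_query() and collect() are used to inspect the generated SQL and to bring the result into R.

```r
# Lazy database pipelines with dplyr, sketched on a stand-in table.
# Assumption: the table contents are invented for illustration only.
library(dplyr)

con <- DBI::dbConnect(duckdb::duckdb())  # in-memory DuckDB database

DBI::dbWriteTable(
  con, "person",
  data.frame(person_id = 1:3,
             year_of_birth = c(1950L, 1975L, 1990L))
)

# tbl() creates a reference to the remote table; no rows are fetched yet.
person <- tbl(con, "person")

# Building the pipeline only builds a query.
born_before_1980 <- person %>%
  filter(year_of_birth < 1980) %>%
  count()

# Inspect the SQL that dbplyr generated for the pipeline.
show_query(born_before_1980)

# collect() runs the query and returns a local tibble.
result <- collect(born_before_1980)

DBI::dbDisconnect(con, shutdown = TRUE)
```

Keeping the work in the database until collect() is called means only the final, usually small, result is transferred to R.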

And much more besides. See vignettes for further explanations on how to create database connections, make a cdm reference, and start analysing your data.

Getting help

If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub.

Citation

## To cite package 'CDMConnector' in publications use:
## 
##   Black A, Gorbachev A, Burn E, Catala Sabate M (????). _CDMConnector:
##   Connect to an OMOP Common Data Model_. R package version 1.6.0,
##   https://github.com/darwin-eu/CDMConnector,
##   <https://darwin-eu.github.io/CDMConnector/>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {CDMConnector: Connect to an OMOP Common Data Model},
##     author = {Adam Black and Artem Gorbachev and Edward Burn and Marti {Catala Sabate}},
##     note = {R package version 1.6.0, https://github.com/darwin-eu/CDMConnector},
##     url = {https://darwin-eu.github.io/CDMConnector/},
##   }

License: Apache 2.0

Version: 1.7.0
License: Apache License (>= 2)
Maintainer: Adam Black
Last Published: December 19th, 2024
Monthly Downloads: 1,178

Functions in CDMConnector (1.7.0)

  • cohortUnion: Union all cohorts in a cohort set with cohorts in a second cohort set
  • computeQuery: Execute dplyr query and save result in remote database
  • cdmSample: Subset a cdm object to a random sample of individuals
  • datediff: Compute the difference between two days
  • dateadd: Add days or years to a date in a dplyr query
  • exampleDatasets: List the available example CDM datasets
  • generateCohortSet: Generate a cohort set on a cdm object
  • cohortErafy: Collapse cohort records within a certain number of days
  • cohortSet: Get cohort settings from a cohort_table object
  • inSchema: Helper for working with compound schemas
  • generateConceptCohortSet: Create a new generated cohort set from a list of concept sets
  • %>%: Pipe operator
  • listTables: List tables in a schema
  • cdmSelectTbl: Select a subset of tables in a cdm reference object
  • intersectCohorts: Intersect all cohorts in a single cohort table
  • cohort_count: Get cohort counts from a generated_cohort_set object.
  • tblGroup: CDM table selection helper
  • copyCdmTo: Copy a cdm object from one database to another
  • new_generated_cohort_set: Constructor for cohort_table objects
  • summariseQuantile: Quantile calculation using dbplyr
  • validateCdm: Validation report for a CDM
  • readCohortSet: Read a set of cohort definitions into R
  • datepart: Extract the day, month or year of a date in a dplyr pipeline
  • version: Get the CDM version
  • record_cohort_attrition: Add attrition reason to a cohort_table object
  • dbSource: Create a source for a cdm in a database.
  • dbms: Get the database management system (dbms) from a cdm_reference or DBI connection
  • reexports: Objects exported from other packages
  • downloadEunomiaData: Download Eunomia data files
  • unique_table_name: Create a unique table name for temp tables
  • unionCohorts: Union all cohorts in a single cohort table
  • stow: Collect a list of lazy queries and save the results as files
  • eunomiaDir: Create a copy of an example OMOP CDM dataset
  • eunomiaIsAvailable: Has the Eunomia dataset been cached?
  • snapshot: Extract CDM metadata
  • requireEunomia: Require eunomia to be available. The function makes sure that you can later create a eunomia database with eunomiaDir().
  • benchmarkCDMConnector: Run benchmark of tasks using CDMConnector
  • cdmFlatten: Flatten a cdm into a single observation table
  • appendPermanent: Run a dplyr query and add the result set to an existing
  • cdmCon: Get underlying database connection
  • asDate: as.Date dbplyr translation wrapper
  • CDMConnector-package: CDMConnector: Connect to an OMOP Common Data Model
  • cdmFromCon: Create a CDM reference object from a database connection
  • assertTables: Assert that tables exist in a cdm object
  • cdm_name: Get the CDM name
  • cdmDisconnect: Disconnect the connection of the cdm object
  • assert_write_schema: Assert that cdm has a writable schema
  • cdmWriteSchema: Get cdm write schema
  • cdm_from_tables: Create a cdm object from local tables
  • cdmSubsetCohort: Subset a cdm to the individuals in one or more cohorts
  • cdmSubset: Subset a cdm object to a set of persons
  • cohortAttrition: Get attrition table from a cohort_table object
  • cdmFromEnvironment: Create a CDM object from a pre-defined set of environment variables
  • cdmFromFiles: Create a CDM reference from a folder containing parquet, csv, or feather files