CohortConstructor

This package is currently experimental. Please use with care and report any issues you might come across.

The goal of CohortConstructor is to support the creation and manipulation of cohorts in the OMOP Common Data Model.

Installation

You can install the development version of CohortConstructor from GitHub with:

# install.packages("devtools")
devtools::install_github("ohdsi/CohortConstructor")

Creating and manipulating cohorts

To illustrate the functionality, let’s create a CDM reference for the Eunomia dataset using the CDMConnector package.

library(CDMConnector)
library(PatientProfiles)
library(dplyr)
library(CohortConstructor)

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())
cdm <- cdm_from_con(con, cdm_schema = "main", 
                    write_schema = c(prefix = "my_study_", schema = "main"))
print(cdm)

Generating concept based cohorts

We start by making a concept-based cohort. For this we only need to provide concept sets, and we get back a cohort in which the cohort end date is the event date associated with each record, overlapping records are collapsed, and only records within observation are kept.

cdm$fractures <- cdm |> 
  conceptCohort(conceptSet = list(
    "ankle_fracture" = 4059173,
    "forearm_fracture" = 4278672,
    "hip_fracture" = 4230399),
  name = "fractures")
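To make "overlapping records collapsed" concrete, here is a minimal base R sketch on toy data. It is not the package's implementation: the `records` data frame and the `collapse_overlaps()` helper are hypothetical names invented for illustration, and the real function runs against database tables rather than in-memory data frames.

```r
# Toy event records (hypothetical data): subject 1 has two overlapping records.
records <- data.frame(
  subject_id = c(1, 1, 1, 2),
  start = as.Date(c("2010-01-01", "2010-01-05", "2012-03-01", "2011-06-01")),
  end   = as.Date(c("2010-01-10", "2010-01-20", "2012-03-01", "2011-06-01"))
)

# Hypothetical helper: merge overlapping records within each subject.
collapse_overlaps <- function(df) {
  df <- df[order(df$subject_id, df$start, df$end), ]
  pieces <- lapply(split(df, df$subject_id), function(d) {
    # Latest end date seen among all earlier records of this subject.
    prev_max_end <- c(-Inf, head(cummax(as.numeric(d$end)), -1))
    # A new episode begins when a record starts after every earlier one ended.
    episode <- cumsum(as.numeric(d$start) > prev_max_end)
    data.frame(
      subject_id = d$subject_id[1],
      cohort_start_date = as.Date(
        as.numeric(tapply(as.numeric(d$start), episode, min)),
        origin = "1970-01-01"
      ),
      cohort_end_date = as.Date(
        as.numeric(tapply(as.numeric(d$end), episode, max)),
        origin = "1970-01-01"
      )
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

collapsed <- collapse_overlaps(records)
print(collapsed)
```

In this toy example the first two records of subject 1 merge into a single entry, while the later record and the record for subject 2 stay separate.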

We can see that our starting cohorts, before we add any additional restrictions, have the following associated settings, counts, and attrition.

settings(cdm$fractures) %>% glimpse()
#> Rows: 3
#> Columns: 2
#> $ cohort_definition_id <int> 1, 2, 3
#> $ cohort_name          <chr> "ankle_fracture", "forearm_fracture", "hip_fractu…
cohort_count(cdm$fractures) %>% glimpse()
#> Rows: 3
#> Columns: 3
#> $ cohort_definition_id <int> 1, 2, 3
#> $ number_records       <int> 464, 569, 138
#> $ number_subjects      <int> 427, 510, 132
attrition(cdm$fractures) %>% glimpse()
#> Rows: 3
#> Columns: 7
#> $ cohort_definition_id <int> 1, 2, 3
#> $ number_records       <int> 464, 569, 138
#> $ number_subjects      <int> 427, 510, 132
#> $ reason_id            <int> 1, 1, 1
#> $ reason               <chr> "Initial qualifying events", "Initial qualifying …
#> $ excluded_records     <int> 0, 0, 0
#> $ excluded_subjects    <int> 0, 0, 0

Require in date range

Once we have created our base cohort, we can start applying additional cohort requirements. For example, we can first require that individuals’ cohort start dates fall within a certain date range.

cdm$fractures <- cdm$fractures %>% 
  requireInDateRange(dateRange = as.Date(c("2000-01-01", "2020-01-01")))

Now that we’ve applied this date restriction, we can see that our cohort attributes have been updated.

cohort_count(cdm$fractures) %>% glimpse()
#> Rows: 3
#> Columns: 3
#> $ cohort_definition_id <int> 1, 2, 3
#> $ number_records       <int> 108, 152, 62
#> $ number_subjects      <int> 104, 143, 60
attrition(cdm$fractures) %>% 
  filter(reason == "cohort_start_date between 2000-01-01 & 2020-01-01") %>% 
  glimpse()
#> Rows: 0
#> Columns: 7
#> $ cohort_definition_id <int> 
#> $ number_records       <int> 
#> $ number_subjects      <int> 
#> $ reason_id            <int> 
#> $ reason               <chr> 
#> $ excluded_records     <int> 
#> $ excluded_subjects    <int>
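The bookkeeping behind a date restriction and its attrition counts can be sketched in base R on toy data. This is an illustration only, not the package's code: the `cohort` data frame is invented, and treating both ends of the date range as inclusive is an assumption.

```r
# Toy cohort (hypothetical data): subject 3 has two entries.
cohort <- data.frame(
  subject_id = c(1, 2, 3, 3),
  cohort_start_date = as.Date(c("1995-04-02", "2005-07-15",
                                "2010-01-01", "2021-09-30"))
)
date_range <- as.Date(c("2000-01-01", "2020-01-01"))

# Keep entries whose start date lies in the range (bounds assumed inclusive).
keep <- cohort$cohort_start_date >= date_range[1] &
  cohort$cohort_start_date <= date_range[2]
restricted <- cohort[keep, ]

# Attrition-style summary: what remains and what was excluded by this step.
attrition_row <- data.frame(
  number_records = nrow(restricted),
  number_subjects = length(unique(restricted$subject_id)),
  excluded_records = nrow(cohort) - nrow(restricted),
  excluded_subjects = length(unique(cohort$subject_id)) -
    length(unique(restricted$subject_id))
)
print(attrition_row)
```

Note that subject 3 loses one record but stays in the cohort, which is why record and subject counts can differ.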

Applying demographic requirements

We can also add restrictions on patient characteristics such as age (on cohort start date by default) and sex.

cdm$fractures <- cdm$fractures %>% 
  requireDemographics(ageRange = list(c(40, 65)),
                      sex = "Female")

Again we can see how many individuals we’ve lost after applying these criteria.

attrition(cdm$fractures) %>% 
  filter(reason == "Age requirement: 40 to 65") %>% 
  glimpse()
#> Rows: 3
#> Columns: 7
#> $ cohort_definition_id <int> 1, 2, 3
#> $ number_records       <int> 43, 64, 22
#> $ number_subjects      <int> 43, 62, 22
#> $ reason_id            <int> 4, 4, 4
#> $ reason               <chr> "Age requirement: 40 to 65", "Age requirement: 40…
#> $ excluded_records     <int> 65, 88, 40
#> $ excluded_subjects    <int> 61, 81, 38

attrition(cdm$fractures) %>% 
  filter(reason == "Sex requirement: Female") %>% 
  glimpse()
#> Rows: 3
#> Columns: 7
#> $ cohort_definition_id <int> 1, 2, 3
#> $ number_records       <int> 19, 37, 12
#> $ number_subjects      <int> 19, 36, 12
#> $ reason_id            <int> 5, 5, 5
#> $ reason               <chr> "Sex requirement: Female", "Sex requirement: Fema…
#> $ excluded_records     <int> 24, 27, 10
#> $ excluded_subjects    <int> 24, 26, 10
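As a rough illustration of what an age and sex requirement evaluated on cohort start date involves, the base R sketch below computes age in completed years and filters a toy table. The `people` data frame and the `age_at()` helper are hypothetical, the inclusive age bounds are an assumption, and the package may derive age differently (for example when parts of the birth date are missing).

```r
# Hypothetical helper: age in completed years on the index date.
age_at <- function(birth, index) {
  b <- as.POSIXlt(birth)
  i <- as.POSIXlt(index)
  years <- i$year - b$year
  # Subtract one year if the birthday has not yet occurred in the index year.
  had_birthday <- (i$mon > b$mon) | (i$mon == b$mon & i$mday >= b$mday)
  years - ifelse(had_birthday, 0, 1)
}

# Toy cohort with demographics attached (hypothetical data).
people <- data.frame(
  subject_id = c(1, 2, 3),
  sex = c("Female", "Male", "Female"),
  birth_date = as.Date(c("1960-06-15", "1950-01-01", "1980-12-31")),
  cohort_start_date = as.Date(c("2010-06-14", "2005-03-01", "2010-01-01"))
)
people$age <- age_at(people$birth_date, people$cohort_start_date)

# Apply the age requirement first, then the sex requirement.
age_ok <- people$age >= 40 & people$age <= 65
sex_ok <- people$sex == "Female"
kept <- people[age_ok & sex_ok, ]
print(kept)
```

Only subject 1 satisfies both requirements here: subject 2 is excluded on sex and subject 3 on age.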

Require presence in another cohort

We can also require that individuals are in another cohort over some window. Here, for example, we require that study participants are in a GI bleed cohort at any time up to and including the date of their entry into the fractures cohort.

cdm$gibleed <- cdm |> 
  conceptCohort(conceptSet = list("gibleed" = 192671),
  name = "gibleed")

cdm$fractures <- cdm$fractures %>% 
  requireCohortIntersect(targetCohortTable = "gibleed",
                         window = c(-Inf, 0))
attrition(cdm$fractures) %>% 
  filter(reason == "In cohort gibleed between -Inf & 0 days relative to cohort_start_date") %>% 
  glimpse()
#> Rows: 3
#> Columns: 7
#> $ cohort_definition_id <int> 1, 2, 3
#> $ number_records       <int> 5, 7, 2
#> $ number_subjects      <int> 5, 6, 2
#> $ reason_id            <int> 8, 8, 8
#> $ reason               <chr> "In cohort gibleed between -Inf & 0 days relative…
#> $ excluded_records     <int> 14, 30, 10
#> $ excluded_subjects    <int> 14, 30, 10
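The meaning of a window such as c(-Inf, 0) can be sketched in base R: an entry is kept when the same subject has a record in the other cohort whose offset in days from the index date falls inside the window. This is a simplified illustration on invented data (`target`, `other`), comparing start dates only. It is not how the package itself evaluates intersections.

```r
# Toy index cohort and toy "other" cohort (hypothetical data).
target <- data.frame(
  subject_id = c(1, 2, 3),
  cohort_start_date = as.Date(c("2015-05-01", "2015-05-01", "2015-05-01"))
)
other <- data.frame(
  subject_id = c(1, 2),
  cohort_start_date = as.Date(c("2010-02-10", "2016-01-01"))
)
window <- c(-Inf, 0)  # any time before, up to and including, the index date

in_window <- vapply(seq_len(nrow(target)), function(i) {
  other_dates <- other$cohort_start_date[
    other$subject_id == target$subject_id[i]
  ]
  # Days from the index date to each record in the other cohort.
  offset <- as.numeric(other_dates - target$cohort_start_date[i])
  any(offset >= window[1] & offset <= window[2])
}, logical(1))

kept <- target[in_window, ]
print(kept)
```

Subject 1 is kept because the other record precedes the index date. Subject 2 is dropped because the record comes afterwards, and subject 3 because there is no record at all.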

Combining cohorts

Currently we have separate fracture cohorts.

Let’s say we want to create a cohort of people with any of the fractures. We could create this cohort like so:

cdm$fractures <- cdm$fractures |> 
  CohortConstructor::unionCohorts()

settings(cdm$fractures)
#> # A tibble: 1 × 3
#>   cohort_definition_id cohort_name                                    gap
#>                  <dbl> <chr>                                        <dbl>
#> 1                    1 ankle_fracture_forearm_fracture_hip_fracture     0
cohortCount(cdm$fractures)
#> # A tibble: 1 × 3
#>   cohort_definition_id number_records number_subjects
#>                  <int>          <int>           <int>
#> 1                    1             14              13
cdmDisconnect(cdm)

Install

install.packages('CohortConstructor')

Version: 0.2.2

License: Apache License (>= 2)

Maintainer: Edward Burn

Last Published: July 31st, 2024

Functions in CohortConstructor (0.2.2)

stratifyCohorts: Create a new cohort table from stratifying an existing one
requirePriorObservation: Restrict cohort on prior observation
requireSex: Restrict cohort on sex
requireInDateRange: Require that an index date is within a date range
requireFutureObservation: Restrict cohort on future observation
subsetCohorts: Generate a cohort table using a subset of cohorts from another table
requireConceptIntersect: Require cohort subjects to have (or not have) events of a concept list
requireIsLastEntry: Restrict cohort to last entry per person
requireIsFirstEntry: Restrict cohort to first entry
sampleCohorts: Sample a cohort table for a given number of individuals
requireTableIntersect: Require cohort subjects are present in another clinical table
trimDemographics: Restrict cohort on patient demographics
trimToDateRange: Trim cohort dates to be within a date range
unionCohorts: Generate cohort from the union of different cohorts
yearCohorts: Generate a new cohort table restricting cohort entries to certain years
exitAtFirstDate: Set cohort end date to the first of a set of column dates
CohortConstructor-package: CohortConstructor: Build and Manipulate Study Cohorts Using a Common Data Model
exitAtLastDate: Set cohort end date to the last of a set of column dates
demographicsCohort: Create cohorts based on patient demographics
conceptCohort: Create cohorts based on a concept set
entryAtLastDate: Set cohort start date to the last of a set of column dates
collapseCohorts: Collapse cohort entries using a certain gap to concatenate records
entryAtFirstDate: Update cohort start date to be the first date from a set of column dates
exitAtDeath: Set cohort end date to death date
exitAtObservationEnd: Set cohort end date to end of observation
intersectCohorts: Generate a combination cohort set between the intersection of different cohorts
measurementCohort: Create measurement-based cohorts
requireDeathFlag: Require cohort subjects have (or do not have) a death record
requireCohortIntersect: Require cohort subjects are present (or absent) in another cohort
requireDemographics: Restrict cohort on patient demographics
reexports: Objects exported from other packages
matchCohorts: Generate a new matched cohort
mockCohortConstructor: Function to create a mock cdm reference for CohortConstructor
requireAge: Restrict cohort on age