CohortConstructor

The goal of CohortConstructor is to support the creation and manipulation of study cohorts in data mapped to the OMOP CDM.

Tested sources

Source                     Driver             CDM reference
Local R dataframe          N/A                omopgenerics::cdmFromTables()
In-memory duckdb database  duckdb             CDMConnector::cdmFromCon()
Postgres database          RPostgres          CDMConnector::cdmFromCon()
Postgres database          DatabaseConnector  CDMConnector::cdmFromCon()
SQL Server database        odbc               CDMConnector::cdmFromCon()
SQL Server database        DatabaseConnector  CDMConnector::cdmFromCon()

Installation

The package can be installed from CRAN:

install.packages("CohortConstructor")

Or you can install the development version of the package from GitHub:

# install.packages("devtools")
devtools::install_github("ohdsi/CohortConstructor")

Creating and manipulating cohorts

To illustrate the functionality provided by CohortConstructor let’s create a cohort of people with a fracture using the Eunomia dataset. We’ll first load required packages and create a cdm reference for the data.

library(omopgenerics)
library(omock)
library(PatientProfiles)
library(dplyr)
library(CohortConstructor)
library(CohortCharacteristics)
cdm <- mockCdmFromDataset(datasetName = "GiBleed")
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
cdm
#> 
#> ── # OMOP CDM reference (local) of GiBleed ─────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

Generating concept-based fracture cohorts

We will first need to identify codes that could be used to represent fractures of interest. To find these we’ll use the CodelistGenerator package (note, we will just find a few codes because we are using synthetic data with a subset of the full vocabularies).

library(CodelistGenerator)

hip_fx_codes <- getCandidateCodes(cdm, "hip fracture")
#> Limiting to domains of interest
#> Getting concepts to include
#> Adding descendants
#> Search completed. Finishing up.
#> ✔ 1 candidate concept identified
#> 
#> Time taken: 0 minutes and 0 seconds
forearm_fx_codes <- getCandidateCodes(cdm, "forearm fracture")
#> Limiting to domains of interest
#> Getting concepts to include
#> Adding descendants
#> Search completed. Finishing up.
#> ✔ 1 candidate concept identified
#> 
#> Time taken: 0 minutes and 0 seconds

fx_codes <- newCodelist(list("hip_fracture" = hip_fx_codes$concept_id,
                             "forearm_fracture" = forearm_fx_codes$concept_id))
fx_codes
#> 
#> ── 2 codelists ─────────────────────────────────────────────────────────────────
#> 
#> - forearm_fracture (1 codes)
#> - hip_fracture (1 codes)

Now we can quickly create our cohorts. For this we only need to provide the codes we have defined and we will get a cohort back, where we start by setting cohort exit as the same day as event start (the date of the fracture).

cdm$fractures <- cdm |> 
  conceptCohort(conceptSet = fx_codes, 
                exit = "event_start_date", 
                name = "fractures")

After creating our initial cohort we will update it so that exit is set at up to 180 days after start (so long as individuals’ observation end date is on or after this - if not, exit will be at observation period end).

cdm$fractures <- cdm$fractures |> 
  padCohortEnd(days = 180)
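
The exit rule described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy cohort and the `observation_end` dates are hypothetical, and it only shows the "add 180 days, but never go past the end of observation" logic.

```r
# Toy cohort: exit initially equals entry (the fracture date).
cohort <- data.frame(
  subject_id        = c(1L, 2L),
  cohort_start_date = as.Date(c("2010-03-01", "2015-11-20")),
  cohort_end_date   = as.Date(c("2010-03-01", "2015-11-20"))
)

# Hypothetical observation period end dates for the same two people.
observation_end <- as.Date(c("2012-01-01", "2016-01-31"))

# Add 180 days to the exit date, then cap it at the end of observation.
padded <- cohort$cohort_end_date + 180
cohort$cohort_end_date <- pmin(padded, observation_end)

cohort
```

In this toy data the first person gets the full 180 days, while the second person's exit is cut back to their observation period end.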

We can see that our starting cohorts, before we add any additional restrictions, have the following associated settings, counts, and attrition.

settings(cdm$fractures) |> glimpse()
#> Rows: 2
#> Columns: 4
#> $ cohort_definition_id <int> 1, 2
#> $ cohort_name          <chr> "forearm_fracture", "hip_fracture"
#> $ cdm_version          <chr> "5.3", "5.3"
#> $ vocabulary_version   <chr> "v5.0 18-JAN-19", "v5.0 18-JAN-19"
cohortCount(cdm$fractures) |> glimpse()
#> Rows: 2
#> Columns: 3
#> $ cohort_definition_id <int> 1, 2
#> $ number_records       <int> 569, 138
#> $ number_subjects      <int> 510, 132
attrition(cdm$fractures) |> glimpse()
#> Rows: 10
#> Columns: 7
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2
#> $ number_records       <int> 569, 569, 569, 569, 569, 138, 138, 138, 138, 138
#> $ number_subjects      <int> 510, 510, 510, 510, 510, 132, 132, 132, 132, 132
#> $ reason_id            <int> 1, 2, 3, 4, 5, 1, 2, 3, 4, 5
#> $ reason               <chr> "Initial qualifying events", "Record in observati…
#> $ excluded_records     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
#> $ excluded_subjects    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

Create an overall fracture cohort

So far we have created two separate fracture cohorts. Let’s say we also want a cohort of people with any of the fractures. We could union our two cohorts to create this overall cohort like so:

cdm$fractures <- unionCohorts(cdm$fractures,
                              cohortName = "any_fracture",
                              keepOriginalCohorts = TRUE,
                              name = "fractures")
settings(cdm$fractures)
#> # A tibble: 3 × 5
#>   cohort_definition_id cohort_name      cdm_version vocabulary_version   gap
#>                  <int> <chr>            <chr>       <chr>              <dbl>
#> 1                    1 forearm_fracture 5.3         v5.0 18-JAN-19        NA
#> 2                    2 hip_fracture     5.3         v5.0 18-JAN-19        NA
#> 3                    3 any_fracture     <NA>        <NA>                   0
cohortCount(cdm$fractures)
#> # A tibble: 3 × 3
#>   cohort_definition_id number_records number_subjects
#>                  <int>          <int>           <int>
#> 1                    1            569             510
#> 2                    2            138             132
#> 3                    3            707             611
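
To make the idea of a union cohort concrete, here is a rough base R sketch of merging a person's overlapping entries into single records. The helper `union_entries()` and the toy data are hypothetical, and the sketch assumes entries are merged when one starts on or before the day an earlier one ends; the package's own gap handling may differ.

```r
# Merge overlapping cohort entries per person (base R only).
union_entries <- function(df) {
  merged <- lapply(split(df, df$subject_id), function(p) {
    p <- p[order(p$cohort_start_date), ]
    start <- as.numeric(p$cohort_start_date)
    end   <- as.numeric(p$cohort_end_date)
    # Latest end date seen among all earlier entries (-Inf for the first).
    prev_end <- c(-Inf, head(cummax(end), -1))
    # A new block begins whenever an entry starts after every earlier one ended.
    block <- cumsum(start > prev_end)
    data.frame(
      subject_id        = p$subject_id[1],
      cohort_start_date = as.Date(as.vector(tapply(start, block, min)),
                                  origin = "1970-01-01"),
      cohort_end_date   = as.Date(as.vector(tapply(end, block, max)),
                                  origin = "1970-01-01")
    )
  })
  out <- do.call(rbind, merged)
  rownames(out) <- NULL
  out
}

# Toy entries pooled from two cohorts; person 1 has two overlapping records.
toy <- data.frame(
  subject_id        = c(1L, 1L, 1L, 2L),
  cohort_start_date = as.Date(c("2010-03-01", "2010-01-01", "2012-05-01", "2011-07-07")),
  cohort_end_date   = as.Date(c("2010-08-28", "2010-06-30", "2012-10-28", "2012-01-03"))
)

any_fracture <- union_entries(toy)
any_fracture
```

Person 1's first two records overlap and collapse into one entry, so four input rows become three.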

Require in date range

Once we have created our base fracture cohort, we can then start applying additional cohort requirements. For example, first we can require that individuals’ cohort start date falls within a certain date range.

cdm$fractures <- cdm$fractures |> 
  requireInDateRange(dateRange = as.Date(c("2000-01-01", "2020-01-01")))

Now that we’ve applied this date restriction, we can see that our cohort attributes have been updated:

cohortCount(cdm$fractures) |> glimpse()
#> Rows: 3
#> Columns: 3
#> $ cohort_definition_id <int> 1, 2, 3
#> $ number_records       <int> 152, 62, 214
#> $ number_subjects      <int> 143, 60, 196
attrition(cdm$fractures) |> 
  filter(reason == "cohort_start_date between 2000-01-01 & 2020-01-01") |> 
  glimpse()
#> Rows: 0
#> Columns: 7
#> $ cohort_definition_id <int> 
#> $ number_records       <int> 
#> $ number_subjects      <int> 
#> $ reason_id            <int> 
#> $ reason               <chr> 
#> $ excluded_records     <int> 
#> $ excluded_subjects    <int>
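
The filter above returns zero rows, which is what an exact string comparison does whenever the stored reason label differs from the text being matched. The base R sketch below shows this on a toy attrition table; the third label is hypothetical and is not the label the package writes. Listing the distinct reasons first, or matching on a fragment, avoids depending on the exact wording.

```r
# Toy attrition table; the last reason label is made up for illustration.
attrition_toy <- data.frame(
  reason_id = 1:3,
  reason = c("Initial qualifying events",
             "Some other requirement",                    # hypothetical label
             "cohort_start_date on or after 2000-01-01"), # hypothetical label
  stringsAsFactors = FALSE
)

# Exact match on a label that is not present: zero rows.
wanted <- "cohort_start_date between 2000-01-01 & 2020-01-01"
exact <- attrition_toy[attrition_toy$reason == wanted, ]
nrow(exact)

# Inspect the labels that are actually stored.
unique(attrition_toy$reason)

# Match on a fixed fragment instead of the full label.
partial <- attrition_toy[grepl("cohort_start_date", attrition_toy$reason, fixed = TRUE), ]
partial
```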

Applying demographic requirements

We can also add restrictions on patient characteristics such as age (on cohort start date by default) and sex.

cdm$fractures <- cdm$fractures |> 
  requireDemographics(ageRange = list(c(40, 65)),
                      sex = "Female")

Again we can see how many individuals we’ve lost after applying these criteria.

attrition(cdm$fractures) |> 
  filter(reason == "Age requirement: 40 to 65") |> 
  glimpse()
#> Rows: 3
#> Columns: 7
#> $ cohort_definition_id <int> 1, 2, 3
#> $ number_records       <int> 64, 22, 86
#> $ number_subjects      <int> 62, 22, 83
#> $ reason_id            <int> 8, 8, 4
#> $ reason               <chr> "Age requirement: 40 to 65", "Age requirement: 40…
#> $ excluded_records     <int> 88, 40, 128
#> $ excluded_subjects    <int> 81, 38, 113

attrition(cdm$fractures) |> 
  filter(reason == "Sex requirement: Female") |> 
  glimpse()
#> Rows: 3
#> Columns: 7
#> $ cohort_definition_id <int> 1, 2, 3
#> $ number_records       <int> 37, 12, 49
#> $ number_subjects      <int> 36, 12, 48
#> $ reason_id            <int> 9, 9, 5
#> $ reason               <chr> "Sex requirement: Female", "Sex requirement: Fema…
#> $ excluded_records     <int> 27, 10, 37
#> $ excluded_subjects    <int> 26, 10, 35
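
As a rough illustration of what an age and sex requirement does, the base R sketch below computes age in completed years at cohort start and keeps women aged 40 to 65. The helper `age_in_years()` and the toy data are hypothetical, both age bounds are assumed to be inclusive, and the package may derive age differently (for example when birth dates are incomplete).

```r
# Age in completed years on the index date.
age_in_years <- function(birth_date, index_date) {
  b <- as.POSIXlt(birth_date)
  i <- as.POSIXlt(index_date)
  # Has the birthday already happened in the index year?
  had_birthday <- (i$mon > b$mon) | (i$mon == b$mon & i$mday >= b$mday)
  (i$year - b$year) - ifelse(had_birthday, 0L, 1L)
}

people <- data.frame(
  subject_id        = 1:4,
  sex               = c("Female", "Female", "Male", "Female"),
  birth_date        = as.Date(c("1960-06-15", "1990-01-01", "1955-02-02", "1944-03-10")),
  cohort_start_date = as.Date(c("2005-06-14", "2010-05-05", "2001-09-09", "2010-03-10"))
)

people$age <- age_in_years(people$birth_date, people$cohort_start_date)

# Keep records that satisfy both requirements (bounds assumed inclusive).
keep <- people$age >= 40 & people$age <= 65 & people$sex == "Female"
people[keep, c("subject_id", "age", "sex")]
```

Only the first person is kept: the second is too young, the third is male, and the fourth turned 66 on their cohort start date.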

Require presence in another cohort

We can also require that individuals are (or are not) in another cohort over some window. Here, for example, we set intersections = 0 to require that study participants have no record in a GI bleed cohort at any time before, or on the day of, their entry in the fractures cohort.

cdm$gibleed <- cdm |> 
  conceptCohort(conceptSet = list("gibleed" = 192671L),
                name = "gibleed")

cdm$fractures <- cdm$fractures |> 
  requireCohortIntersect(targetCohortTable = "gibleed",
                         intersections = 0,
                         window = c(-Inf, 0))
attrition(cdm$fractures) |> 
  filter(reason == "Not in cohort gibleed between -Inf & 0 days relative to cohort_start_date") |> 
  glimpse()
#> Rows: 0
#> Columns: 7
#> $ cohort_definition_id <int> 
#> $ number_records       <int> 
#> $ number_subjects      <int> 
#> $ reason_id            <int> 
#> $ reason               <chr> 
#> $ excluded_records     <int> 
#> $ excluded_subjects    <int>
cdmDisconnect(cdm)
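
The window logic behind this kind of requirement can be sketched in base R. The toy tables below are hypothetical, and the sketch only looks at the start date of the target cohort entry, which is a simplification of how the package evaluates intersections. With a window of c(-Inf, 0) and a required count of 0, a fracture record is kept only if the person has no GI bleed entry on or before the fracture date.

```r
fractures <- data.frame(
  subject_id        = c(1L, 2L, 3L),
  cohort_start_date = as.Date(c("2010-01-10", "2012-06-01", "2015-03-03"))
)
gibleed <- data.frame(
  subject_id        = c(1L, 3L),
  cohort_start_date = as.Date(c("2008-05-05", "2016-01-01"))
)

# Window in days relative to the fracture cohort start date.
window <- c(-Inf, 0)

# Count GI bleed entries falling inside the window for each fracture record.
fractures$n_gibleed <- vapply(seq_len(nrow(fractures)), function(k) {
  target <- gibleed$cohort_start_date[gibleed$subject_id == fractures$subject_id[k]]
  days <- as.numeric(target - fractures$cohort_start_date[k])
  sum(days >= window[1] & days <= window[2])
}, integer(1))

# Requiring zero intersections keeps only people without a prior GI bleed.
kept <- fractures[fractures$n_gibleed == 0, ]
kept
```

Person 1 is dropped because of an earlier GI bleed, while person 3 is kept because their GI bleed happens after the fracture and so falls outside the window.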

More information

CohortConstructor provides much more functionality for creating and manipulating cohorts. See the package vignettes for more details.

Monthly Downloads

783

Version

0.6.1

License

Apache License (>= 2)

Maintainer

Edward Burn

Last Published

December 23rd, 2025

Functions in CohortConstructor (0.6.1)

benchmarkData: Benchmarking results
entryAtFirstDate: Update cohort start date to be the first of a set of column dates
exitAtFirstDate: Set cohort end date to the first of a set of column dates
copyCohorts: Copy a cohort table
deathCohort: Create cohort based on the death table
demographicsCohort: Create cohorts based on patient demographics
conceptCohort: Create cohorts based on a concept set
requireAge: Restrict cohort on age
conceptSetDoc: Helper for consistent documentation of conceptSet.
daysDoc: Helper for consistent documentation of days.
columnDateDoc: Helper for consistent documentation of dateColumns and returnReason.
intersectCohorts: Generate a combination cohort set between the intersection of different cohorts.
exitAtObservationEnd: Set cohort end date to end of observation
padCohortEnd: Add days to cohort end
padCohortDate: Set cohort start or cohort end
exitAtLastDate: Set cohort end date to the last of a set of column dates
entryAtLastDate: Set cohort start date to the last of a set of column dates
gapDoc: Helper for consistent documentation of gap.
matchCohorts: Generate a new matched cohort
requireIsFirstEntry: Restrict cohort to first entry
exitAtDeath: Set cohort end date to death date
renameCohort: Utility function to change the name of a cohort.
softValidationDoc: Helper for consistent documentation of .softValidation.
requireCohortIntersect: Require cohort subjects are present (or absent) in another cohort
requireConceptIntersect: Require cohort subjects to have (or not have) events of a concept list
requireIntersectDoc: Helper for consistent documentation of arguments in requireIntersect functions.
requireDemographics: Restrict cohort on patient demographics
measurementCohort: Create measurement-based cohorts
padCohortStart: Add days to cohort start
requireInDateRange: Require that an index date is within a date range
requireIsEntry: Restrict cohort to specific entry
stratifyCohorts: Create a new cohort table from stratifying an existing one
requireIsLastEntry: Restrict cohort to last entry per person
subsetCohorts: Generate a cohort table keeping a subset of cohorts.
requireDemographicsDoc: Helper for consistent documentation of arguments in requireDemographics.
sampleCohorts: Sample a cohort table for a given number of individuals.
requireMinCohortCount: Filter cohorts to keep only records for those with a minimum number of subjects
requirePriorObservation: Restrict cohort on prior observation
yearCohorts: Generate a new cohort table restricting cohort entries to certain years
requireFutureObservation: Restrict cohort on future observation
keepOriginalCohortsDoc: Helper for consistent documentation of keepOriginalCohorts.
timeWindowCohorts: Split cohorts based on time-windows
mockCohortConstructor: Function to create a mock cdm reference for CohortConstructor
requireDuration: Require cohort entries last for a certain number of days
trimDuration: Trim cohort dates to be within a certain interval of days
trimDemographics: Trim cohort on patient demographics
reexports: Objects exported from other packages
nameDoc: Helper for consistent documentation of name.
unionCohorts: Generate cohort from the union of different cohorts
requireSex: Restrict cohort on sex
trimToDateRange: Trim cohort dates to be within a date range
requireTableIntersect: Require cohort subjects are present in another clinical table
requireFullContributionDoc: Helper for consistent documentation of requireFullContribution.
windowDoc: Helper for consistent documentation of window.
addCohortTableIndex: Add an index to a cohort table
baseCohortDoc: Helper for consistent documentation of conceptCohort and measurementCohort.
atFirstDoc: Helper for consistent documentation of arguments in atFirst functions.
cohortIdSubsetDoc: Helper for consistent documentation of cohortId.
CohortConstructor-package: CohortConstructor: Build and Manipulate Study Cohorts Using a Common Data Model
benchmarkCohortConstructor: Run benchmark of CohortConstructor package
collapseCohorts: Collapse cohort entries using a certain gap to concatenate records.
cohortDoc: Helper for consistent documentation of cohort.
cohortIdModifyDoc: Helper for consistent documentation of cohortId.
cdmDoc: Helper for consistent documentation of cdm.
collapseDoc: Helper for consistent documentation of collapse.