CohortCharacteristics

Package overview

CohortCharacteristics contains functions for summarising the characteristics of cohorts of patients identified in an OMOP CDM dataset. Once a cohort table has been created, these functions summarise the characteristics of the individuals within the cohort and present the results as tables or plots.

#> To cite package 'CohortCharacteristics' in publications use:
#> 
#>   Catala M, Guo Y, Lopez-Guell K, Burn E, Mercade-Besora N, Alcalde M
#>   (????). _CohortCharacteristics: Summarise and Visualise
#>   Characteristics of Patients in the OMOP CDM_. R package version
#>   1.0.1, <https://darwin-eu.github.io/CohortCharacteristics/>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {CohortCharacteristics: Summarise and Visualise Characteristics of Patients in the OMOP CDM},
#>     author = {Marti Catala and Yuchen Guo and Kim Lopez-Guell and Edward Burn and Nuria Mercade-Besora and Marta Alcalde},
#>     note = {R package version 1.0.1},
#>     url = {https://darwin-eu.github.io/CohortCharacteristics/},
#>   }

Package installation

You can install the latest version of CohortCharacteristics from CRAN:

install.packages("CohortCharacteristics")

Or install the development version from GitHub:

install.packages("pak")
pak::pkg_install("darwin-eu/CohortCharacteristics")
library(CohortCharacteristics)

Content

The package contains three types of functions:

  • summarise* functions. These do the actual work: they extract the necessary data from the cdm and summarise it, returning the standard <summarised_result> output. See omopgenerics for more information on this standardised output format.
  • table* functions. These take the output of the summarise* functions and produce a table visualisation using the visOmopResults package.
  • plot* functions. These take the output of the summarise* functions and produce a plot visualisation using the visOmopResults package.

Examples

Mock data

Although the package provides some simple mock data for testing (mockCohortCharacteristics()), these examples use the GiBleed dataset, which can be downloaded with the CDMConnector package and gives more realistic results.

library(CDMConnector)
library(duckdb)
library(dplyr, warn.conflicts = FALSE)
requireEunomia()
con <- dbConnect(duckdb(), eunomiaDir())
cdm <- cdmFromCon(con = con, cdmSchema = "main", writeSchema = "main")

Let’s create a simple cohort:

library(DrugUtilisation)
cdm <- generateIngredientCohortSet(cdm = cdm, name = "my_cohort", ingredient = c("warfarin", "acetaminophen"))

Cohort counts

We can get counts using the function summariseCohortCount():

result <- summariseCohortCount(cdm$my_cohort)
result |>
  glimpse()
#> Rows: 4
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1
#> $ cdm_name         <chr> "Synthea", "Synthea", "Synthea", "Synthea"
#> $ group_name       <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level      <chr> "acetaminophen", "acetaminophen", "warfarin", "warfar…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall"
#> $ strata_level     <chr> "overall", "overall", "overall", "overall"
#> $ variable_name    <chr> "Number records", "Number subjects", "Number records"…
#> $ variable_level   <chr> NA, NA, NA, NA
#> $ estimate_name    <chr> "count", "count", "count", "count"
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer"
#> $ estimate_value   <chr> "13907", "2679", "137", "137"
#> $ additional_name  <chr> "overall", "overall", "overall", "overall"
#> $ additional_level <chr> "overall", "overall", "overall", "overall"
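
As the glimpse() output above shows, every estimate is stored as text in estimate_value, with its intended type recorded in estimate_type. The following is a minimal base R sketch of converting the counts back to integers before doing arithmetic. It uses a stand-in data frame that copies only the rows fully visible in the printed output; the names res, count and records_per_subject are chosen for illustration and are not part of the package.

```r
# Stand-in for the first three rows of the summarised result printed above
# (only values that are fully visible in the glimpse() output are copied).
res <- data.frame(
  group_level    = c("acetaminophen", "acetaminophen", "warfarin"),
  variable_name  = c("Number records", "Number subjects", "Number records"),
  estimate_type  = c("integer", "integer", "integer"),
  estimate_value = c("13907", "2679", "137"),
  stringsAsFactors = FALSE
)

# estimate_value is character, so convert it before doing arithmetic
res$count <- as.integer(res$estimate_value)

# Average number of records per subject in the acetaminophen cohort
aceta    <- res[res$group_level == "acetaminophen", ]
records  <- aceta$count[aceta$variable_name == "Number records"]
subjects <- aceta$count[aceta$variable_name == "Number subjects"]
records_per_subject <- records / subjects
```

On a real <summarised_result> the same conversion would apply to the rows whose estimate_type is "integer" or "numeric"; other estimate types may need different handling.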

You can easily create a table using the associated table function, tableCohortCount():

tableCohortCount(result, type = "flextable")

We could create a simple plot with plotCohortCount():

result |>
  filter(variable_name == "Number subjects") |>
  plotCohortCount(x = "cohort_name", colour = "cohort_name")

All the other functions follow the same pattern: first summarise, then plot or tabulate.

Cohort attrition

result <- summariseCohortAttrition(cdm$my_cohort)
tableCohortAttrition(result, type = "flextable")
result |>
  filter(group_level == "acetaminophen") |>
  plotCohortAttrition()

Characteristics

result <- summariseCharacteristics(cdm$my_cohort)
tableCharacteristics(result, type = "flextable")
result |>
  filter(variable_name == "Age") |>
  plotCharacteristics(plotType = "boxplot", colour = "cohort_name")

Timing between cohorts

result <- summariseCohortTiming(cdm$my_cohort)
tableCohortTiming(result, type = "flextable")
plotCohortTiming(
  result,
  uniqueCombinations = TRUE,
  facet = "cdm_name",
  colour = c("cohort_name_reference", "cohort_name_comparator"),
  timeScale = "years"
)
plotCohortTiming(
  result,
  plotType = "densityplot",
  uniqueCombinations = FALSE,
  facet = "cdm_name",
  colour = c("cohort_name_comparator"),
  timeScale = "years"
)

Overlap between cohorts

result <- summariseCohortOverlap(cdm$my_cohort)
tableCohortOverlap(result, type = "flextable")
plotCohortOverlap(result, uniqueCombinations = TRUE)

Large scale characteristics

result <- cdm$my_cohort |>
  summariseLargeScaleCharacteristics(
    window = list(c(-90, -1), c(0, 0), c(1, 90)),
    eventInWindow = "condition_occurrence"
  )
tableTopLargeScaleCharacteristics(result, type = "flextable")
result |>
  omopgenerics::filterGroup(cohort_name == "acetaminophen") |>
  plotLargeScaleCharacteristics()
result |>
  omopgenerics::filterGroup(cohort_name == "acetaminophen") |>
  plotComparedLargeScaleCharacteristics(
    reference = "-90 to -1", colour = "variable_level"
  )
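
In the calls above, each window is given as a pair of day offsets, while the reference argument refers to a window by a text label. A small base R sketch of how the two appear to relate, assuming the label format "<start> to <end>" inferred from reference = "-90 to -1" (window_label is a hypothetical helper, not a package function, and only finite windows are handled):

```r
# Windows as passed to summariseLargeScaleCharacteristics() above
windows <- list(c(-90, -1), c(0, 0), c(1, 90))

# Assumed label format "<start> to <end>", inferred from the plot call
window_label <- function(w) paste(w[1], "to", w[2])
labels <- vapply(windows, window_label, character(1))

# Check that the reference used in the plot call matches one of the windows
reference <- "-90 to -1"
stopifnot(reference %in% labels)
```

A check like this can catch a mistyped reference label before running a long summarise call.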

Disconnect

Disconnect from your database with CDMConnector::cdmDisconnect() to close the connection, or with mockDisconnect() to close the connection and delete the created mock data:

mockDisconnect(cdm)

Recommendations

Although it is technically possible, we do not recommend piping summarise functions directly into table or plot functions. The main reason is that summarise functions take some time to run; a large scale characterisation on a big cdm object can take a few hours. If we pipe the output straight into a table/plot function, we lose the summarised result object. Moreover, we often send code to be run against other databases, and what we want to export is the summarised_result objects rather than the table or plot, which we would build after compiling results from different cdm objects.

Not recommended:

cdm$my_cohort |>
  summariseCharacteristics() |>
  tableCharacteristics()

Recommended:

x <- summariseCharacteristics(cdm$my_cohort)

tableCharacteristics(x)
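
Because the summarised result is the expensive part, it can be worth storing it on disk as soon as it is computed. Below is a minimal base R sketch of that idea. It uses saveRDS()/readRDS() and a stand-in for the slow summarise call; slow_summarise and cached_result are hypothetical names for illustration and are not provided by the package.

```r
# Stand-in for a slow summarise* call; in practice this would be something
# like summariseCharacteristics(cdm$my_cohort)
slow_summarise <- function() {
  data.frame(
    cdm_name       = "Synthea",
    variable_name  = "Number subjects",
    estimate_value = "2679",
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: compute once, store on disk, reuse afterwards
cached_result <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  out <- compute()
  saveRDS(out, path)
  out
}

path <- file.path(tempdir(), "characteristics_result.rds")
unlink(path)                              # start from a clean state
x <- cached_result(path, slow_summarise)  # runs compute() and writes the file
y <- cached_result(path, slow_summarise)  # read back from disk, no recomputation
```

The stored object can then be passed to the table or plot functions as many times as needed without rerunning the summarise step.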

Monthly Downloads: 1,246
Version: 1.1.2
License: Apache License (>= 2)
Maintainer: Marti Catala
Last Published: March 21st, 2026

Functions in CohortCharacteristics (1.1.2)

  • summariseCohortOverlap: Summarise overlap between cohorts in a cohort table.
  • tableCohortAttrition: Create a visual table from the output of summariseCohortAttrition.
  • summariseCohortTiming: Summarise timing between entries into cohorts in a cohort table.
  • tableCohortCodelist: Create a visual table from the <summarised_result> object returned by summariseCohortCodelist().
  • plotCohortOverlap: Plot the result of summariseCohortOverlap.
  • summariseLargeScaleCharacteristics: Summarise the large scale characteristics of a cohort table.
  • tableCharacteristics: Format a summarise_characteristics object into a visual table.
  • plotCohortTiming: Plot summariseCohortTiming results.
  • tableCohortTiming: Format a summariseCohortTiming result into a visual table.
  • tableDoc: Helper for consistent documentation of table.
  • summariseCohortCodelist: Summarise the cohort codelist attribute.
  • plotComparedLargeScaleCharacteristics: Create a ggplot from the output of summariseLargeScaleCharacteristics.
  • plotDoc: Helper for consistent documentation of plot.
  • summariseCohortCount: Summarise counts for cohorts in a cohort table.
  • tableLargeScaleCharacteristics: Explore and compare the large scale characteristics of cohorts.
  • tableCohortOverlap: Format a summariseCohortOverlap result into a visual table.
  • tableCohortCount: Format a summariseCohortCount result into a visual table.
  • tableTopLargeScaleCharacteristics: Visualise the top concepts per each cdm name, cohort, stratification and window.
  • uniqueCombinationsDoc: Helper for consistent documentation of uniqueCombinations.
  • timeScaleDoc: Helper for consistent documentation of timeScale.
  • benchmarkCohortCharacteristics: Benchmark the main functions of CohortCharacteristics package.
  • CohortCharacteristics-package: CohortCharacteristics: Summarise and Visualise Characteristics of Patients in the OMOP CDM.
  • availableTableColumns: Available columns to use in header, groupColumn and hide arguments in table functions.
  • availablePlotColumns: Available columns to use in facet and colour arguments in plot functions.
  • cohortDoc: Helper for consistent documentation of cohort.
  • cohortIdDoc: Helper for consistent documentation of cohortId.
  • summariseCohortAttrition: Summarise attrition associated with cohorts in a cohort table.
  • plotCohortAttrition: Plot the result of summariseCohortAttrition.
  • summariseCharacteristics: Summarise characteristics of cohorts in a cohort table.
  • plotCohortCount: Plot the result of summariseCohortCount.
  • plotLargeScaleCharacteristics: Create a ggplot from the output of summariseLargeScaleCharacteristics.
  • reexports: Objects exported from other packages.
  • resultDoc: Helper for consistent documentation of result.
  • strataDoc: Helper for consistent documentation of strata.
  • mockCohortCharacteristics: Create a mock database for testing the CohortCharacteristics package.
  • plotCharacteristics: Create a ggplot from the output of summariseCharacteristics.