CodelistGenerator

Installation

You can install CodelistGenerator from CRAN

install.packages("CodelistGenerator")

Alternatively, you can install the development version of CodelistGenerator from GitHub

install.packages("remotes")
remotes::install_github("darwin-eu/CodelistGenerator")

Example usage

library(dplyr)
library(CDMConnector)
library(CodelistGenerator)

For this example we’ll use the Eunomia dataset (which only contains a subset of the OMOP CDM vocabularies)

requireEunomia()
db <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomiaDir())
cdm <- cdmFromCon(db, 
                  cdmSchema = "main", 
                  writeSchema = "main", 
                  writePrefix = "cg_")

Exploring the OMOP CDM Vocabulary tables

OMOP CDM vocabularies are frequently updated, and we can identify the version of the vocabulary of our Eunomia data

vocabularyVersion(cdm = cdm)
#> [1] "v5.0 18-JAN-19"

Vocabulary based codelists using CodelistGenerator

CodelistGenerator provides functions to extract codelists based on vocabulary hierarchies. One such function is `getDrugIngredientCodes()`, which we can use to get the concept IDs that represent aspirin and diclofenac.

ing <- getDrugIngredientCodes(
  cdm = cdm,
  name = c("aspirin", "diclofenac"),
  nameStyle = "{concept_name}"
)
ing
#> 
#> - aspirin (2 codes)
#> - diclofenac (1 codes)
ing$aspirin
#> [1]  1112807 19059056
ing$diclofenac
#> [1] 1124300
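
To make the idea of a hierarchy-based codelist more concrete, here is a minimal base R sketch. It is not how CodelistGenerator is implemented. It assumes a toy stand-in for the OMOP `concept_ancestor` table whose rows simply mirror the output above, and the helper `get_descendants()` is a hypothetical name introduced only for this illustration.

```r
# Toy stand-in for the OMOP concept_ancestor table (assumption: these rows
# just mirror the aspirin/diclofenac output shown above).
concept_ancestor <- data.frame(
  ancestor_concept_id   = c(1112807L, 1112807L, 1124300L),
  descendant_concept_id = c(1112807L, 19059056L, 1124300L)
)

# Ingredient-level concept IDs we want to expand
ingredients <- c(aspirin = 1112807L, diclofenac = 1124300L)

# Hypothetical helper: collect every descendant of one ancestor concept
get_descendants <- function(ancestor_id, ancestor_table) {
  ids <- ancestor_table$descendant_concept_id[
    ancestor_table$ancestor_concept_id == ancestor_id
  ]
  sort(unique(ids))
}

# One element per ingredient, named after the ingredient
toy_codelist <- lapply(ingredients, get_descendants,
                       ancestor_table = concept_ancestor)

toy_codelist$aspirin
toy_codelist$diclofenac
```

The result is a named list of concept ID vectors, which has the same shape as the `ing` object returned above.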

Systematic search using CodelistGenerator

CodelistGenerator can also run systematic searches of the vocabulary tables to support codelist development. Much like a systematic review, the idea is that, for a specified search strategy, CodelistGenerator identifies a set of concepts that may be relevant, and clinical experts then screen these to remove any irrelevant codes.

We can do a simple search for asthma

asthma_codes1 <- getCandidateCodes(
  cdm = cdm,
  keywords = "asthma",
  domains = "Condition"
) 
asthma_codes1 |> 
  glimpse()
#> Rows: 2
#> Columns: 6
#> $ concept_id       <int> 317009, 4051466
#> $ found_from       <chr> "From initial search", "From initial search"
#> $ concept_name     <chr> "Asthma", "Childhood asthma"
#> $ domain_id        <chr> "Condition", "Condition"
#> $ vocabulary_id    <chr> "SNOMED", "SNOMED"
#> $ standard_concept <chr> "S", "S"

Perhaps we want to exclude certain concepts as part of the search strategy. In that case we can add them using the exclude argument

asthma_codes2 <- getCandidateCodes(
  cdm = cdm,
  keywords = "asthma",
  exclude = "childhood",
  domains = "Condition"
) 
asthma_codes2 |> 
  glimpse()
#> Rows: 1
#> Columns: 6
#> $ concept_id       <int> 317009
#> $ found_from       <chr> "From initial search"
#> $ concept_name     <chr> "Asthma"
#> $ domain_id        <chr> "Condition"
#> $ vocabulary_id    <chr> "SNOMED"
#> $ standard_concept <chr> "S"
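
The keyword and exclusion logic can be pictured with a small base R sketch. This is only an illustration under stated assumptions: the `concepts` data frame is a toy stand-in for the OMOP concept table, `search_concepts()` is a hypothetical helper, and the real `getCandidateCodes()` does considerably more than plain substring matching.

```r
# Toy stand-in for the OMOP concept table (the third row is an unrelated
# condition added only for contrast).
concepts <- data.frame(
  concept_id   = c(317009L, 4051466L, 192671L),
  concept_name = c("Asthma", "Childhood asthma", "Gastrointestinal hemorrhage"),
  domain_id    = c("Condition", "Condition", "Condition"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep names matching any keyword, drop names matching
# any exclusion term, then optionally restrict to the requested domains.
search_concepts <- function(concepts, keywords, exclude = NULL, domains = NULL) {
  name <- tolower(concepts$concept_name)
  keep <- Reduce(`|`, lapply(tolower(keywords), grepl, x = name, fixed = TRUE))
  if (!is.null(exclude)) {
    drop <- Reduce(`|`, lapply(tolower(exclude), grepl, x = name, fixed = TRUE))
    keep <- keep & !drop
  }
  if (!is.null(domains)) {
    keep <- keep & concepts$domain_id %in% domains
  }
  concepts[keep, ]
}

toy_search1 <- search_concepts(concepts, "asthma", domains = "Condition")
toy_search2 <- search_concepts(concepts, "asthma", exclude = "childhood",
                               domains = "Condition")
toy_search1$concept_id
toy_search2$concept_id
```

On this toy table the first search returns both asthma concepts and the second returns only "Asthma", mirroring the two results above.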

Summarising code use

As well as functions for finding codes, we also have functions to summarise their use. Here, for example, we summarise the use of the asthma codes found above

library(flextable)
asthma_code_use <- summariseCodeUse(
  list("asthma" = asthma_codes1$concept_id) |>
    newCodelist(),
  cdm = cdm
)
tableCodeUse(asthma_code_use, type = "flextable", style = "darwin")
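
As a rough picture of what summarising code use involves, the base R sketch below counts records and distinct persons per concept. It is an assumption-laden illustration, not the package's method: `condition_occurrence` here is a made-up toy table, and the column names of `toy_code_use` are hypothetical and do not reflect the structure returned by `summariseCodeUse()`.

```r
# Made-up patient-level records (toy stand-in for condition_occurrence)
condition_occurrence <- data.frame(
  person_id            = c(1L, 1L, 2L, 3L, 3L),
  condition_concept_id = c(317009L, 317009L, 317009L, 4051466L, 192671L)
)

# Codelist of interest: the two asthma concepts found by the first search
asthma_ids <- c(317009L, 4051466L)

# Keep only records that use a code from the codelist
in_use <- condition_occurrence[
  condition_occurrence$condition_concept_id %in% asthma_ids,
]

# Number of records and number of distinct persons per concept
record_count <- table(in_use$condition_concept_id)
person_count <- tapply(in_use$person_id, in_use$condition_concept_id,
                       function(p) length(unique(p)))

toy_code_use <- data.frame(
  concept_id   = as.integer(names(record_count)),
  record_count = as.integer(record_count),
  person_count = as.integer(person_count)
)
toy_code_use
```

Counting both records and persons is a design choice in this sketch: a code recorded many times for a few people and a code recorded once for many people look the same on record counts alone.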

Monthly Downloads: 801
Version: 4.0.0
License: Apache License (>= 2)
Maintainer: Edward Burn
Last Published: December 17th, 2025

Functions in CodelistGenerator (4.0.0)

asConceptSetExpression

Coerce to a concept set expression
associatedRelationshipIds

Get available relationships with concepts in a codelist
availableDomains

Get the domains available in the cdm
availableDoseUnits

Get available dose units
availableConceptClassIds

Get the available concept classes used in a given set of domains
associatedDrugIngredients

Get the names of drug ingredients associated with codelist
availableDrugIngredients

Get the names of all available drug ingredients
availableRelationshipIds

Get available relationships between concepts
availableRouteCategories

Get available drug routes
doseUnitDoc

Helper for consistent documentation of doseUnit.
benchmarkCodelistGenerator

Run benchmark of CodelistGenerator analyses
.optionsDoc

Helper for consistent documentation of .options.
mockVocabRef

Generate example vocabulary database
domainDoc

Helper for consistent documentation of domain.
countByDoc

Helper for consistent documentation of countBy.
nameStyleDoc

Helper for consistent documentation of nameStyle.
codesFromConceptSet

levelICD10Doc

Helper for consistent documentation of level.
getDescendants

Get descendant codes for a given concept
getCandidateCodes

Perform a systematic search to identify a candidate codelist using the OMOP CDM vocabulary tables.
compareCodelists

Compare overlap between two sets of codes
minimumCountDoc

Helper for consistent documentation of minimumCount.
keepOriginalDocSubset

Helper for consistent documentation of keepOriginal.
groupColumnDoc

Helper for consistent documentation of groupColumn.
stratifyByBrand

Stratify a codelist by brand category.
levelATCDoc

Helper for consistent documentation of level.
reexports

Objects exported from other packages
groupColumnStrataDoc

Helper for consistent documentation of groupColumn.
routeCategoryDoc

Helper for consistent documentation of routeCategory.
associatedVocabularies

Get the vocabularies associated with a codelist
associatedRouteCategories

Get drug routes associated with a codelist
subsetOnDomain

Subset a codelist to only those codes from a particular domain.
stratifyByConcept

Stratify a codelist by the concepts included within it.
tableOrphanCodes

Format the result of summariseOrphanCodes into a table
summariseCohortCodeUse

Summarise code use among a cohort in the cdm reference
codesFromCohort

Get concept ids from JSON files containing cohort definitions
searchStrategy

Report the search strategy used to identify codes when using the getCandidateCodes() function
stratifyByVocabulary

Stratify a codelist by vocabulary.
cdmDoc

Helper for consistent documentation of cdm.
byYearDoc

Helper for consistent documentation of byYear.
codelistNameDoc

Helper for consistent documentation of codelistName.
standardConceptDoc

Helper for consistent documentation of standardConcept.
hideDoc

Helper for consistent documentation of hide.
doseFormToRoute

Table showing the route category associated with each dose form.
tableStyleDoc

Helper for consistent documentation of style.
doseFormDoc

Helper for consistent documentation of doseForm.
subsetOnIngredientRange

Subset a codelist to only those codes with a range of number of ingredients
subsetOnRouteCategory

Subset a codelist to only those with a particular route category
headerDoc

Helper for consistent documentation of header.
byConceptDoc

Helper for consistent documentation of byConcept.
stratifyByDomain

Stratify a codelist by domain category.
hideStrataDoc

Helper for consistent documentation of hide.
intersectCodelists

Generate a codelist from the intersection of different codelists. The generated codelist will come out in alphabetical order.
summariseOrphanCodes

Find orphan codes related to a codelist using achilles counts and, if available, PHOEBE concept recommendations
keepOriginalDoc

Helper for consistent documentation of keepOriginal.
headerStrataDoc

Helper for consistent documentation of header.
stratifyByDoseForm

Stratify a codelist by dose form.
subsetOnVocabulary

Subset a codelist to only those codes from a particular vocabulary.
summariseCodeUse

Summarise code use in patient-level data.
stratifyByRouteCategory

Stratify a codelist by route category.
typeBroadDoc

Helper for consistent documentation of type.
stratifyByDoseUnit

Stratify a codelist by dose unit.
summariseAchillesCodeUse

Summarise code use from achilles counts.
subsetToCodesInUse

Filter a codelist to keep only the codes being used in patient records
tableCohortCodeUse

Format the result of summariseCohortCodeUse into a table.
tableDoc

Helper for consistent documentation of table.
xDocCohort

Helper for consistent documentation of x where input can be codelist or cohort.
getATCCodes

Get the descendant codes of Anatomical Therapeutic Chemical (ATC) classification codes
excludeConcepts

Exclude concepts from a codelist
getDrugIngredientCodes

Get descendant codes of drug ingredients
bySexDoc

Helper for consistent documentation of bySex.
getMappings

Show mappings from non-standard vocabularies to standard.
unionCodelists

Generate a codelist from the union of different codelists. The generated codelist will come out in alphabetical order.
typeTableDoc

Helper for consistent documentation of type.
includeDescendantsDoc

Helper for consistent documentation of includeDescendants.
subsetOnDoseForm

Subset a codelist to only those codes with a particular dose form.
vocabularyVersion

Get the available version of the vocabulary used in the cdm
tableCodeUse

Format the result of summariseCodeUse into a table.
subsetOnDoseUnit

Subset a codelist to only those with a particular dose unit.
tableAchillesCodeUse

Format the result of summariseAchillesCodeUse into a table
ingredientRangeDoc

Helper for consistent documentation of ingredientRange.
xDoc

Helper for consistent documentation of x.
typeNarrowDoc

Helper for consistent documentation of type.
associatedConceptClassIds

Get the concept classes associated with a codelist
associatedDoseForms

Get the dose forms associated with drug concepts in a codelist
asCodelist

Coerce to a codelist
ageGroupDoc

Helper for consistent documentation of ageGroup.
asCodelistWithDetails

Coerce to a codelist with details
associatedDomains

Get the domains associated with a codelist
associatedDoseUnits

Get the dose units associated with a codelist
addConcepts

Add concepts to a codelist
availableDoseForms

Get the dose forms for drug concepts
availableATC

Get the names of all available Anatomical Therapeutic Chemical (ATC) classification codes
CodelistGenerator-package

CodelistGenerator: Identify Relevant Clinical Codes and Evaluate Their Use
availableVocabularies

Get the vocabularies available in the cdm