CodelistGenerator

Installation

You can install CodelistGenerator from CRAN:

install.packages("CodelistGenerator")

Alternatively, you can install the development version of CodelistGenerator from GitHub:

install.packages("remotes")
remotes::install_github("darwin-eu/CodelistGenerator")

Example usage

library(dplyr)
library(CDMConnector)
library(CodelistGenerator)

For this example we’ll use the Eunomia dataset, which contains only a subset of the OMOP CDM vocabularies:

requireEunomia()
db <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomiaDir())
cdm <- cdmFromCon(db, 
                  cdmSchema = "main", 
                  writeSchema = "main", 
                  writePrefix = "cg_")

Exploring the OMOP CDM Vocabulary tables

OMOP CDM vocabularies are frequently updated, so it is worth checking which vocabulary version our Eunomia data uses:

vocabularyVersion(cdm = cdm)
#> [1] "v5.0 18-JAN-19"

Vocabulary based codelists using CodelistGenerator

CodelistGenerator provides functions to extract codelists based on vocabulary hierarchies. One of these is `getDrugIngredientCodes()`, which we can use to get the concept IDs that represent aspirin and diclofenac.

ing <- getDrugIngredientCodes(
  cdm = cdm,
  name = c("aspirin", "diclofenac"),
  nameStyle = "{concept_name}"
)
ing
#> 
#> - aspirin (2 codes)
#> - diclofenac (1 codes)
ing$aspirin
#> [1]  1112807 19059056
ing$diclofenac
#> [1] 1124300
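
To make the idea of a hierarchy-based codelist concrete, here is a minimal base R sketch of a descendant lookup on a toy `concept_ancestor` table. The concept IDs are taken from the output above, but the ancestor and descendant pairings and the helper `descendants_of()` are assumptions made for illustration. This is not CodelistGenerator's actual implementation.

```r
# Toy stand-in for the OMOP concept_ancestor table.
# Assumed rows, for illustration only: each ingredient is its own descendant,
# and 19059056 is assumed to sit below the aspirin ingredient 1112807.
concept_ancestor <- data.frame(
  ancestor_concept_id   = c(1112807, 1112807, 1124300),
  descendant_concept_id = c(1112807, 19059056, 1124300)
)

# Hypothetical helper: collect every descendant of one ingredient concept.
descendants_of <- function(ancestor_id, ancestor_table) {
  keep <- ancestor_table$ancestor_concept_id == ancestor_id
  sort(unique(ancestor_table$descendant_concept_id[keep]))
}

# Build a named list of concept IDs, one element per ingredient.
ingredients <- c(aspirin = 1112807, diclofenac = 1124300)
toy_codelist <- lapply(ingredients, descendants_of, ancestor_table = concept_ancestor)
toy_codelist
```

The result is a plain named list with the same shape as `ing` above: one vector of concept IDs per ingredient name.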

Systematic search using CodelistGenerator

CodelistGenerator also supports systematic searches of the vocabulary tables to aid codelist development. Much like a systematic review, the idea is that, for a specified search strategy, CodelistGenerator identifies a set of concepts that may be relevant, which clinical experts then screen to remove any irrelevant codes.

We can do a simple search for asthma:

asthma_codes1 <- getCandidateCodes(
  cdm = cdm,
  keywords = "asthma",
  domains = "Condition"
) 
asthma_codes1 |> 
  glimpse()
#> Rows: 2
#> Columns: 6
#> $ concept_id       <int> 317009, 4051466
#> $ found_from       <chr> "From initial search", "From initial search"
#> $ concept_name     <chr> "Asthma", "Childhood asthma"
#> $ domain_id        <chr> "Condition", "Condition"
#> $ vocabulary_id    <chr> "SNOMED", "SNOMED"
#> $ standard_concept <chr> "S", "S"

If we want to exclude certain concepts as part of the search strategy, we can pass them to the exclude argument:

asthma_codes2 <- getCandidateCodes(
  cdm = cdm,
  keywords = "asthma",
  exclude = "childhood",
  domains = "Condition"
) 
asthma_codes2 |> 
  glimpse()
#> Rows: 1
#> Columns: 6
#> $ concept_id       <int> 317009
#> $ found_from       <chr> "From initial search"
#> $ concept_name     <chr> "Asthma"
#> $ domain_id        <chr> "Condition"
#> $ vocabulary_id    <chr> "SNOMED"
#> $ standard_concept <chr> "S"
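
The effect of `keywords`, `exclude` and `domains` can be illustrated with a small base R sketch on a toy concept table. The helper `search_concepts()` and the drug row are hypothetical, and the matching rule (case-insensitive substring matching) is an assumption made to keep the sketch short. The real `getCandidateCodes()` works against the CDM vocabulary tables and has more options than shown here.

```r
# Toy stand-in for the OMOP concept table. The two asthma rows mirror the
# output above; the drug row is added only to show the domain filter.
concept <- data.frame(
  concept_id   = c(317009, 4051466, 1112807),
  concept_name = c("Asthma", "Childhood asthma", "aspirin"),
  domain_id    = c("Condition", "Condition", "Drug"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep concepts in the requested domains whose name
# contains any keyword, then drop those whose name contains any exclusion term.
search_concepts <- function(concept, keywords, exclude = character(), domains) {
  name <- tolower(concept$concept_name)
  matches <- function(terms) {
    hit <- rep(FALSE, length(name))
    for (term in tolower(terms)) hit <- hit | grepl(term, name, fixed = TRUE)
    hit
  }
  keep <- matches(keywords) & concept$domain_id %in% domains & !matches(exclude)
  concept[keep, ]
}

all_asthma <- search_concepts(concept, keywords = "asthma", domains = "Condition")
no_childhood <- search_concepts(concept, keywords = "asthma",
                                exclude = "childhood", domains = "Condition")
all_asthma$concept_name
no_childhood$concept_name
```

As in the two searches above, the exclusion term removes "Childhood asthma" and leaves only "Asthma".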

Summarising code use

As well as functions for finding codes, we also have functions to summarise their use. Here, for example, we summarise the use of the asthma codes found above and format the result as a table.

library(flextable)
asthma_code_use <- summariseCodeUse(
  list("asthma" = asthma_codes1$concept_id) |>
    newCodelist(),
  cdm = cdm
)
tableCodeUse(asthma_code_use, type = "flextable", style = "darwin")
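
As a rough picture of what summarising code use involves, the base R sketch below counts records and distinct persons per concept in a toy `condition_occurrence` table. The records are made up, and the choice of these two counts is an assumption about what a code-use summary typically reports. It is not the output format of `summariseCodeUse()`.

```r
# Toy stand-in for condition_occurrence (made-up records, for illustration only).
condition_occurrence <- data.frame(
  person_id            = c(1, 1, 2, 3, 3),
  condition_concept_id = c(317009, 317009, 317009, 4051466, 192671)
)

# Codelist of interest: the two asthma concepts found by the first search.
asthma_ids <- c(317009, 4051466)

# Keep only records whose concept is in the codelist.
in_codelist <- condition_occurrence[
  condition_occurrence$condition_concept_id %in% asthma_ids,
]

# Count records and distinct persons per concept.
record_count <- table(in_codelist$condition_concept_id)
person_count <- tapply(
  in_codelist$person_id,
  in_codelist$condition_concept_id,
  function(x) length(unique(x))
)

toy_code_use <- data.frame(
  concept_id   = as.numeric(names(record_count)),
  record_count = as.vector(record_count),
  person_count = as.vector(person_count)
)
toy_code_use
```

Counting persons separately from records matters because one person can contribute several records for the same concept, as person 1 does here.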

Monthly Downloads: 905
Version: 4.0.1
License: Apache License (>= 2)
Maintainer: Edward Burn
Last Published: January 8th, 2026

Functions in CodelistGenerator (4.0.1)

codelistNameDoc: Helper for consistent documentation of codelistNameDoc
availableDrugIngredients: Get the names of all available drug ingredients
codesFromCohort: Get concept ids from JSON files containing cohort definitions
excludeConcepts: Exclude concepts from a codelist
associatedRouteCategories: Get drug routes associated with a codelist
getATCCodes: Get the descendant codes of Anatomical Therapeutic Chemical (ATC) classification codes
doseFormDoc: Helper for consistent documentation of doseForm.
associatedDrugIngredients: Get the names of drug ingredients associated with codelist
associatedRelationshipIds: Get available relationships with concepts in a codelist
getCandidateCodes: Perform a systematic search to identify a candidate codelist using the OMOP CDM vocabulary tables.
getDescendants: Get descendant codes for a given concept
keepOriginalDocSubset: Helper for consistent documentation of keepOriginal.
byConceptDoc: Helper for consistent documentation of byConcept.
headerStrataDoc: Helper for consistent documentation of header.
domainDoc: Helper for consistent documentation of domain.
doseFormToRoute: Table showing the route category associated with each dose form.
bySexDoc: Helper for consistent documentation of bySex.
headerDoc: Helper for consistent documentation of header.
availableATC: Get the names of all available Anatomical Therapeutic Chemical (ATC) classification codes
hideStrataDoc: Helper for consistent documentation of hide.
countByDoc: Helper for consistent documentation of countBy.
hideDoc: Helper for consistent documentation of hide.
stratifyByDomain: Stratify a codelist by domain category.
stratifyByDoseForm: Stratify a codelist by dose form.
availableConceptClassIds: Get the available concept classes used in a given set of domains
availableRouteCategories: Get available drug routes
availableRelationshipIds: Get available relationships between concepts
codesFromConceptSet
associatedVocabularies: Get the vocabularies associated with a codelist
stratifyByVocabulary: Subset a codelist to only those codes from a particular domain.
compareCodelists: Compare overlap between two sets of codes
routeCategoryDoc: Helper for consistent documentation of routeCategory.
getDrugIngredientCodes: Get descendant codes of drug ingredients
mockVocabRef: Generate example vocabulary database
getMappings: Show mappings from non-standard vocabularies to standard.
stratifyByConcept: Stratify a codelist by the concepts included within it.
stratifyByBrand: Stratify a codelist by brand category.
minimumCountDoc: Helper for consistent documentation of minimumCount.
nameStyleDoc: Helper for consistent documentation of nameStyle.
reexports: Objects exported from other packages
levelICD10Doc: Helper for consistent documentation of level.
subsetOnDoseForm: Subset a codelist to only those codes from a particular domain.
subsetOnIngredientRange: Subset a codelist to only those codes with a range of number of ingredients
subsetOnDomain: Subset a codelist to only those codes from a particular domain.
benchmarkCodelistGenerator: Run benchmark of codelistGenerator analyses
groupColumnDoc: Helper for consistent documentation of groupColumn.
availableVocabularies: Get the available vocabularies available in the cdm
includeDescendantsDoc: Helper for consistent documentation of includeDescendants.
typeTableDoc: Helper for consistent documentation of type.
groupColumnStrataDoc: Helper for consistent documentation of groupColumn.
summariseAchillesCodeUse: Summarise code use from achilles counts.
summariseCodeUse: Summarise code use in patient-level data.
summariseCohortCodeUse: Summarise code use among a cohort in the cdm reference
.optionsDoc: Helper for consistent documentation of .options.
ingredientRangeDoc: Helper for consistent documentation of ingredientRange.
doseUnitDoc: Helper for consistent documentation of doseUnit.
summariseOrphanCodes: Find orphan codes related to a codelist using achilles counts and, if available, PHOEBE concept recommendations
unionCodelists: Generate a codelist from the union of different codelists. The generated codelist will come out in alphabetical order.
levelATCDoc: Helper for consistent documentation of level.
stratifyByDoseUnit: Stratify a codelist by dose unit.
intersectCodelists: Generate a codelist from the intersection of different codelists. The generated codelist will come out in alphabetical order.
standardConceptDoc: Helper for consistent documentation of standardConcept.
searchStrategy: Report the search strategy used to identify codes when using the getCandidateCodes() function
subsetToCodesInUse: Filter a codelist to keep only the codes being used in patient records
subsetOnVocabulary: Subset a codelist to only those codes from a particular vocabulary.
keepOriginalDoc: Helper for consistent documentation of keepOriginal.
typeBroadDoc: Helper for consistent documentation of type.
typeNarrowDoc: Helper for consistent documentation of type.
subsetOnDoseUnit: Subset a codelist to only those with a particular dose unit.
tableDoc: Helper for consistent documentation of table.
tableCohortCodeUse: Format the result of summariseCohortCodeUse into a table.
xDocCohort: Helper for consistent documentation of x where input can be codelist or cohort.
vocabularyVersion: Get the available version of the vocabulary used in the cdm
xDoc: Helper for consistent documentation of x.
subsetOnRouteCategory: Subset a codelist to only those with a particular route category
tableCodeUse: Format the result of summariseCodeUse into a table.
tableAchillesCodeUse: Format the result of summariseAchillesCodeUse into a table
tableOrphanCodes: Format the result of summariseOrphanCodes into a table
stratifyByRouteCategory: Stratify a codelist by route category.
tableStyleDoc: Helper for consistent documentation of style.
asCodelistWithDetails: Coerce to a codelist with details
associatedConceptClassIds: Get the concept classes associated with a codelist
associatedDoseForms: Get the dose forms associated with drug concepts in a codelist
asConceptSetExpression: Coerce to a concept set expression
associatedDoseUnits: Get available dose units
addConcepts: Add concepts to a codelist
asCodelist: Coerce to a codelist
CodelistGenerator-package: CodelistGenerator: Identify Relevant Clinical Codes and Evaluate Their Use
associatedDomains: Get the domains associated with a codelist
availableDoseUnits: Get available dose units
ageGroupDoc: Helper for consistent documentation of ageGroup.
availableDomains: Get the domains available in the cdm
availableDoseForms: Get the dose forms for drug concepts
byYearDoc: Helper for consistent documentation of byYear.
cdmDoc: Helper for consistent documentation of cdm.