CodelistGenerator

Installation

You can install CodelistGenerator from CRAN

install.packages("CodelistGenerator")

Alternatively, you can install the development version of CodelistGenerator from GitHub

install.packages("remotes")
remotes::install_github("darwin-eu/CodelistGenerator")

Example usage

library(dplyr)
library(CDMConnector)
library(CodelistGenerator)

For this example we’ll use the Eunomia dataset (which only contains a subset of the OMOP CDM vocabularies)

db <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomiaDir())
cdm <- cdmFromCon(db, 
                  cdmSchema = "main", 
                  writeSchema = "main", 
                  writePrefix = "cg_")

Exploring the OMOP CDM Vocabulary tables

OMOP CDM vocabularies are frequently updated, and we can identify the version of the vocabulary of our Eunomia data

getVocabVersion(cdm = cdm)
#> [1] "v5.0 18-JAN-19"

CodelistGenerator provides various other functions to explore the vocabulary tables. For example, we can see the different concept classes of standard concepts used for drugs

getConceptClassId(cdm,
                  standardConcept = "Standard",
                  domain = "Drug")
#> [1] "Branded Drug"        "Branded Drug Comp"   "Branded Pack"       
#> [4] "Clinical Drug"       "Clinical Drug Comp"  "CVX"                
#> [7] "Ingredient"          "Quant Branded Drug"  "Quant Clinical Drug"

Vocabulary based codelists using CodelistGenerator

CodelistGenerator provides functions to extract code lists based on vocabulary hierarchies. One example is getDrugIngredientCodes, which we can use to get all the concept IDs used to represent aspirin.

getDrugIngredientCodes(cdm = cdm, name = "aspirin", nameStyle = "{concept_name}")
#> 
#> - aspirin (2 codes)

And if we want codelists for all drug ingredients we can simply omit the name argument and all ingredients will be returned.

ing <- getDrugIngredientCodes(cdm = cdm, nameStyle = "{concept_name}")
ing$aspirin
#> [1] 19059056  1112807
ing$diclofenac
#> [1] 1124300
ing$celecoxib
#> [1] 1118084
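
Because the result is indexed by ingredient name, it can be handled much like a named list of concept ID vectors. The following is a minimal base R sketch, assuming a plain named list stands in for the object returned above; the names ing_sketch and ing_long are hypothetical, and the IDs are copied from the output shown.

```r
# Hypothetical stand-in for the codelist returned above: a named list of
# concept ID vectors (IDs copied from the output shown in this README).
ing_sketch <- list(
  aspirin    = c(19059056L, 1112807L),
  diclofenac = 1124300L,
  celecoxib  = 1118084L
)

# Number of codes held for each ingredient
n_codes <- lengths(ing_sketch)
print(n_codes)

# Flatten into a long data frame with one row per concept ID
ing_long <- data.frame(
  ingredient = rep(names(ing_sketch), times = n_codes),
  concept_id = unlist(ing_sketch, use.names = FALSE)
)
print(ing_long)
```

A long table like this can be convenient when joining concept IDs against other tables.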

Systematic search using CodelistGenerator

CodelistGenerator can also run systematic searches of the vocabulary tables to support codelist development. Much like a systematic review, the idea is that, for a specified search strategy, CodelistGenerator identifies a set of concepts that may be relevant, which clinical experts then screen to remove any irrelevant codes.

We can do a simple search for asthma

asthma_codes1 <- getCandidateCodes(
  cdm = cdm,
  keywords = "asthma",
  domains = "Condition"
) 
asthma_codes1 |> 
  glimpse()
#> Rows: 2
#> Columns: 6
#> $ concept_id       <int> 4051466, 317009
#> $ found_from       <chr> "From initial search", "From initial search"
#> $ concept_name     <chr> "Childhood asthma", "Asthma"
#> $ domain_id        <chr> "Condition", "Condition"
#> $ vocabulary_id    <chr> "SNOMED", "SNOMED"
#> $ standard_concept <chr> "S", "S"

Perhaps we want to exclude certain concepts as part of the search strategy. In this case, we can add them like so

asthma_codes2 <- getCandidateCodes(
  cdm = cdm,
  keywords = "asthma",
  exclude = "childhood",
  domains = "Condition"
) 
asthma_codes2 |> 
  glimpse()
#> Rows: 1
#> Columns: 6
#> $ concept_id       <int> 317009
#> $ found_from       <chr> "From initial search"
#> $ concept_name     <chr> "Asthma"
#> $ domain_id        <chr> "Condition"
#> $ vocabulary_id    <chr> "SNOMED"
#> $ standard_concept <chr> "S"
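
To illustrate the idea of keywords combined with exclusions, here is a toy base R sketch. It is not how getCandidateCodes is implemented (the package's own matching is more involved); the helper search_concepts is hypothetical, and the two concepts are copied from the search output above.

```r
# Toy concept table; the two rows are taken from the search output above.
concepts <- data.frame(
  concept_id   = c(4051466L, 317009L),
  concept_name = c("Childhood asthma", "Asthma")
)

# Hypothetical helper: keep names matching any keyword, then drop names
# matching any exclusion term (case-insensitive, fixed-string matching).
search_concepts <- function(concepts, keywords, exclude = character()) {
  matches_any <- function(terms) {
    hits <- rep(FALSE, nrow(concepts))
    for (term in terms) {
      hits <- hits |
        grepl(tolower(term), tolower(concepts$concept_name), fixed = TRUE)
    }
    hits
  }
  concepts[matches_any(keywords) & !matches_any(exclude), , drop = FALSE]
}

with_childhood    <- search_concepts(concepts, keywords = "asthma")
without_childhood <- search_concepts(concepts, keywords = "asthma",
                                     exclude = "childhood")
print(without_childhood)
```

With no exclusion terms the helper keeps both concepts; adding "childhood" as an exclusion leaves only the Asthma concept, mirroring the two searches above.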

We can compare these two code lists like so

compareCodelists(asthma_codes1, asthma_codes2)
#> # A tibble: 2 × 3
#>   concept_id concept_name     codelist       
#>        <int> <chr>            <chr>          
#> 1    4051466 Childhood asthma Only codelist 1
#> 2     317009 Asthma           Both
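
The comparison above amounts to set membership on concept IDs. A minimal base R sketch of that logic, assuming two plain integer vectors copied from the searches above (the names codes1, codes2 and comparison are hypothetical, and this is not the package's implementation):

```r
# Concept IDs copied from the two searches above
codes1 <- c(4051466L, 317009L)  # from asthma_codes1
codes2 <- 317009L               # from asthma_codes2

all_codes <- union(codes1, codes2)

# Label each concept by the codelist(s) it appears in
comparison <- data.frame(
  concept_id = all_codes,
  codelist = ifelse(
    all_codes %in% codes1 & all_codes %in% codes2, "Both",
    ifelse(all_codes %in% codes1, "Only codelist 1", "Only codelist 2")
  )
)
print(comparison)
```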

We can also look at the non-standard codes that standard concepts are mapped from. For example, our search for gastrointestinal hemorrhage returns a standard SNOMED concept, and getMappings can then show the non-standard codes (such as ICD10) that map to it

Gastrointestinal_hemorrhage <- getCandidateCodes(
  cdm = cdm,
  keywords = "Gastrointestinal hemorrhage",
  domains = "Condition"
)
Gastrointestinal_hemorrhage |> 
  glimpse()
#> Rows: 1
#> Columns: 6
#> $ concept_id       <int> 192671
#> $ found_from       <chr> "From initial search"
#> $ concept_name     <chr> "Gastrointestinal hemorrhage"
#> $ domain_id        <chr> "Condition"
#> $ vocabulary_id    <chr> "SNOMED"
#> $ standard_concept <chr> "S"

Summarising code use

We can also summarise how often the codes in a codelist are used in the patient-level data

summariseCodeUse(list("asthma" = asthma_codes1$concept_id),  
                 cdm = cdm) |> 
  glimpse()
#> Rows: 6
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1
#> $ cdm_name         <chr> "An OMOP CDM database", "An OMOP CDM database", "An O…
#> $ group_name       <chr> "codelist_name", "codelist_name", "codelist_name", "c…
#> $ group_level      <chr> "asthma", "asthma", "asthma", "asthma", "asthma", "as…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "overall", "Asthma", "Childhood asthma", "overall", "…
#> $ variable_level   <chr> NA, "317009", "4051466", NA, "4051466", "317009"
#> $ estimate_name    <chr> "record_count", "record_count", "record_count", "pers…
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> "101", "5", "96", "101", "96", "5"
#> $ additional_name  <chr> "source_concept_name &&& source_concept_id &&& source…
#> $ additional_level <chr> "NA &&& NA &&& NA &&& NA", "Asthma &&& 317009 &&& 195…
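
The result is in a long format in which estimate_value is stored as character, so it needs converting before any arithmetic. As a minimal base R sketch, assuming the first three rows of the output above (the record_count estimates) are re-typed by hand into a plain data frame, we can check that the per-concept counts add up to the overall count. The names record_counts, overall and per_concept are hypothetical.

```r
# First three rows of the result above (the record_count estimates),
# re-typed by hand as a plain data frame for illustration.
record_counts <- data.frame(
  variable_name  = c("overall", "Asthma", "Childhood asthma"),
  variable_level = c(NA, "317009", "4051466"),
  estimate_name  = "record_count",
  estimate_value = c("101", "5", "96")
)

# estimate_value is stored as character, so convert before doing arithmetic
record_counts$n <- as.integer(record_counts$estimate_value)

overall     <- record_counts$n[record_counts$variable_name == "overall"]
per_concept <- record_counts$n[record_counts$variable_name != "overall"]

# Check whether the per-concept record counts add up to the overall count
print(sum(per_concept) == overall)
```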

Version: 3.3.2
License: Apache License (>= 2)
Maintainer: Edward Burn
Last Published: January 28th, 2025
Monthly Downloads: 1,243

Functions in CodelistGenerator (3.3.2)

getRelationshipId: Get relationship ID values from the concept relationship table
mockVocabRef: Generate example vocabulary database
getDescendants
getMappings: Show mappings from non-standard vocabularies to standard
getVocabularies
getVocabVersion
getRouteCategories: Get available routes in a cdm reference
stratifyByRouteCategory: Stratify a codelist by route category
getCandidateCodes: Generate candidate codelist for the OMOP CDM
stratifyByDoseUnit: Stratify a codelist by dose unit
getConceptClassId
summariseCohortCodeUse: Summarise code use among a cohort in the cdm reference
summariseAchillesCodeUse: Summarise code use from achilles counts
summariseCodeUse: Summarise code use in patient-level data
summariseUnmappedCodes: Find unmapped concepts related to codelist
tableAchillesCodeUse: Format the result of summariseAchillesCodeUse into a table
tableOrphanCodes: Format the result of summariseOrphanCodes into a table
summariseOrphanCodes: Find orphan codes related to a codelist using achilles counts and, if available, PHOEBE concept recommendations
tableUnmappedCodes: Format the result of summariseUnmappedCodes into a table
stratifyByConcept: Stratify a codelist by the concepts included within it
sourceCodesInUse: Use achilles counts to get source codes used in the database
subsetOnRouteCategory: Subset a codelist to only those with a particular route category
subsetToCodesInUse: Use achilles counts to filter a codelist to keep only the codes used in the database
getDomains
tableCodeUse: Format the result of summariseCodeUse into a table
subsetOnDoseUnit: Subset a codelist to only those with a particular dose unit
subsetOnDomain: Subset a codelist to only those codes from a particular domain
tableCohortCodeUse: Format the result of summariseCohortCodeUse into a table
availableICD10: Get all ICD codes from the cdm
getATCCodes: Get descendant codes for ATC levels
availableIngredients: Get all ingredient codes from the cdm
doseFormToRoute: Equivalence from dose form concept IDs to route categories
compareCodelists: Compare two codelists
availableATC: Get all ATC codes from the cdm
codesInUse: Use achilles counts to get codes used in the database
codesFromConceptSet: Get concept ids from a provided path to json files
codesFromCohort: Get concept ids from a provided path to cohort json files
getDoseForm
getDrugIngredientCodes: Get descendant codes for drug ingredients
getDoseUnit: Get available dose units in a cdm reference
getICD10StandardCodes: Get corresponding standard codes for ICD-10 chapters and sub-chapters
CodelistGenerator-package: CodelistGenerator: Identify Relevant Clinical Codes and Evaluate Their Use