Note: a newer version (4.0.9) of this package is available.

icd

ICD-9 and ICD-10 comorbidities, manipulation and validation

Features

  • find comorbidities of patients based on admission or discharge ICD-9 or ICD-10 codes, e.g. Cancer, Heart Disease
    • several standard mappings of ICD codes to comorbidities are included (Quan, Deyo, Elixhauser, AHRQ)
    • very fast assignment of ICD codes to comorbidities (using C and C++ internally, with automatic parallel execution using OpenMP when available), assigning millions of comorbidities in a few seconds
  • Charlson and Van Walraven score calculations
  • Hierarchical Condition Categories (HCC)
  • validation of ICD codes from different annual revisions of ICD-9-CM and ICD-10-CM
  • summarizing ICD codes into groups, and to human-readable descriptions
  • correct conversion between different representations of ICD codes, with and without a decimal point, and with leading and trailing characters (this is not trivial for ICD-9-CM). ICD-9 to ICD-10 conversion is left as an exercise for the user!
  • comprehensive test suite to increase confidence in accurate processing of ICD codes

Install

The latest version is available on GitHub (jackwasey/icd) and can be installed with:

    install.packages("devtools")
    devtools::install_github("jackwasey/icd")

The master branch on GitHub should always build and pass all tests and R CMD check, and will be similar or identical to the most recent CRAN release. The CRAN releases are stable milestones. Contributions and bug reports are encouraged and essential for this package to remain current and useful to the many people who have installed it.

Introduction

Calculate comorbidities and Charlson scores, and perform fast and accurate validation, conversion, manipulation, filtering and comparison of ICD-9 and ICD-10 codes. Common ambiguities and code formats are handled. This package enables a workflow from raw lists of ICD codes in hospital billing databases to comorbidities. ICD-9 and ICD-10 comorbidity mappings from Quan (Deyo and Elixhauser versions), Elixhauser and AHRQ are included. This package replaces ‘icd9’, which should be uninstalled.

Relevance

ICD-9 codes are still in heavy use around the world, particularly in the USA where the ICD-9-CM (Clinical Modification) was in widespread use until the end of 2015. ICD-10 has been used worldwide for reporting cause of death for more than a decade. ICD-10-CM is now the primary coding scheme for US hospital admission and discharge diagnoses used for regulatory purposes and billing. A vast amount of patient data is recorded with ICD-9 codes of some kind: this package enables their use in R alongside ICD-10.

Comorbidities

A common requirement for medical research involving patients is determining new or existing comorbidities. This is often reported in Table 1 of research papers to demonstrate the similarity or differences of groups of patients. This package is focussed on fast and accurate generation of this comorbidity information from raw lists of ICD-9 codes.
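As a rough illustration of what mapping codes to comorbidities means, here is a minimal base R sketch. It does not use icd and is not how the package works internally. The toy map holds only three codes, taken from the example output elsewhere in this README; `toy_map`, `patients` and `comorbid` are hypothetical names chosen for this sketch.

```r
# Toy comorbidity map: names are comorbidities, values are short-form ICD-9 codes.
# Only three codes are listed; real maps (Quan, Elixhauser, AHRQ) hold many more.
toy_map <- list(
  CHF       = c("40201"),
  DM        = c("25001"),
  Paralysis = c("34400")
)

# Long-format patient data: one row per visit/code pair, codes kept as character.
patients <- data.frame(
  visit_id = c(1000, 1000, 1000, 1000, 1001, 1001, 1002),
  icd9     = c("40201", "2258", "7208", "25001", "34400", "4011", "4011"),
  stringsAsFactors = FALSE
)

# For each comorbidity, flag the visits that have at least one matching code.
visits <- sort(unique(patients$visit_id))
comorbid <- sapply(toy_map, function(codes) {
  visits %in% patients$visit_id[patients$icd9 %in% codes]
})
rownames(comorbid) <- visits
comorbid
```

The result is a logical matrix with one row per visit and one column per comorbidity, which is the shape needed for a typical Table 1.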

ICD-9 codes

ICD-9 codes are not numbers, and great care is needed when matching individual codes and ranges of codes. It is easy to make mistakes, hence the need for this package. ICD-9 codes can be presented in short format (up to five characters, with no decimal point), or in decimal format, with a decimal point separating the code into two groups. There are also codes beginning with V and E, which have different validation rules. Zeroes after a decimal point are meaningful, so numeric ICD-9 codes cannot be used in most cases. In addition, most clinical databases contain invalid codes, and may even mix decimal and non-decimal format codes in different places. This package primarily deals with ICD-9-CM (Clinical Modification) codes, but should be applicable or easily extensible to the original WHO ICD-9 system.
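To make the point about numbers concrete, here is a small base R sketch. The helper `naive_decimal_to_short` is hypothetical and is not part of icd; it only handles purely numeric codes and ignores V and E codes, which is exactly the kind of gap the package is meant to cover.

```r
# Decimal-format codes: the trailing zero distinguishes two different codes.
codes <- c("250.0", "250.00", "V10.1")

# Coercing to numeric collapses "250.0" and "250.00" and loses the V code.
as_numbers <- suppressWarnings(as.numeric(codes))
as_numbers

# Hypothetical helper (not part of icd): decimal to short for numeric-only codes.
# The major part is zero-padded to three digits, then the minor part is appended.
naive_decimal_to_short <- function(x) {
  parts <- strsplit(x, ".", fixed = TRUE)
  vapply(parts, function(p) {
    major <- sprintf("%03d", as.integer(p[1]))
    minor <- if (length(p) > 1) p[2] else ""
    paste0(major, minor)
  }, character(1))
}

short_codes <- naive_decimal_to_short(c("250.0", "250.00", "4.1", "401"))
short_codes
```

Keeping codes as character vectors throughout avoids the silent loss of information shown in the first step.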

ICD-10 codes

ICD-10 has a somewhat simpler format, with consistent use of a letter, then two alphanumeric characters. However, especially for ICD-10-CM, there is a multitude of qualifiers, e.g. specifying recurrence or laterality, which vastly increases the number of possible codes. This package recognizes validity of codes by syntax alone, or by whether the codes appear in a canonical list. The current ICD-10-CM master list is the 2016 set. There is no capability of converting between ICD-9 and ICD-10, but comorbidities can be generated from older ICD-9 codes and newer ICD-10 codes in parallel, and the comorbidities can then be compared.
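A syntax-only check can be sketched with a regular expression in base R. The pattern below is an assumption made for illustration, not the rule icd applies: a letter, two alphanumeric characters, an optional decimal point, and up to four further alphanumeric characters. `icd10_pattern` and `looks_like_icd10` are hypothetical names.

```r
# Assumed pattern (illustrative, not icd's own rule): a letter, two
# alphanumeric characters, then an optional decimal point and up to
# four further alphanumeric characters.
icd10_pattern <- "^[A-Z][0-9A-Z]{2}\\.?[0-9A-Z]{0,4}$"

looks_like_icd10 <- function(x) {
  grepl(icd10_pattern, toupper(trimws(x)))
}

syntax_ok <- looks_like_icd10(c("I10", "E11.9", "S72001A", "4011", "I1"))
syntax_ok
```

A check like this cannot tell whether a code is actually defined; that requires comparison against a canonical list such as the 2016 ICD-10-CM set mentioned above.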

Examples

See also the vignettes and examples embedded in the help for each function for more. Here’s a taste:

    library(icd)
    library(magrittr) # provides the %>% pipe used below

    patient_data
    #>   visit_id  icd9  poa
    #> 1     1000 40201    Y
    #> 2     1000  2258 <NA>
    #> 3     1000  7208    N
    #> 4     1000 25001    Y
    #> 5     1001 34400    X
    #> 6     1001  4011    Y
    #> 7     1002  4011    E

    # reformat input data as needed
    icd_long_to_wide(patient_data)
    #>      [,1]    [,2]   [,3]   [,4]
    #> 1000 "40201" "2258" "7208" "25001"
    #> 1001 "34400" "4011" NA     NA
    #> 1002 "4011"  NA     NA     NA

    # get comorbidities using Quan's application of Deyo's Charlson comorbidity groups
    icd_comorbid_quan_deyo(patient_data)
    #>         MI   CHF   PVD Stroke Dementia Pulmonary Rheumatic   PUD LiverMild
    #> 1000 FALSE  TRUE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
    #> 1001 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
    #> 1002 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
    #>         DM  DMcx Paralysis Renal Cancer LiverSevere  Mets   HIV
    #> 1000  TRUE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE
    #> 1001 FALSE FALSE      TRUE FALSE  FALSE       FALSE FALSE FALSE
    #> 1002 FALSE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE

    # find diagnoses present on admission:
    icd_filter_poa(patient_data)
    #>   visit_id  icd9
    #> 1     1000 40201
    #> 4     1000 25001
    #> 6     1001  4011

    # get comorbidities based on present-on-arrival diagnoses, use magrittr to flow the data
    patient_data %>% icd_filter_poa %>% icd_comorbid_quan_deyo
    #>         MI   CHF   PVD Stroke Dementia Pulmonary Rheumatic   PUD LiverMild
    #> 1000 FALSE  TRUE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
    #> 1001 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
    #>         DM  DMcx Paralysis Renal Cancer LiverSevere  Mets   HIV
    #> 1000  TRUE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE
    #> 1001 FALSE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE
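To show how a comorbidity matrix like the one above turns into a score, here is a base R sketch of a weighted sum. It assumes the original Charlson weights and ignores the hierarchy rules (for example, complicated diabetes superseding uncomplicated diabetes), so it may not match what `icd_charlson` returns in every case. The names `charlson_weights`, `cmb` and `scores` are made up for this sketch.

```r
# Original Charlson weights, keyed by the column names shown above.
charlson_weights <- c(
  MI = 1, CHF = 1, PVD = 1, Stroke = 1, Dementia = 1, Pulmonary = 1,
  Rheumatic = 1, PUD = 1, LiverMild = 1, DM = 1, DMcx = 2, Paralysis = 2,
  Renal = 2, Cancer = 2, LiverSevere = 3, Mets = 6, HIV = 6
)

# Rebuild the present-on-arrival comorbidity matrix from the output above.
cmb <- matrix(FALSE, nrow = 2, ncol = length(charlson_weights),
              dimnames = list(c("1000", "1001"), names(charlson_weights)))
cmb["1000", c("CHF", "DM")] <- TRUE

# Score = sum of the weights of the comorbidities present for each visit.
scores <- drop((cmb * 1) %*% charlson_weights)
scores
```

For real data, prefer the package's own `icd_charlson` and `icd_van_walraven` functions, which are listed in the function index.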

Look at the help files for details and examples of almost every function in this package.

    ?icd_is_valid
    ?icd_comorbid

Note that reformatting from wide to long and back is not as straightforward as using the various Hadley Wickham tools for doing this: knowing the more detailed structure of the data lets us do this better for the case of dealing with ICD codes.
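For readers who want to see what the long-to-wide reshaping involves, here is a generic base R sketch. It is not how `icd_long_to_wide` is implemented, and it skips the ICD-specific handling the package adds; the variable names are hypothetical.

```r
# Long format: one row per visit/code pair, codes kept as character.
long <- data.frame(
  visit_id = c(1000, 1000, 1000, 1000, 1001, 1001, 1002),
  icd9     = c("40201", "2258", "7208", "25001", "34400", "4011", "4011"),
  stringsAsFactors = FALSE
)

# Group the codes by visit, then pad each group with NA to a common width.
by_visit <- split(long$icd9, long$visit_id)
width <- max(lengths(by_visit))
wide <- t(vapply(by_visit, function(codes) {
  c(codes, rep(NA_character_, width - length(codes)))
}, character(width)))
wide
```

Each row of `wide` is one visit, and shorter visits are padded with `NA` so that all rows have the same number of columns.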

Advanced

Source Data and SAS format files

In the spirit of reproducible research, all the R data files in this package can be recreated from source. The size of the source files makes it cumbersome to include them in the R package available on CRAN. Using the GitHub source, you can pull the original data and SAS format files, and rebuild the data; or use the tools provided by this package to update the data using new source data files, e.g. when ICD-10-CM 2017 is released.

Doing the parsing requires additional dependencies, which are not gratuitously included in the package requirements, since most users won’t need them. Benchmarking this package also has additional requirements. These are:

  • xml2
  • ggplot2
  • digest

Automated testing

One of the strengths of this package is a thorough test suite, including over 10,000 lines of testing code.

    find tests -type f -exec cat '{}' + | wc -l
    10910

A better metric of testing and code quality is code coverage, for which codecov and coveralls are used. The automated builds report results to codecov, whereas the Travis builds report coverage to coveralls. The parsing code is a significant chunk of code, and may or may not be included in the automated builds depending on whether the source data is available. With the data available, test coverage is >95%.

Contributing and Building

Contributions of any kind to icd are very welcome.

To build icd, Rcpp must be compiled from source. This happens automatically on Linux, but on Mac and Windows the following is required to avoid build errors:

    install.packages("Rcpp", type = "source")

Version: 2.4.1
License: GPL-3
Maintainer: Jack O Wasey
Last Published: March 10th, 2018
Monthly Downloads: 28

Functions in icd (2.4.1)

icd9PartsToShort

Convert ICD9 codes between formats and structures.
icd10cm_extract_sub_chapters

Get sub-chapters from the 2016 XML for ICD-10-CM
icd10cm_get_all_defined

get all ICD-10-CM codes
icd9_chapters_to_map

convert the chapter headings to lists of codes
icd9_drop_leading_zeroes

drop zero padding from decimal ICD-9 code.
icd9_generate_map_elix

Generate Elixhauser comorbidities
icd9_generate_map_quan_elix

Generate Quan's revised Elixhauser comorbidities
icd9_parse_ahrq_sas

parse AHRQ SAS code to get mapping
icd9_parse_cc

Generate ICD to HCC Crosswalks from CMS
icd9cm_generate_chapters_hierarchy

generate ICD-9-CM hierarchy
icd9cm_get_billable

Get billable ICD-9-CM codes
condense_explain_table

condense icd_explain_table output down to major codes
get_non_ASCII

mimic the R CMD check test
generate_random_short_icd9

generate random ICD-9 codes
icd10_generate_map_quan_elix

generate ICD-10 Quan/Elixhauser mapping
generate_spelling

Generate spelling exceptions
get_visit_name

Get or guess the name of the visit ID column
condense_explain_table_worker

generate condensed code and condensed number columns
apply_hier

Apply hierarchy and choose naming for each comorbidity map
icd_short_to_parts.icd10

Convert decimal ICD codes to component parts
icd_decimal_to_short

Convert Decimal format ICD codes to short format
chapter_to_desc_range

Parse a (sub)chapter text description with parenthesised range
combine

combine ICD codes
icd_diff_comorbid

show the difference between two comorbidity mappings
attr_short_diag

Change whether ICD code has short or long attribute
icd10_parse_cc

Import the ICD10 to CC crosswalks
as_char_no_warn

convert to character vector without warning
fixSubchapterNa

Fix NA sub-chapters in RTF parsing
icd_expand_minor

expand decimal part of ICD-9 code to cover all possible sub-codes
env_to_vec_flip

return a new environment with names and values swapped
icd9ChildrenShortStd

C++ implementation of finding children of short codes
generate_random_short_icd10cm_bill

generate random ICD-9 codes
attr_decimal_diag

Set ICD short-form diagnosis code attribute
icd9_map_ahrq

AHRQ comorbidities
expect_chap_present

expect that a chapter with given title exists, case-insensitive
icd9MajMinToCode

Convert mjr and mnr vectors to single code
icd9_map_elix

Elixhauser comorbidities
icd_get_major.icd9

Get major part of an ICD code
icd10_comorbid_parent_search_cpp

Internal function to find ICD-10 parents
expect_equal_no_icd

expect equal, ignoring any ICD classes
icd_classes_ordered

prefer an order of classes
icd9_parse_leaf_desc_ver

Read the ICD-9-CM description data as provided by the Center for Medicaid Services (CMS).
icd9_parse_quan_deyo_sas

parse original SAS code defining Quan's update of Deyo comorbidities.
expect_chap_equal

expect named sub-chapter has a given range, case insensitive
%eine%

in/match equivalent for two Environment arguments
generate_uranium_pathology

generate uranium pathology data
icd10_generate_map_quan_deyo

Generate Quan mapping for Charlson categories of ICD-10 codes
icd9AddLeadingZeroesMajorSingle

Simpler add leading zeroes without converting to parts and back
icd9ComorbidShortCpp

Find comorbidities from ICD-9 codes.
icd9AddLeadingZeroesShortSingle

Decompose a 'short' ICD code and insert the leading zeroes as needed.
icd_get_valid

invalid subset of decimal or short_code ICD-9 codes
icd_comorbid_mat_to_df

convert comorbidity data frame from matrix
icd-package

icd: Tools for Working with ICD-9 and ICD-10 Codes, and Finding Comorbidities
generate_vermont_dx

generate vermont_dx data
icd9RandomShortN

Generate random short-form ICD-9 codes
icd_names_elix

Comorbidity names
icd_comorbid_hcc_worker

apply HCC rules to either ICD-9 or ICD-10 codes
icd9AppendMinors

append minor to major using std
icd9MajMinToShortStd

initialize a std::vector of strings with repeated value of the minor
icd9ChildrenShort11

Find child codes from vector of ICD-9 codes.
icd9_generate_sources

generate data for finding source data for ICD-9-CM
icd10_sub_chapters

ICD-10 sub-chapters
icd9_expand_range_worker

expand range worker
icd_explain_table

Explain ICD-9 and ICD-10 codes in English from decimal (123.45 style). Tabulates the decimal format alongside converted non-decimal format.
icd9_extract_alpha_numeric

extract alphabetic, and numeric part of ICD-9 code prefix
icd9_fetch_ahrq_sas

get the SAS code from AHRQ
icd_explain_table_worker

generate table of ICD code explanations
icd9_generate_all_

generate lookup data for each class of ICD-9 code
icd_get_defined

Select only defined ICD codes
icd9_map_hcc

Medicare Hierarchical Condition Categories
icd_parse_cc_hierarchy

Import CMS HCC Rules
icd_wide_to_long

Convert ICD data from wide to long format
icd9_get_chapters

get ICD-9 Chapters from vector of ICD-9 codes
icd9_add_leading_zeroes_cpp

Add leading zeroes to incomplete ICD-9 codes
icd10cm2016

ICD-10-CM
icd9_is_n

do ICD-9 codes belong to numeric, V or E sub-types?
icd9_chapters

ICD-9 chapters
icd_get_invalid

Get invalid ICD codes
is.icd_short_diag

test ICD-related classes
icd_is_billable

Determine whether codes are billable leaf-nodes
icd_is_defined

Check whether ICD-9 codes exist
icd9_is_n_cpp

Do elements of vector begin with V, E (or any other character)?
icd9_map_quan_elix

Quan adaptation of Elixhauser comorbidities
icd_short_to_parts

Convert short format ICD codes to component parts
icd9_map_quan_deyo

Quan adaptation of Deyo/Charlson comorbidities
parse_leaf_descriptions_all

Get billable codes from all available years
icd9_order_short

Get order of short-form ICD-9 codes
icd_charlson

Calculate Charlson Comorbidity Index (Charlson Score)
print.icd_comorbidity_map

Print a comorbidity map
icd_condense

Condense ICD-9 code by replacing complete families with parent codes
icd9cm_billable

list of annual versions of billable leaf nodes of ICD-9-CM
icd_filter_poa

Filters data frame based on present-on-arrival flag
icd_children

Get children of ICD codes
icd_count_codes

Count ICD codes or comorbidities for each patient
icd9_sources

ICD-9 data sources
icd_filter_valid

Filter ICD codes by validity.
icd9cm_hierarchy

Latest ICD-9-CM diagnosis codes, in flat data.frame format
icd9cm_latest_edition

Latest ICD-9-CM edition
icd_children_defined

defined children of ICD codes
icd_comorbid_df_to_mat

convert comorbidity matrix to data frame
icd_comorbid_hcc

Get Hierarchical Condition Codes (HCC)
icd_classes_conflict

Check whether there are any ICD class conflicts
icd_guess_short_update

Guess short vs decimal of ICD and update class
icd_sort

Sort short-form ICD-9 codes
icd_expand_range

take two ICD-9 codes and expand range to include all child codes
icd_expand_range.icd10cm

Expand range of ICD-10 codes returning only defined codes in ICD-10-CM
icd_guess_version

Guess version of ICD codes
rtf_parse_year

parse RTF description of entire ICD-9-CM for a specific year
rtf_parse_lines

parse lines of RTF
strim

Trim leading and trailing white space from a single string
strip

Strip character(s) from character vector
uranium_pathology

United States Transuranium & Uranium Registries
vec_to_env_true

create environment from vector
icd_count_codes_wide

Count ICD codes given in wide format
icd_count_comorbid

Count number of comorbidities per patient
icd_guess_pair_version

Guess the ICD version (9 or 10) from a pair of codes
icd_is_valid.default

Test whether an ICD code is major
icd_expand_range_major

Expand major codes to range
regexec32

regexec which accepts perl argument even in older R
icd_explain

Explain ICD-9 and ICD-10 codes in English
rtf_fetch_year

Fetch RTF for a given year
icd_long_to_wide

Convert ICD data from long to wide format
rtf_generate_fourth_lookup

generate look-up for four digit codes
logical_to_binary

Encode TRUE as 1, and FALSE as 0 (integers)
icd_get_billable

Get billable ICD codes
icd_generate_sysdata

Generate sysdata.rda
rtf_lookup_fourth

apply fourth digit qualifiers
icd_guess_short

Guess whether codes are short_code or decimal_code
shortcode_icd9

set icd_short_to_decimal attribute
[[.icd_comorbidity_map

Extract vector of codes from an ICD comorbidity map
icd_guess_version_update

Guess version of ICD and update class
str_extract

TODO does stringr do this?
icd_is_major

Check whether a code is major
icd_is_valid

Check whether ICD-9 codes are syntactically valid
subset_icd

extract subset from ICD data
icd_poa_choices

Present-on-admission flags
icd_short_to_decimal

Convert ICD codes from short to decimal forms
lookupComorbidByChunkFor

core search for ICD code in a map
icd_in_reference_code

match ICD9 codes
random_string

generate random strings
icd_update_everything

generate all package data
named_list

make a list using input argument names as names
icd_van_walraven

Calculate van Walraven Elixhauser Score
parse_leaf_desc_icd9cm_v27

Parse billable codes for ICD-9-CM version 27
re_just

Limit a regular expression to just what is given
rtf_fix_quirks_2015

fix quirks for 2015 RTF parsing
rtf_fix_unicode

Fix Unicode characters in RTF
str_match_all

return all matches for regular expression
rtf_filter_excludes

exclude some unwanted rows from filtered RTF
rtf_fix_duplicates

fix duplicates detected in RTF parsing
rtf_strip

Strip RTF
vermont_dx

Hospital discharge data from Vermont
str_pair_match

Match pairs of strings to get named vector
sas_extract_let_strings

Extract quoted or unquoted SAS string definitions
sas_format_extract

Extract assignments from a SAS FORMAT definition
rtf_main_filter

filter RTF for actual ICD-9 codes
rtf_parse_fifth_digit_range

parse a row of RTF source data for ranges to apply fifth digit
save_in_data_dir

Save given variable in package data directory
set_icd_class

Construct ICD-9 data types
swap_names_vals

swap names and values of a vector
sas_parse_assignments

Get assignments from a character string
unzip_single

unzip a single file from URL
unzip_to_data_raw

Unzip file to data-raw
trim

Trim leading and trailing white space
cr

sequence columns of comorbidities
factor_nosort

Fast Factor Generation
fastIntToString

Fast convert integer vector to character vector
icd10_chapters

ICD-10 chapters
icd10_comorbid_parent_search

find ICD-10 comorbidities by checking parents