icd

ICD-9 and ICD-10 comorbidities, manipulation and validation

Features

  • find comorbidities of patients based on admission or discharge ICD-9 or ICD-10 codes, e.g. Cancer, Heart Disease
    • several standard mappings of ICD codes to comorbidities are included (Quan, Deyo, Elixhauser, AHRQ)
    • very fast assignment of ICD codes to comorbidities (using C and C++ internally, with automatic parallel execution using OpenMP when available), assigning millions of comorbidities in a few seconds
  • Charlson and Van Walraven score calculations
  • Hierarchical Condition Codes (HCC)
  • validation of ICD codes from different annual revisions of ICD-9-CM and ICD-10-CM
  • summarizing ICD codes into groups, and to human-readable descriptions
  • correct conversion between different representations of ICD codes, with and without decimal points, and with or without leading and trailing characters (this is not trivial for ICD-9-CM). ICD-9 to ICD-10 conversion is left as an exercise for the user!
  • comprehensive test suite to increase confidence in accurate processing of ICD codes

Introduction

Calculate comorbidities and Charlson scores, and perform fast and accurate validation, conversion, manipulation, filtering and comparison of ICD-9 and ICD-10 codes. Common ambiguities and code formats are handled. This package enables a workflow from raw lists of ICD codes in hospital billing databases to comorbidities. ICD-9 and ICD-10 comorbidity mappings from Quan (Deyo and Elixhauser versions), Elixhauser and AHRQ are included. This package replaces icd9, which should be uninstalled.

Relevance

ICD-9 codes are still in heavy use around the world, particularly in the USA where the ICD-9-CM (Clinical Modification) was in widespread use until the end of 2015. ICD-10 has been used worldwide for reporting cause of death for more than a decade. ICD-10-CM is now the primary coding scheme for US hospital admission and discharge diagnoses used for regulatory purposes and billing. A vast amount of patient data is recorded with ICD-9 codes of some kind: this package enables their use in R alongside ICD-10.

Comorbidities

A common requirement for medical research involving patients is determining new or existing comorbidities. This is often reported in Table 1 of research papers to demonstrate the similarity or differences of groups of patients. This package is focussed on fast and accurate generation of this comorbidity information from raw lists of ICD-9 codes.

ICD-9 codes

ICD-9 codes are not numbers, and great care is needed when matching individual codes and ranges of codes. It is easy to make mistakes, hence the need for this package. ICD-9 codes can be presented in short format (up to five characters), or in decimal format, with a decimal point separating the code into two groups. There are also codes beginning with V and E, which have different validation rules. Zeroes after the decimal point are meaningful, so ICD-9 codes cannot be stored as numbers in most cases. In addition, most clinical databases contain invalid codes, and even mix decimal and non-decimal format codes in different places. This package primarily deals with ICD-9-CM (Clinical Modification) codes, but should be applicable or easily extensible to the original WHO ICD-9 system.
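To make the point about numeric storage concrete, here is a minimal base R sketch. It does not use this package; `short_to_decimal_sketch` is a hypothetical helper written only for illustration, and it assumes the short codes already carry their leading zeroes and that E codes have a four-character major part.

```r
# Two distinct decimal ICD-9 codes become indistinguishable as numbers,
# because the trailing zero is lost.
as.numeric("250.1") == as.numeric("250.10")
#> [1] TRUE

# Hypothetical helper (not part of the icd package): convert short-form
# ICD-9 codes to decimal form while keeping them as character strings.
# Assumption: leading zeroes are already present in the input.
short_to_decimal_sketch <- function(x) {
  x <- trimws(as.character(x))
  # Assumption: E codes have a four-character major part (E880.0);
  # numeric and V codes have a three-character major part.
  major_len <- ifelse(grepl("^[Ee]", x), 4L, 3L)
  major <- substr(x, 1L, major_len)
  minor <- substr(x, major_len + 1L, nchar(x))
  # Only add a decimal point when there is a minor part.
  ifelse(nchar(minor) > 0L, paste(major, minor, sep = "."), major)
}

short_to_decimal_sketch(c("40201", "25001", "V1009", "E8800", "401"))
#> [1] "402.01" "250.01" "V10.09" "E880.0" "401"
```

The package's own conversion functions (for example `icd_short_to_decimal`) handle many more edge cases, so this sketch is only meant to show why the codes must stay as strings.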

ICD-10 codes

ICD-10 has a somewhat simpler format, with consistent use of a letter, then two alphanumeric characters. However, especially for ICD-10-CM, there are a multitude of qualifiers, e.g. specifying recurrence, laterality, which vastly increase the number of possible codes. This package recognizes validity of codes by syntax alone, or whether the codes appear in a canonical list. The current ICD-10-CM master list is the 2016 set. There is no capability of converting between ICD-9 and ICD-10, but comorbidities can be generated from older ICD-9 codes and newer ICD-10 codes in parallel, and the comorbidities can then be compared.
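As a rough illustration of checking validity "by syntax alone", the following base R sketch tests codes against a simplified pattern. The pattern and the function name `icd10_syntax_ok_sketch` are assumptions made for this example, not the rules this package applies: it accepts a letter, a digit, one alphanumeric character, and then up to four further alphanumeric characters with an optional decimal point.

```r
# Hypothetical, simplified syntax check for ICD-10-style codes.
# Assumption: letter + digit + alphanumeric, then an optional decimal point
# followed by one to four alphanumeric characters.
icd10_syntax_ok_sketch <- function(x) {
  grepl("^[A-Za-z][0-9][0-9A-Za-z](\\.?[0-9A-Za-z]{1,4})?$", x)
}

icd10_syntax_ok_sketch(c("I10", "E11.9", "S72001A", "1234", "I1"))
#> [1]  TRUE  TRUE  TRUE FALSE FALSE
```

A syntactic check like this says nothing about whether a code is actually defined. That is what the comparison against a canonical list, such as the 2016 ICD-10-CM set, is for.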

Examples

See also the vignettes and examples embedded in the help for each function for more. Here's a taste:

patient_data
#>   visit_id  icd9  poa
#> 1     1000 40201    Y
#> 2     1000  2258 <NA>
#> 3     1000  7208    N
#> 4     1000 25001    Y
#> 5     1001 34400    X
#> 6     1001  4011    Y
#> 7     1002  4011    E

# reformat input data as needed
icd_long_to_wide(patient_data)
#>      [,1]    [,2]   [,3]   [,4]   
#> 1000 "40201" "2258" "7208" "25001"
#> 1001 "34400" "4011" NA     NA     
#> 1002 "4011"  NA     NA     NA

# get comorbidities using Quan's application of Deyo's Charlson comorbidity groups
icd_comorbid_quan_deyo(patient_data)
#>         MI   CHF   PVD Stroke Dementia Pulmonary Rheumatic   PUD LiverMild
#> 1000 FALSE  TRUE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#> 1001 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#> 1002 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#>         DM  DMcx Paralysis Renal Cancer LiverSevere  Mets   HIV
#> 1000  TRUE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE
#> 1001 FALSE FALSE      TRUE FALSE  FALSE       FALSE FALSE FALSE
#> 1002 FALSE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE

# find diagnoses present on admission:
icd_filter_poa(patient_data)
#>   visit_id  icd9
#> 1     1000 40201
#> 4     1000 25001
#> 6     1001  4011

# get comorbidities based on present-on-arrival diagnoses, use magrittr to flow the data
patient_data %>% icd_filter_poa %>% icd_comorbid_quan_deyo
#>         MI   CHF   PVD Stroke Dementia Pulmonary Rheumatic   PUD LiverMild
#> 1000 FALSE  TRUE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#> 1001 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#>         DM  DMcx Paralysis Renal Cancer LiverSevere  Mets   HIV
#> 1000  TRUE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE
#> 1001 FALSE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE

Look at the help files for details and examples of almost every function in this package.

?icd_is_valid
?icd_comorbid
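To show how a Charlson score follows from a comorbidity matrix like the one printed above, here is a base R sketch. It rebuilds the Quan/Deyo result for the three example visits by hand, and it assumes the original Charlson weights and the usual hierarchy (a severe form replaces its milder counterpart). `charlson_sketch` is a hypothetical name; in practice the package's `icd_charlson` function does this work.

```r
comorbid_names <- c("MI", "CHF", "PVD", "Stroke", "Dementia", "Pulmonary",
                    "Rheumatic", "PUD", "LiverMild", "DM", "DMcx",
                    "Paralysis", "Renal", "Cancer", "LiverSevere", "Mets",
                    "HIV")

# Logical matrix matching the example output: one row per visit.
comorbid <- matrix(FALSE, nrow = 3, ncol = length(comorbid_names),
                   dimnames = list(c("1000", "1001", "1002"), comorbid_names))
comorbid["1000", c("CHF", "DM")] <- TRUE
comorbid["1001", "Paralysis"] <- TRUE

# Assumption: original Charlson weights.
weights <- c(MI = 1, CHF = 1, PVD = 1, Stroke = 1, Dementia = 1,
             Pulmonary = 1, Rheumatic = 1, PUD = 1, LiverMild = 1, DM = 1,
             DMcx = 2, Paralysis = 2, Renal = 2, Cancer = 2,
             LiverSevere = 3, Mets = 6, HIV = 6)

charlson_sketch <- function(m, w) {
  # Apply the hierarchy so a condition is not counted twice.
  m[, "DM"] <- m[, "DM"] & !m[, "DMcx"]
  m[, "LiverMild"] <- m[, "LiverMild"] & !m[, "LiverSevere"]
  m[, "Cancer"] <- m[, "Cancer"] & !m[, "Mets"]
  # Weighted sum per visit; m * 1 turns the logical matrix into numbers.
  drop((m * 1) %*% w[colnames(m)])
}

charlson_sketch(comorbid, weights)
#> 1000 1001 1002 
#>    2    2    0
```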

Note that reformatting from wide to long and back is not as straightforward as using the various Hadley Wickham tools for doing this: knowing the more detailed structure of the data lets us do this better for the case of dealing with ICD codes.
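For readers who want to see what the long-to-wide step amounts to, here is a base R sketch that reproduces the shape of the example output from a hand-built copy of `patient_data`. The function `long_to_wide_sketch` and its arguments are hypothetical and much simpler than the package's `icd_long_to_wide`.

```r
# Hand-built copy of the example data, codes kept as character strings.
patient_data <- data.frame(
  visit_id = c(1000, 1000, 1000, 1000, 1001, 1001, 1002),
  icd9 = c("40201", "2258", "7208", "25001", "34400", "4011", "4011"),
  poa = c("Y", NA, "N", "Y", "X", "Y", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per visit, one column per code position,
# padded with NA where a visit has fewer codes.
long_to_wide_sketch <- function(df, visit = "visit_id", code = "icd9") {
  by_visit <- split(df[[code]], df[[visit]])
  width <- max(lengths(by_visit))
  padded <- lapply(by_visit, function(codes) {
    c(codes, rep(NA_character_, width - length(codes)))
  })
  # rbind keeps the visit identifiers as row names.
  do.call(rbind, padded)
}

wide <- long_to_wide_sketch(patient_data)
dim(wide)
#> [1] 3 4
wide["1001", ]
#> [1] "34400" "4011"  NA      NA
```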

Install

The latest version is available on GitHub (jackwasey/icd), and can be installed with:

    install.packages("devtools")
    devtools::install_github("jackwasey/icd")

The master branch at github should always build and pass all tests and R CMD check, and will be similar or identical to the most recent CRAN release. The CRAN releases are stable milestones. Contributions and bug reports are encouraged and essential for this package to remain current and useful to the many people who have installed it.

Advanced

Source Data and SAS format files

In the spirit of reproducible research, all the R data files in this package can be recreated from source. The size of the source files makes it cumbersome to include them in the R package available on CRAN. Using the github source, you can pull the original data and SAS format files, and rebuild the data; or use the tools provided by this package to update the data using new source data files, e.g. when ICD-10-CM 2017 is released.

Doing the parsing requires additional dependencies, which are not gratuitously included in the package requirements, since most users won't need them. Benchmarking this package also has additional requirements. These are:

  • xml2
  • ggplot2
  • digest

Automated testing

One of the strengths of this package is a thorough test suite, including over 10,000 lines of testing code.

find tests -type f -exec cat '{}' + | wc -l
10910

A better metric of testing and code quality is code coverage, for which codecov and coveralls are used. The automated wercker builds report test coverage results to codecov, whereas the travis builds report coverage to coveralls. The parsing code is a significant chunk of code, and may or may not be included in the automated builds depending on whether the source data is available. With the data available, test coverage is >95%.

Version

2.2

License

GPL-3

Maintainer

Jack O Wasey

Last Published

May 15th, 2017

Functions in icd (2.2)

cr

sequence columns of comorbidities
env_to_vec_flip

return a new environment with names and values swapped
combine

combine ICD codes
condense_explain_table

condense icd_explain_table output down to major codes
fastIntToString

Fast convert integer vector to character vector
fixSubchapterNa

Fix NA sub-chapters in RTF parsing
icd10_comorbid_parent_search

find ICD-10 comorbidities by checking parents
icd10_comorbid_parent_search_cpp

Internal function to find ICD-10 parents
apply_hier

Apply hierarchy and choose naming for each comorbidity map
.attr_short_diag

Change whether ICD code has short or long attribute
expect_chap_equal

expect named sub-chapter has a given range, case insensitive
expect_chap_present

expect that a chapter with given title exists, case-insensitive
as_char_no_warn

convert to character vector without warning
chapter_to_desc_range

Parse a (sub)chapter text description with parenthesised range
icd10_generate_map_quan_deyo

Generate Quan mapping for Charlson categories of ICD-10 codes
icd10_generate_map_quan_elix

generate ICD-10 Quan/Elixhauser mapping
icd10cm2016

ICD-10-CM
icd10cm_extract_sub_chapters

Get sub-chapters from the 2016 XML for ICD-10-CM
icd9RandomShortN

Generate random short-form ICD-9 codes
generate_spelling

Generate spelling exceptions
generate_uranium_pathology

generate uranium pathology data
generate_vermont_dx

generate vermont_dx data
get_non_ASCII

mimic the R CMD check test
icd10_parse_cc

Import the ICD10 to CC crosswalks
icd10_sub_chapters

ICD-10 sub-chapters
icd9_parse_quan_deyo_sas

parse original SAS code defining Quan's update of Deyo comorbidities.
icd9_sources

ICD-9 data sources
icd_children_defined

defined children of ICD codes
icd_classes_conflict

Check whether there are any ICD class conflicts
icd_short_to_parts.icd10

Convert decimal ICD codes to component parts
icd_decimal_to_short

Convert Decimal format ICD codes to short format
icd9_add_leading_zeroes_cpp

Add leading zeroes to incomplete ICD-9 codes
icd9_is_n_cpp

Do elements of vector begin with V, E (or any other character)?
icd9_map_ahrq

AHRQ comorbidities
icd9cm_get_billable

Get billable ICD-9-CM codes
condense_explain_table_worker

generate condensed code and condensed number columns
icd9PartsToShort

Convert ICD9 codes between formats and structures.
expect_equal_no_icd

expect equal, ignoring any ICD classes
factor_nosort

Fast Factor Generation
icd_expand_range_major

Expand major codes to range
icd_explain

Explain ICD-9 and ICD-10 codes in English
icd_guess_version_update

Guess version of ICD and update class
icd_in_reference_code

match ICD9 codes
icd_poa_choices

Present-on-admission flags
icd9cm_hierarchy

Latest ICD-9-CM diagnosis codes, in flat data.frame format
icd_charlson

Calculate Charlson Comorbidity Index (Charlson Score)
icd_children

Get children of ICD codes
icd-package

icd: Tools for Working with ICD-9 and ICD-10 Codes, and Finding Comorbidities
icd10_chapters

ICD-10 chapters
icd10cm_get_all_defined

get all ICD-10-CM codes
icd9AppendMinors

append minor to major using std
generate_random_short_icd10cm_bill

generate random short-form billable ICD-10-CM codes
generate_random_short_icd9

generate random ICD-9 codes
get_visit_name

Get or guess the name of the visit ID column
%eine%

in/match equivalent for two Environment arguments
icd_short_to_decimal

Convert ICD codes from short to decimal forms
rtf_fix_unicode

Fix Unicode characters in RTF
rtf_generate_fourth_lookup

generate look-up for four digit codes
sas_parse_assignments

Get assignments from character strings
icd9_extract_alpha_numeric

extract alphabetic, and numeric part of ICD-9 code prefix
icd9_fetch_ahrq_sas

get the SAS code from AHRQ
icd9_map_quan_deyo

Quan adaptation of Deyo/Charlson comorbidities
icd9_map_quan_elix

Quan adaptation of Elixhauser comorbidities
icd9MajMinToCode

Convert mjr and mnr vectors to single code
icd9MajMinToShortStd

initialize a std::vector of strings with repeated value of the minor
icd9_drop_leading_zeroes

drop zero padding from decimal ICD-9 code.
icd9_expand_range_worker

expand range worker
icd9_generate_map_quan_elix

Generate Quan's revised Elixhauser comorbidities
icd9ChildrenShort11

Find child codes from vector of ICD-9 codes.
icd9ChildrenShortStd

C++ implementation of finding children of short codes
icd9_chapters

ICD-9 chapters
icd9_chapters_to_map

convert the chapter headings to lists of codes
icd9_generate_all_

generate lookup data for each class of ICD-9 code
icd9_generate_map_elix

Generate Elixhauser comorbidities
icd9_map_elix

Elixhauser comorbidities
icd9_map_hcc

Medicare Hierarchical Condition Categories
save_in_data_dir

Save given variable in package data directory
setup_test_check

Set-up test options
shortcode_icd9

set icd_short_to_decimal attribute
swap_names_vals

swap names and values of a vector
trim

Trim leading and trailing white space
icd9_parse_cc

Generate ICD to HCC Crosswalks from CMS
icd9_parse_leaf_desc_ver

Read the ICD-9-CM description data as provided by the Center for Medicaid Services (CMS).
icd9cm_hierarchy_hotfix

fix some RTF parsing errors
icd9cm_latest_edition

Latest ICD-9-CM edition
icd_count_codes_wide

Count ICD codes given in wide format
icd_count_comorbid

Count number of comorbidities per patient
icd_explain_table

Explain ICD-9 and ICD-10 codes in English from decimal (123.45 style). Tabulates the decimal format alongside the converted non-decimal format.
icd_explain_table_worker

generate table of ICD code explanations
icd_guess_short_update

Guess short vs decimal of ICD and update class
icd_guess_version

Guess version of ICD codes
icd_filter_poa

Filters data frame based on present-on-arrival flag
icd_filter_valid

Filter ICD codes by validity.
icd_get_major.icd9

Get major part of an ICD code
icd_get_valid

invalid subset of decimal or short_code ICD-9 codes
icd9_generate_sources

generate data for finding source data for ICD-9-CM
icd9cm_billable

list of annual versions of billable leaf nodes of ICD-9-CM
icd9cm_generate_chapters_hierarchy

generate ICD-9-CM hierarchy
icd_classes_ordered

prefer an order of classes
icd_is_valid

Check whether ICD-9 codes are syntactically valid
icd_wide_to_long

Convert ICD data from wide to long format
icd9ComorbidShortCpp

Find comorbidities from ICD-9 codes.
icd_diff_comorbid

show the difference between two comorbidity mappings
icd_expand_minor

expand decimal part of ICD-9 code to cover all possible sub-codes
icd_is_billable

Determine whether codes are billable leaf-nodes
icd_is_major

Check whether a code is major
is.icd_short_diag

test ICD-related classes
rtf_fix_duplicates

fix duplicates detected in RTF parsing
rtf_fix_quirks_2015

fix quirks for 2015 RTF parsing
icd_is_defined

Check whether ICD-9 codes exist
icd_update_everything

generate all package data
icd_van_walraven

Calculate van Walraven Elixhauser Score
rtf_fetch_year

Fetch RTF for a given year
rtf_lookup_fourth

apply fourth digit qualifiers
rtf_main_filter

filter RTF for actual ICD-9 codes
icd_expand_range

take two ICD-9 codes and expand range to include all child codes
icd_expand_range.icd10cm

Expand range of ICD-10 codes returning only defined codes in ICD-10-CM
icd_generate_sysdata

Generate sysdata.rda
rtf_filter_excludes

exclude some unwanted rows from filtered RTF
rtf_parse_fifth_digit_range

parse a row of RTF source data for ranges to apply fifth digit
rtf_parse_lines

parse lines of RTF
show_test_options

Show options which control testing
str_extract

TODO does stringr do this?
vermont_dx

Hospital discharge data from Vermont
icd_is_valid.default

Test whether an ICD code is major
icd_long_to_wide

Convert ICD data from long to wide format
logical_to_binary

Encode TRUE as 1, and FALSE as 0 (integers)
lookupComorbidByChunkFor

core search for ICD code in a map
icd_get_billable

Get billable ICD codes
icd_short_to_parts

Convert short format ICD codes to component parts
icd_sort

Sort short-form ICD-9 codes
parse_leaf_desc_icd9cm_v27

Parse billable codes for ICD-9-CM version 27
print.icd_comorbidity_map

Print a comorbidity map
random_string

generate random strings
rtf_parse_year

parse RTF description of entire ICD-9-CM for a specific year
rtf_strip

Strip RTF
icd9_get_chapters

get ICD-9 Chapters from vector of ICD-9 codes
icd9_is_n

do ICD-9 codes belong to numeric, V or E sub-types?
icd9_order_short

Get order of short-form ICD-9 codes
icd9_parse_ahrq_sas

parse AHRQ SAS code to get mapping
strim

Trim leading and trailing white space from a single string
strip

Strip character(s) from character vector
uranium_pathology

United States Transuranium & Uranium Registries
vec_to_env_true

create environment from vector
parse_leaf_descriptions_all

Get billable codes from all available years
re_just

Limit a regular expression to just what is given
regexec32

regexec which accepts perl argument even in older R
icd_comorbid_df_to_mat

convert comorbidity data frame to matrix
icd_comorbid_mat_to_df

convert comorbidity matrix to data frame
icd_condense

Condense ICD-9 code by replacing complete families with parent codes
icd_count_codes

Count ICD codes or comorbidities for each patient
icd_get_defined

Select only defined ICD codes
icd_get_invalid

Get invalid ICD codes
icd_guess_pair_version

Guess the ICD version (9 or 10) from a pair of codes
icd_guess_short

Guess whether codes are short_code or decimal_code
icd_names_elix

Comorbidity names
icd_parse_cc_hierarchy

Import CMS HCC Rules
my_test_check

run testthat::test_check with a Perl regular expression filter
named_list

make a list using input argument names as names
str_match_all

return all matches for regular expression
str_pair_match

Match pairs of strings to get named vector
[[.icd_comorbidity_map

Extract vector of codes from an ICD comorbidity map
sas_extract_let_strings

Extract quoted or unquoted SAS string definitions
sas_format_extract

Extract assignments from a SAS FORMAT definition
set_full_test_options

Set test options to do everything
set_icd_class

Construct ICD-9 data types
unzip_single

unzip a single file from URL
unzip_to_data_raw

Unzip file to data-raw
subset_icd

extract subset from ICD data