icd9ComorbidShortCpp: Find comorbidities from ICD-9 codes.

Description

Rcpp approach to comorbidity assignment, using OpenMP and a vector-of-integers strategy. It is very fast, and most of the time is now spent setting up the data to be passed in.

This is the main function which extracts comorbidities from a set of ICD-9 codes. This is where some trivial post-processing of the comorbidity data is done, e.g. renaming to human-friendly field names, and updating fields according to rules. The exact fields from the original mappings can be obtained using hierarchy = FALSE, but for comorbidity counting, Charlson Score, etc., the rules should be applied.

Usage

icd9ComorbidShortCpp(icd9df, icd9Mapping, visitId, icd9Field, threads = 8L,
  chunk_size = 256L, omp_chunk_size = 1L, aggregate = TRUE)
icd_comorbid(x, map, ...)
icd10_comorbid(x, map, visit_name = NULL, icd_name = NULL,
  short_code = NULL, short_map = icd_guess_short(map), return_df = FALSE,
  ...)
icd9_comorbid(x, map, visit_name = NULL, icd_name = NULL,
  short_code = icd_guess_short(x, icd_name = icd_name),
  short_map = icd_guess_short(map), return_df = FALSE, ...)
icd_comorbid_common(x, map, visit_name = NULL, icd_name, short_code,
  short_map, return_df = FALSE, ...)
icd9_comorbid_ahrq(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd10_comorbid_ahrq(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd9_comorbid_elix(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd10_comorbid_elix(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd9_comorbid_quan_elix(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd10_comorbid_quan_elix(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd9_comorbid_quan_deyo(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd10_comorbid_quan_deyo(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd9_comorbid_hcc(x, date_name = "date", visit_name = NULL,
  icd_name = NULL)
icd10_comorbid_hcc(x, date_name = "date", visit_name = NULL,
  icd_name = NULL)
icd_comorbid_ahrq(x, icd_name = get_icd_name(x), ...)
icd_comorbid_elix(x, icd_name = get_icd_name(x), ...)
icd_comorbid_quan_elix(x, icd_name = get_icd_name(x), ...)
icd_comorbid_quan_deyo(x, icd_name = get_icd_name(x), ...)
icd_comorbid_hcc(x, icd_name = get_icd_name(x), ...)

Arguments

aggregate

single logical value. If TRUE, then take (possibly much) more time to aggregate out-of-sequence visit IDs in the input data.frame. If this is FALSE, then each contiguous group of visit IDs will result in a row of comorbidities in the output data. If your visit IDs may be disordered, or you do not know whether they are, then use TRUE.

map

list (or name of a list if character vector of length one is given as argument) of the comorbidities, with each top-level list item containing a vector of ICD-9 codes. The names of the list items correspond to the comorbidities (e.g. 'HTN', or 'diabetes') and the contents of each item are a character vector of short-form (no decimal place but ideally zero left-padded) ICD-9 codes. No default: the user should prefer the derivative functions, e.g. icd_comorbid_ahrq, since these also provide appropriate naming for the fields and squash the hierarchy (see hierarchy below).

visit_name

The name of the column in the data frame which contains the patient or visit identifier. Typically this is the visit identifier, since patients leave and enter hospital with different ICD-9 codes. It is a character vector of length one. If left empty, or NULL, then an attempt is made to guess which field has the ID for the patient encounter (not a patient ID, although this can of course be specified directly). The guesses proceed until a single match is made. Data frames may be wide with many matching fields, so to avoid false positives, anything but a single match is rejected. If there are no successful guesses, and visit_name was not specified, then the first column of the data frame is used.

icd_name

The column in the data.frame which contains the ICD codes. This is a character vector of length one. If it is NULL, icd9 will attempt to guess the column name, looking for progressively less likely possibilities until it matches a single column. Failing this, it will take the first column in the data frame. Specifying the column using this argument avoids the guesswork.

short_code

single logical value which determines whether the ICD-9 code provided is in short (TRUE) or decimal (FALSE) form. Where reasonable, this is guessed from the input data.

short_map

Same as short_code, but applied to map instead of x. All the codes in a mapping should be of the same type, i.e. short or decimal.

abbrev_names

single logical value that defaults to TRUE, in which case the shorter human-readable names stored in e.g. ahrqComorbidNamesAbbrev are applied to the data frame column names.

hierarchy

single logical value that defaults to TRUE, in which case the hierarchy defined for the mapping is applied. E.g. in Elixhauser, you can't have uncomplicated and complicated diabetes both flagged.

date_name

The name of the column containing the date on which each record took place. It is needed because each year has a different ICD-9/10 to CC mapping. This is only necessary for HCC mappings.

Functions

icd10_comorbid: ICD-10 comorbidities
icd9_comorbid: Get comorbidities from data.frame of ICD-9 codes

Details

For ICD-10 codes, this method relies on exact matching. Not every one of the billions of possible ICD-10/ICD-10-CM codes is included in the mappings, so it will likely give incomplete results unless the parents of the input codes are searched until a match is found in the map.
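The parent search mentioned above can be sketched in base R. This is only an illustration under stated assumptions: the map is a toy list with made-up membership, match_with_parents is a hypothetical helper that is not part of the package, and a parent is assumed to be the short-form code with its last character removed.

```r
# Toy map in short form; made-up membership, NOT a real icd mapping.
toy_map <- list(diabetes = c("E10", "E11"), chf = c("I50"))

# Hypothetical helper: drop one trailing character at a time until the
# code, or one of its parents, is found in the map.
match_with_parents <- function(code, map, min_chars = 3L) {
  codes_in_map <- unlist(map, use.names = FALSE)
  while (nchar(code) >= min_chars) {
    if (code %in% codes_in_map) return(code)
    code <- substr(code, 1L, nchar(code) - 1L)
  }
  NA_character_  # no parent of at least min_chars characters is in the map
}

found <- match_with_parents("E1165", toy_map)    # matches the parent "E11"
missing <- match_with_parents("Z9989", toy_map)  # no parent in the map
```

Exact matching alone would miss "E1165" here, which is the kind of incomplete result described above.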

There is a change in behavior from previous versions. The visit_name column is now (implicitly) sorted, because a std::set container is used. Previously, the visit_name output order was whatever R's aggregate produced.

The threading of the C++ can be controlled using e.g. options(icd.threads = 4). If it is not set, the number of cores in the machine is used.
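A minimal sketch of setting and reading this option in plain R follows. The fallback to parallel::detectCores() is an assumption made for illustration; how the package itself determines the number of cores is not stated here.

```r
# Set the thread count for the session.
options(icd.threads = 4L)

# Read it back, falling back to the detected core count when unset
# (the fallback shown here is an assumption, not the package's own code).
n_threads <- getOption("icd.threads", default = parallel::detectCores())
```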

The common comorbidity calculation code does not depend on ICD type. There is some type conversion so the map and input codes are all in 'short' format, fast factor generation, then fast comorbidity assignment.
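The overall shape of this calculation can be illustrated in base R. This is a sketch only, not the package's C++ implementation: the map is a toy list with made-up membership, all codes are assumed to be in short form already, and rowsum stands in for the fast aggregation step.

```r
# Long-format input with visits deliberately out of order.
visits <- c("2", "1", "2", "3", "3")
codes  <- c("39891", "40110", "09322", "41514", "39891")

# Toy map in short form; made-up membership, NOT icd9_map_ahrq.
toy_map <- list(CHF = c("39891", "4280"),
                HTN = c("40110", "4019"),
                PE  = c("41514"))

# One row per record, one column per comorbidity.
hit <- sapply(toy_map, function(members) codes %in% members)

# Collapse records to one row per visit; rowsum sorts the groups,
# so the output is ordered by visit regardless of input order.
res <- rowsum(hit * 1L, group = visits) > 0
```

The result is a logical matrix with one row per visit and one column per comorbidity, with the visits in sorted order.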

data.frames of patient data may have columns within them which are of class icd9, icd10 etc., but do not themselves have a class: therefore, the S3 mechanism for dispatch is not suitable. I may add a wrapper function which looks inside a data.frame of comorbidities, and dispatches to the appropriate function, but right now the user must call the icd9_ or icd10_ prefixed function directly.

Applying CMS Hierarchical Condition Categories with icd_comorbid_hcc works differently from the rest of the comorbidity assignment functions. This is because CMS publishes a specific ICD to Condition Category mapping including all child ICD codes. While these mappings were the same for 2007-2012, after 2013 there are annual versions. In addition, there is a many:many linkage between ICD and Condition Categories (CC). Once CCs are assigned, a series of hierarchy rules (which can also change annually) are applied to create HCCs.
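The two steps described above, a year-specific many:many join followed by hierarchy rules, can be sketched in base R. Everything in this example is hypothetical: the codes, the CC numbers, the mapping table and the single hierarchy rule are invented for illustration and are not the CMS data, and apply_hierarchy is not a package function.

```r
# Made-up records: one row per diagnosis, with the year of the record.
records <- data.frame(visit = c("1", "1", "2"),
                      code  = c("A01", "B02", "B02"),
                      year  = c(2011, 2011, 2013),
                      stringsAsFactors = FALSE)

# Made-up year-specific mapping; one code may map to several CCs.
toy_cc_map <- data.frame(code = c("A01", "B02", "B02", "B02"),
                         year = c(2011, 2011, 2013, 2013),
                         cc   = c(8, 9, 9, 10),
                         stringsAsFactors = FALSE)

# Step 1: many:many join on code and year, then keep unique visit/CC pairs.
joined   <- merge(records, toy_cc_map, by = c("code", "year"))
visit_cc <- unique(joined[, c("visit", "cc")])

# Step 2: made-up hierarchy rule: when CC 8 is present, drop CCs 9 and 10.
toy_hierarchy <- list("8" = c(9, 10))

apply_hierarchy <- function(ccs, rules) {
  for (parent in names(rules)) {
    if (as.numeric(parent) %in% ccs) ccs <- setdiff(ccs, rules[[parent]])
  }
  sort(ccs)
}

hcc <- lapply(split(visit_cc$cc, visit_cc$visit), apply_hierarchy,
              rules = toy_hierarchy)
```

In this toy data, visit "1" keeps only CC 8 because the rule removes CC 9, while visit "2" keeps both CC 9 and CC 10 because no superseding category is present.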

Examples

# NOT RUN {
  # Long-format data: one row per diagnosis, with visits deliberately out of order
  pts <- icd_long_data(visit_name = c("2", "1", "2", "3", "3"),
                       icd9 = c("39891", "40110", "09322", "41514", "39891"))
  icd_comorbid(pts, icd9_map_ahrq, short_code = TRUE) # visit_name is now sorted

  # HCC assignment also needs a date column, since the mapping varies by year
  pts <- icd_long_data(
    visit_name = c("1", "2", "3", "4", "4"),
    icd_name = c("20084", "1742", "30410", "41514", "95893"),
    date = as.Date(c("2011-01-01", "2011-01-02", "2011-01-03",
                     "2011-01-04", "2011-01-04")))
  pt_hccs <- icd_comorbid_hcc(pts, date_name = "date")
# }
