Rcpp approach to comorbidity assignment using OpenMP and a vector of
integers strategy. It is very fast, and most of the time is now spent
setting up the data to be passed in.
This is the main function which extracts comorbidities from a set of ICD-9
codes. Some trivial post-processing of the comorbidity data is also done at
this stage, e.g. renaming to human-friendly field names, and updating fields
according to rules. The exact fields from the original mappings can be
obtained using hierarchy = FALSE, but for comorbidity counting, Charlson
Score, etc., the rules should be applied.
icd9ComorbidShortCpp(icd9df, icd9Mapping, visitId, icd9Field, threads = 8L,
chunk_size = 256L, omp_chunk_size = 1L, aggregate = TRUE)
icd_comorbid(x, map, ...)
icd10_comorbid(x, map, visit_name = NULL, icd_name = NULL,
short_code = NULL, short_map = icd_guess_short(map), return_df = FALSE,
...)
icd9_comorbid(x, map, visit_name = NULL, icd_name = NULL,
short_code = icd_guess_short(x, icd_name = icd_name),
short_map = icd_guess_short(map), return_df = FALSE, ...)
icd_comorbid_common(x, map, visit_name = NULL, icd_name, short_code,
short_map, return_df = FALSE, ...)
icd9_comorbid_ahrq(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd10_comorbid_ahrq(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd9_comorbid_elix(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd10_comorbid_elix(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd9_comorbid_quan_elix(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd10_comorbid_quan_elix(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd9_comorbid_quan_deyo(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd10_comorbid_quan_deyo(x, ..., abbrev_names = TRUE, hierarchy = TRUE)
icd9_comorbid_hcc(x, date_name = "date", visit_name = NULL,
icd_name = NULL)
icd10_comorbid_hcc(x, date_name = "date", visit_name = NULL,
icd_name = NULL)
icd_comorbid_ahrq(x, icd_name = get_icd_name(x), ...)
icd_comorbid_elix(x, icd_name = get_icd_name(x), ...)
icd_comorbid_quan_elix(x, icd_name = get_icd_name(x), ...)
icd_comorbid_quan_deyo(x, icd_name = get_icd_name(x), ...)
icd_comorbid_hcc(x, icd_name = get_icd_name(x), ...)
aggregate: single logical value. If TRUE, then take (possibly much) more
time to aggregate out-of-sequence visit IDs in the input data.frame. If
FALSE, then each contiguous group of visit IDs will result in a row of
comorbidities in the output data. If you don't know whether your visit IDs
are disordered, then use TRUE.
map: list (or name of a list if a character vector of length one is given
as the argument) of the comorbidities, with each top-level list item
containing a vector of ICD-9 codes. The names of the list items correspond
to the comorbidities (e.g. 'HTN', or 'diabetes') and the contents of each
list item are a character vector of short-form (no decimal place but ideally
zero left-padded) ICD-9 codes. No default: the user should prefer to use the
derivative functions, e.g. icd_comorbid_ahrq, since these also provide
appropriate naming for the fields, and squash the hierarchy (see hierarchy
below).
visit_name: The name of the column in the data frame which contains the
patient or visit identifier. Typically this is the visit identifier, since
patients leave and re-enter hospital with different ICD-9 codes. It is a
character vector of length one. If left empty, or NULL, then an attempt is
made to guess which field has the ID for the patient encounter (not a
patient ID, although this can of course be specified directly). The guesses
proceed until a single match is made. Data frames may be wide with many
matching fields, so to avoid false positives, anything but a single match is
rejected. If there are no successful guesses, and visit_name was not
specified, then the first column of the data frame is used.
icd_name: The column in the data.frame which contains the ICD codes. This
is a character vector of length one. If it is NULL, icd9 will attempt to
guess the column name, looking for progressively less likely possibilities
until it matches a single column. Failing this, it will take the first
column in the data frame. Specifying the column using this argument avoids
the guesswork.
short_code: single logical value which determines whether the ICD-9 code
provided is in short (TRUE) or decimal (FALSE) form. Where reasonable, this
is guessed from the input data.
short_map: Same as short_code, but applied to map instead of the input data
frame. All the codes in a mapping should be of the same type, i.e. short or
decimal.
abbrev_names: single logical value that defaults to TRUE, in which case the
shorter human-readable names stored in e.g. ahrqComorbidNamesAbbrev are
applied to the data frame column names.
hierarchy: single logical value that defaults to TRUE, in which case the
hierarchy defined for the mapping is applied. E.g. in Elixhauser, you can't
have uncomplicated and complicated diabetes both flagged.
date_name: column representing the date on which each record took place,
since each year has a different ICD-9/10 to CC mapping. This is only
necessary for HCC mappings.
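The map and hierarchy arguments described above can be illustrated with a small base R sketch. It does not call the package: toy_map, pts, and the rule that 'DMcx' supersedes 'DM' are hypothetical stand-ins chosen only to show the shape of a mapping (a named list of character vectors of short-form codes) and the effect of applying a hierarchy to the resulting flags.

```r
# Hypothetical map: names are comorbidities, contents are short-form codes.
# Both the names and the code lists are illustrative, not a real mapping.
toy_map <- list(
  DM   = c("25000", "25001"),
  DMcx = c("25040", "25050"),
  CHF  = c("39891", "4280")
)

# Hypothetical long-format patient data: one row per visit/code pair.
pts <- data.frame(
  visit = c("a", "a", "b", "b"),
  code  = c("25000", "25040", "25001", "4280"),
  stringsAsFactors = FALSE
)

# One row per visit, one logical column per comorbidity.
visits <- unique(pts$visit)
flags <- vapply(toy_map, function(codes) {
  vapply(visits, function(v) any(pts$code[pts$visit == v] %in% codes),
         logical(1))
}, logical(length(visits)))
rownames(flags) <- visits

# Illustrative hierarchy rule: complicated diabetes supersedes uncomplicated.
flags_hier <- flags
flags_hier[flags_hier[, "DMcx"], "DM"] <- FALSE
```

Here flags corresponds loosely to the result with hierarchy = FALSE and flags_hier to the result with hierarchy = TRUE: visit "a" keeps only the complicated diabetes flag once the rule is applied.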
icd10_comorbid: ICD-10 comorbidities
icd9_comorbid: Get comorbidities from data.frame of ICD-9 codes
For ICD-10 codes, this method relies on exact matching. Not every one of the billions of possible ICD-10/ICD-10-CM codes is included in the mappings, and the method does not search for parents of the input codes until a match is found in the map, so it will likely give incomplete results.
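To make the limitation concrete, the following base R sketch shows what a parent search could look like. It is not what the package function does: find_comorbidity and toy_map are hypothetical, and the sketch assumes short-form codes whose parents are obtained by dropping trailing characters, down to a three-character category.

```r
# Hypothetical map of short-form ICD-10 codes; contents are illustrative only.
toy_map <- list(DM = c("E10", "E11"), CHF = c("I50"))

# Hypothetical helper: try the full code, then progressively shorter parents.
find_comorbidity <- function(code, map) {
  while (nchar(code) >= 3) {
    hit <- names(map)[vapply(map, function(codes) code %in% codes, logical(1))]
    if (length(hit) > 0) {
      return(hit)
    }
    # Drop the last character to move up to the parent code.
    code <- substr(code, 1, nchar(code) - 1)
  }
  character(0)
}

exact_only  <- names(toy_map)[vapply(toy_map, function(codes) "E1165" %in% codes,
                                     logical(1))]
with_parent <- find_comorbidity("E1165", toy_map)
no_match    <- find_comorbidity("Z999", toy_map)
```

With exact matching alone, "E1165" finds nothing in this toy map, whereas the parent search reaches "E11".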
There is a change in behavior from previous versions. The visit_name
column is (implicitly) sorted by using a std::set container. Previously, the
visit_name output order was whatever R's aggregate produced.
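The effect on output order can be mimicked in base R. This is only a sketch: it assumes the visit identifiers are compared as strings, which would make the ordering lexicographic rather than numeric, and it uses sort() with method = "radix" (C-locale ordering) as a stand-in for the C++ container.

```r
# Visit IDs as they might appear in the input (illustrative values).
visits <- c("2", "1", "2", "3", "3")

# Order of first appearance, with duplicates dropped.
first_seen <- unique(visits)

# Sorted unique IDs, approximating the ordering of a sorted set of strings.
sorted_ids <- sort(unique(visits), method = "radix")

# Under the string-comparison assumption, "10" sorts before "2".
sorted_mixed <- sort(c("10", "2", "1"), method = "radix")
```

If identifiers are numeric in meaning but stored as character, zero-padding them beforehand keeps the sorted order intuitive.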
The threading of the C++ can be controlled using e.g.
options(icd.threads = 4). If it is not set, the number of cores in the
machine is used.
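A minimal sketch of setting and restoring the option with base R follows. Only the option name icd.threads comes from the text above; the surrounding save-and-restore pattern is a general R idiom, not something the package requires.

```r
# Set the option, keeping the previous value so it can be restored later.
old <- options(icd.threads = 4)

# Read the option back, as package code might.
n_threads <- getOption("icd.threads")

# Restore whatever was set before (or unset it if nothing was).
options(old)
```

Restoring the previous value keeps a script from silently changing the thread count for code that runs after it.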
The common comorbidity calculation code does not depend on ICD type. There is some type conversion so that the map and input codes are all in 'short' format, followed by fast factor generation and then fast comorbidity assignment.
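The three steps can be sketched in base R as follows. This is an illustration of the idea, not the package's implementation: decimal_to_short is a hypothetical, simplified converter that handles only purely numeric ICD-9 codes (no V or E codes), and toy_map is an illustrative map.

```r
# Step 1 (simplified): convert decimal ICD-9 codes to zero-padded short form.
# Hypothetical helper; V and E codes are deliberately not handled.
decimal_to_short <- function(x) {
  vapply(strsplit(x, ".", fixed = TRUE), function(p) {
    major <- sprintf("%03d", as.integer(p[1]))
    minor <- if (length(p) > 1) p[2] else ""
    paste0(major, minor)
  }, character(1))
}

toy_map <- list(HTN = c("4019"), CHF = c("39891", "4280"))  # illustrative map
input   <- decimal_to_short(c("401.9", "398.91", "41.5"))

# Step 2: encode input and map codes as integers against shared levels.
lvls      <- unique(c(unlist(toy_map), input))
input_int <- as.integer(factor(input, levels = lvls))
map_int   <- lapply(toy_map, function(codes) match(codes, lvls))

# Step 3: comorbidity assignment by integer comparison.
hits <- vapply(map_int, function(ids) input_int %in% ids,
               logical(length(input)))
```

Working with integers from a shared set of levels means each lookup compares numbers rather than strings, which is the general motivation for a vector of integers strategy.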
data.frames of patient data may have columns within them which are of class
icd9, icd10 etc., but do not themselves have a class: therefore, the S3
mechanism for dispatch is not suitable. I may add a wrapper function which
looks inside a data.frame of comorbidities, and dispatches to the
appropriate function, but right now the user must call the icd9_ or icd10_
prefixed function directly.
Applying CMS Hierarchical Condition Categories: icd_comorbid_hcc functions
differently from the rest of the comorbidity assignment functions. This is
because CMS publishes a specific ICD to Condition Category mapping including
all child ICD codes. While these mappings were the same for 2007-2012, after
2013 there are annual versions. There is also a many:many linkage between
ICD codes and Condition Categories (CC). Once CCs are assigned, a series of
hierarchy rules (which can also change annually) is applied to create HCCs.
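The sequence described above (look up CCs by code and year, allow many:many matches, then apply hierarchy rules) can be sketched in base R. Everything here is hypothetical: the codes "X01" and "X02", the CC numbers, and the rule that CC 8 supersedes CC 9 are placeholders, not values from any CMS mapping.

```r
# Hypothetical visit records with dates.
records <- data.frame(
  visit = c("1", "1", "2"),
  icd   = c("X01", "X02", "X02"),
  date  = as.Date(c("2011-01-01", "2011-01-01", "2014-06-01")),
  stringsAsFactors = FALSE
)

# Hypothetical year-specific ICD to CC table; "X02" maps to two CCs in 2011.
cc_map <- data.frame(
  year = c(2011, 2011, 2011, 2014),
  icd  = c("X01", "X02", "X02", "X02"),
  cc   = c(8, 9, 10, 11),
  stringsAsFactors = FALSE
)

# Assign CCs by joining on both the code and the year of the record.
records$year <- as.numeric(format(records$date, "%Y"))
assigned <- unique(merge(records, cc_map, by = c("icd", "year"))[, c("visit", "cc")])

# Illustrative hierarchy rule: within a visit, CC 8 supersedes CC 9.
has_cc8 <- assigned$visit[assigned$cc == 8]
hcc <- assigned[!(assigned$cc == 9 & assigned$visit %in% has_cc8), ]
```

Joining on year as well as code is what makes the date column necessary for HCC mappings.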
# NOT RUN {
# AHRQ comorbidities for long-format data with out-of-order visit IDs
pts <- icd_long_data(visit_name = c("2", "1", "2", "3", "3"),
                     icd9 = c("39891", "40110", "09322", "41514", "39891"))
icd_comorbid(pts, icd9_map_ahrq, short_code = TRUE) # visit_name is now sorted

# HCC assignment needs a date column, because the mapping varies by year
pts <- icd_long_data(
  visit_name = c("1", "2", "3", "4", "4"),
  icd_name = c("20084", "1742", "30410", "41514", "95893"),
  date = as.Date(c("2011-01-01", "2011-01-02", "2011-01-03",
                   "2011-01-04", "2011-01-04")))
pt_hccs <- icd_comorbid_hcc(pts, date_name = "date")
# }