RcppParallel approach with OpenMP and vector of integer strategy
This is the main function which extracts comorbidities from a
set of ICD-9 codes. Some trivial post-processing of the comorbidity data
is also done at this stage, e.g. renaming to human-friendly field names, and
updating fields according to rules. The exact fields from the original
mappings can be obtained using applyHierarchy = FALSE, but for
comorbidity counting, Charlson Score, etc., the rules should be applied.
For Charlson/Deyo comorbidities, strictly speaking, there is no dropping of e.g. uncomplicated DM if complicated DM exists; however, this is probably useful in general, and is essential when calculating the Charlson score.
icd9ComorbidShortCpp(icd9df, icd9Mapping, visitId, icd9Field, threads = 8L,
  chunkSize = 256L, ompChunkSize = 1L, aggregate = TRUE)
icd9Comorbid(icd9df, icd9Mapping, visitId = NULL, icd9Field = NULL,
  isShort = icd9GuessIsShort(icd9df[[icd9Field]]),
  isShortMapping = icd9GuessIsShort(icd9Mapping), return.df = FALSE, ...)
icd9ComorbidShort(...)
icd9ComorbidAhrq(..., abbrevNames = TRUE, applyHierarchy = TRUE)
icd9ComorbidQuanDeyo(..., abbrevNames = TRUE, applyHierarchy = TRUE)
icd9ComorbidQuanElix(..., abbrevNames = TRUE, applyHierarchy = TRUE)
icd9ComorbidElix(..., abbrevNames = TRUE, applyHierarchy = TRUE)
icd9Comorbidities(...)
icd9ComorbiditiesAhrq(...)
icd9ComorbiditiesElixHauser(...)
icd9ComorbiditiesQuanDeyo(...)
icd9ComorbiditiesQuanElixhauser(...)
data frame containing columns for visitId (which is the default name), icd9 (the default name for the ICD-9 code column), and maybe also a POA flag.
list (or name of a list if a character vector of length one
is given as argument) of the comorbidities, with each top-level list item
containing a vector of ICD-9 codes. This is in the form of a list,
with the names of the items corresponding to the comorbidities (e.g. "HTN",
or "diabetes") and the contents of each list item being a character vector
of short-form (no decimal place but ideally zero left-padded) ICD-9 codes.
No default: the user should prefer to use the derivative functions, e.g.
icd9ComorbidAhrq, since these also provide appropriate naming for the
fields, and squash the hierarchy (see applyHierarchy below).
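As an illustration of the mapping structure described above, here is a minimal sketch in base R. The list toyMapping and its contents are hypothetical and deliberately incomplete; it is not the package's ahrqComorbid mapping, and the matching shown is not necessarily how icd9Comorbid works internally.

```r
# Hypothetical toy mapping: names are comorbidities, values are
# short-form ICD-9 codes (no decimal point, zero left-padded).
toyMapping <- list(
  HTN = c("4010", "4011", "4019"),
  CHF = c("39891", "4280")
)

# Short-form codes recorded for one visit (made up for illustration).
codes <- c("39891", "4019", "41514")

# One logical flag per comorbidity: TRUE if any code is in that list item.
flags <- vapply(toyMapping, function(m) any(codes %in% m), logical(1))
print(flags)
```

Each name in the list becomes one comorbidity field, which is why a named list is a natural shape for a mapping.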
The name of the column in the data frame which contains the
patient or visit identifier. Typically this is the visit identifier, since
patients leave and re-enter hospital with different ICD-9 codes. It is a
character vector of length one. If left empty, or NULL, then an
attempt is made to guess which field has the ID for the patient encounter
(not a patient ID, although this can of course be specified directly). The
guesses proceed until a single match is made. Data frames may be wide with
many matching fields, so to avoid false positives, anything but a single
match is rejected. If there are no successful guesses, and visitId
was not specified, then the first column of the data frame is used.
The column in the data frame which contains the ICD codes.
This is a character vector of length one. If it is NULL, icd9
will attempt to guess the column name, looking for progressively less
likely possibilities until it matches a single column. Failing this, it will
take the first column in the data frame. Specifying the column using this
argument avoids the guesswork.
single logical value. If TRUE, then take (possibly much) more time to aggregate out-of-sequence visit IDs in the icd9df data.frame. If this is FALSE, then each contiguous group of visit IDs will result in a row of comorbidities in the output data. If you know your visitIds are possibly disordered, then use TRUE.
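To make "contiguous group" concrete, the sketch below uses base R only. It does not call the package and is not its implementation. It counts how many output rows out-of-sequence visit IDs would give without aggregation, compared with the number of distinct visits.

```r
# Visit IDs in input order; "2" appears in two separate runs.
visitId <- c("2", "1", "2", "3", "3")

# rle() finds contiguous runs, i.e. the groups seen without aggregation.
runs <- rle(visitId)
nContiguousGroups <- length(runs$values)

# With aggregation, rows for the same visit are combined.
nDistinctVisits <- length(unique(visitId))

cat("contiguous groups:", nContiguousGroups, "\n")
cat("distinct visits:  ", nDistinctVisits, "\n")
```

Here the two counts differ, which is the situation where aggregating matters.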
single logical value which determines whether the ICD-9 code provided is in short (TRUE) or decimal (FALSE) form. Where reasonable, this is guessed from the input data.
Same as isShort, but applied to icd9Mapping instead of icd9df.
All the codes in a mapping should be of the same
type, i.e. short or decimal.
further arguments, e.g. chunkSize and ompChunkSize, passed to the C++ function
single logical value that defaults to TRUE, in
which case the shorter human-readable names stored in e.g.
ahrqComorbidNamesAbbrev are applied to the data frame column names.
single logical value that defaults to TRUE, in
which case the hierarchy defined for the mapping is applied. E.g. in
Elixhauser, you can't have uncomplicated and complicated diabetes both
flagged.
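The kind of rule meant here can be sketched in base R. The column names DM (uncomplicated diabetes) and DMcx (complicated diabetes) are assumptions for illustration, and this is only one example of a hierarchy rule, not the package's own code.

```r
# Hypothetical comorbidity flags for three visits, before any hierarchy rule.
comorbid <- data.frame(
  visitId = c("1", "2", "3"),
  DM      = c(TRUE, TRUE, FALSE),   # uncomplicated diabetes
  DMcx    = c(TRUE, FALSE, FALSE)   # complicated diabetes
)

# Hierarchy rule: if the complicated form is flagged, clear the uncomplicated one.
comorbid$DM <- comorbid$DM & !comorbid$DMcx

print(comorbid)
```

After the rule is applied, no row has both flags set, so each visit contributes at most once to a diabetes-related count or score.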
arguments passed to the corresponding function from the alias.
E.g. all the arguments passed to icd9ComorbiditiesAhrq are passed on
to icd9ComorbidAhrq.
There is a change in behavior from previous versions. The visitId
column is (implicitly) sorted because a std::set container is used. Previously, the
visitId output order was whatever R's aggregate produced.
The threading of the C++ code can be controlled using e.g.
options(icd9.threads = 4). If it is not set, the number of cores in
the machine is used.
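A minimal sketch of setting and reading such an option with base R follows. The fallback to parallel::detectCores() is an assumption that mirrors the description above; how the package itself reads the option is not shown here.

```r
# Set the option for this session, keeping the previous value for restoration.
old <- options(icd9.threads = 4L)

# Read it back, falling back to the detected core count if unset
# (the fallback is an assumption made for this illustration).
threads <- getOption("icd9.threads", default = parallel::detectCores())
cat("threads:", threads, "\n")

# Restore the previous setting.
options(old)
```

Saving and restoring the old value keeps the change local to the code that needs it.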
# NOT RUN {
pts <- data.frame(visitId = c("2", "1", "2", "3", "3"),
                  icd9 = c("39891", "40110", "09322", "41514", "39891"))
icd9ComorbidShort(pts, ahrqComorbid) # visitId is now sorted
# }