Medication extraction systems format their output in different ways. To process
the extracted information, the output from each system must first be converted
into a standardized format. In this format, the extracted expressions for each
drug entity (e.g., drug name, strength, frequency) receive their own column, with
each expression formatted as "extracted expression::start position::stop position".
If multiple expressions are extracted for the same entity, they are separated by
backticks.
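
As an illustration of the cell format described above, the following minimal sketch builds and reads back one column value. It is not part of the package: the function names format_entity and parse_entity, the example expressions, and the positions are all hypothetical.

```python
def format_entity(extractions):
    """Join (expression, start, stop) tuples into a single column value.

    Each extraction becomes "expression::start::stop"; multiple
    extractions for the same entity are separated by backticks.
    """
    return "`".join(f"{text}::{start}::{stop}" for text, start, stop in extractions)


def parse_entity(cell):
    """Split a column value back into (expression, start, stop) tuples."""
    if not cell:
        return []
    parsed = []
    for item in cell.split("`"):
        # Split from the right so the two position fields are always taken last.
        text, start, stop = item.rsplit("::", 2)
        parsed.append((text, int(start), int(stop)))
    return parsed


# Hypothetical frequency extractions with made-up character positions.
frequency_cell = format_entity([("twice daily", 30, 41), ("bid", 60, 63)])
print(frequency_cell)
print(parse_entity(frequency_cell))
```

Splitting from the right is a design choice in this sketch: it keeps the position fields intact even if an extracted expression happens to contain "::".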
MedXN output files anchor extractions to a specific drug name extraction.
The results from multiple clinical notes can be combined into a single MedXN
output file, in which case the beginning of certain lines indicates where the
output for a new observation (or new clinical note) begins. The user should
specify the argument begText as a regular expression that identifies the lines
where output for a new clinical note begins.
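
To show how a regular expression such as begText can mark note boundaries, here is a minimal sketch that groups the lines of a combined file by note. It is an assumption-laden illustration, not the package's implementation: the function split_by_note, the parameter name beg_text, and the sample lines are hypothetical and do not reproduce real MedXN output.

```python
import re


def split_by_note(lines, beg_text):
    """Group lines into notes; a new note starts where beg_text matches.

    The pattern is tested at the start of each line (re.match), on the
    assumption that the note identifier appears at the beginning.
    """
    pattern = re.compile(beg_text)
    notes = []
    for line in lines:
        # Open a new group on a matching line, or for the very first line.
        if pattern.match(line) or not notes:
            notes.append([])
        notes[-1].append(line)
    return notes


# Hypothetical combined output: lines starting with "ID" plus digits begin a note.
combined = [
    "ID001 first extraction",
    "continuation of ID001 output",
    "ID002 first extraction",
]
for note in split_by_note(combined, r"ID\d+"):
    print(note)
```

In this sketch, lines that do not match the pattern are attached to the most recent note, so any output before the first matching line forms its own group.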
See EHR Vignette for Extract-Med and Pro-Med-NLP as well as Dose Building Using Example Vanderbilt EHR Data for details.