clinspacy
The goal of clinspacy is to perform biomedical named entity recognition, Unified Medical Language System (UMLS) concept mapping, and negation detection using the Python spaCy, scispacy, and medspacy packages.
Installation
You can install the GitHub version of clinspacy with:
remotes::install_github('ML4LHS/clinspacy', INSTALL_opts = '--no-multiarch')
How to load clinspacy
library(clinspacy)
Initiating clinspacy
Note: the very first time you run clinspacy_init() or clinspacy()
after installing the package, you may receive an error stating that
spaCy was unable to be imported because it was not found. Restarting
your R session should resolve the issue.
Initiating clinspacy is optional. If you do not initiate the package
using clinspacy_init(), it will be initiated automatically without the
UMLS linker. The UMLS linker takes up ~12 GB of RAM, so it is only
loaded if you initiate clinspacy with the linker. The linker can also be
added later by reinitiating with the use_linker argument set to TRUE.
clinspacy_init() # This is optional! The default functionality is to initiate clinspacy without the UMLS linker
Named entity recognition (without the UMLS linker)
The clinspacy() function can take a single string, a character vector,
or a data frame. It can output either a data frame or a file name.
A single character string as input
clinspacy('This patient has diabetes and CKD stage 3 but no HTN.')
#> | | | 0% | |=========================================================================================================================| 100%
#> clinspacy_id entity lemma is_family is_historical is_hypothetical is_negated is_uncertain section_category
#> 1 1 patient patient FALSE FALSE FALSE FALSE FALSE <NA>
#> 2 1 diabetes diabetes FALSE FALSE FALSE FALSE FALSE <NA>
#> 3 1 CKD stage 3 ckd stage 3 FALSE FALSE FALSE FALSE FALSE <NA>
#> 4 1 HTN htn FALSE FALSE FALSE TRUE FALSE <NA>
clinspacy('HISTORY: He presents with chest pain. PMH: HTN. MEDICATIONS: This patient with diabetes is taking omeprazole, aspirin, and lisinopril 10 mg but is not taking albuterol anymore as his asthma has resolved. ALLERGIES: penicillin.', verbose = FALSE)
#> clinspacy_id entity lemma is_family is_historical is_hypothetical is_negated is_uncertain section_category
#> 1 1 chest pain chest pain FALSE TRUE FALSE FALSE FALSE <NA>
#> 2 1 PMH PMH FALSE FALSE FALSE FALSE FALSE past_medical_history
#> 3 1 HTN htn FALSE FALSE FALSE FALSE FALSE past_medical_history
#> 4 1 patient patient FALSE FALSE FALSE FALSE FALSE medications
#> 5 1 diabetes diabetes FALSE FALSE FALSE FALSE FALSE medications
#> 6 1 omeprazole omeprazole FALSE FALSE FALSE FALSE FALSE medications
#> 7 1 aspirin aspirin FALSE FALSE FALSE FALSE FALSE medications
#> 8 1 lisinopril lisinopril FALSE FALSE FALSE FALSE FALSE medications
#> 9 1 albuterol albuterol FALSE FALSE FALSE TRUE FALSE medications
#> 10 1 asthma asthma FALSE FALSE FALSE TRUE FALSE medications
#> 11 1 penicillin penicillin FALSE FALSE FALSE FALSE FALSE allergies
A character vector as input
clinspacy(c('This pt has CKD and HTN', 'Pt only has CKD but no HTN'),
          verbose = FALSE)
#> clinspacy_id entity lemma is_family is_historical is_hypothetical is_negated is_uncertain section_category
#> 1 1 CKD ckd FALSE FALSE FALSE FALSE FALSE <NA>
#> 2 1 HTN htn FALSE FALSE FALSE FALSE FALSE <NA>
#> 3 2 Pt pt FALSE FALSE FALSE FALSE FALSE <NA>
#> 4 2 CKD ckd FALSE FALSE FALSE FALSE FALSE <NA>
#> 5 2 HTN htn FALSE FALSE FALSE TRUE FALSE <NA>
A data frame as input
data.frame(text = c('This pt has CKD and HTN', 'Diabetes is present'),
           stringsAsFactors = FALSE) %>%
  clinspacy(df_col = 'text', verbose = FALSE)
#> clinspacy_id entity lemma is_family is_historical is_hypothetical is_negated is_uncertain section_category
#> 1 1 CKD ckd FALSE FALSE FALSE FALSE FALSE <NA>
#> 2 1 HTN htn FALSE FALSE FALSE FALSE FALSE <NA>
#> 3 2 Diabetes Diabetes FALSE FALSE FALSE FALSE FALSE <NA>
Saving the output to file
When output_file is supplied, clinspacy() writes its results to that file
and returns the file name. The file name can then be piped into
bind_clinspacy() or bind_clinspacy_embeddings(). This saves a lot of time
because you can try different subsetting strategies in both of these
functions without needing to re-process the original data.
if (!dir.exists(rappdirs::user_data_dir('clinspacy'))) {
  dir.create(rappdirs::user_data_dir('clinspacy'), recursive = TRUE)
}
mtsamples = dataset_mtsamples()
mtsamples[1:5,]
#> note_id description medical_specialty
#> 1 1 A 23-year-old white female presents with complaint of allergies. Allergy / Immunology
#> 2 2 Consult for laparoscopic gastric bypass. Bariatrics
#> 3 3 Consult for laparoscopic gastric bypass. Bariatrics
#> 4 4 2-D M-Mode. Doppler. Cardiovascular / Pulmonary
#> 5 5 2-D Echocardiogram Cardiovascular / Pulmonary
#> sample_name
#> 1 Allergic Rhinitis
#> 2 Laparoscopic Gastric Bypass Consult - 2
#> 3 Laparoscopic Gastric Bypass Consult - 1
#> 4 2-D Echocardiogram - 1
#> 5 2-D Echocardiogram - 2
#> transcription
#> 1 SUBJECTIVE:, This 23-year-old white female presents with complaint of allergies. She used to have allergies when she lived in Seattle but she thinks they are worse here. In the past, she has tried Claritin, and Zyrtec. Both worked for short time but then seemed to lose effectiveness. She has used Allegra also. She used that last summer and she began using it again two weeks ago. It does not appear to be working very well. She has used over-the-counter sprays but no prescription nasal sprays. She does have asthma but doest not require daily medication for this and does not think it is flaring up.,MEDICATIONS: , Her only medication currently is Ortho Tri-Cyclen and the Allegra.,ALLERGIES: , She has no known medicine allergies.,OBJECTIVE:,Vitals: Weight was 130 pounds and blood pressure 124/78.,HEENT: Her throat was mildly erythematous without exudate. Nasal mucosa was erythematous and swollen. Only clear drainage was seen. TMs were clear.,Neck: Supple without adenopathy.,Lungs: Clear.,ASSESSMENT:, Allergic rhinitis.,PLAN:,1. She will try Zyrtec instead of Allegra again. Another option will be to use loratadine. She does not think she has prescription coverage so that might be cheaper.,2. Samples of Nasonex two sprays in each nostril given for three weeks. A prescription was written as well.
#> 2 PAST MEDICAL HISTORY:, He has difficulty climbing stairs, difficulty with airline seats, tying shoes, used to public seating, and lifting objects off the floor. He exercises three times a week at home and does cardio. He has difficulty walking two blocks or five flights of stairs. Difficulty with snoring. He has muscle and joint pains including knee pain, back pain, foot and ankle pain, and swelling. He has gastroesophageal reflux disease.,PAST SURGICAL HISTORY:, Includes reconstructive surgery on his right hand 13 years ago. ,SOCIAL HISTORY:, He is currently single. He has about ten drinks a year. He had smoked significantly up until several months ago. He now smokes less than three cigarettes a day.,FAMILY HISTORY:, Heart disease in both grandfathers, grandmother with stroke, and a grandmother with diabetes. Denies obesity and hypertension in other family members.,CURRENT MEDICATIONS:, None.,ALLERGIES:, He is allergic to Penicillin.,MISCELLANEOUS/EATING HISTORY:, He has been going to support groups for seven months with Lynn Holmberg in Greenwich and he is from Eastchester, New York and he feels that we are the appropriate program. He had a poor experience with the Greenwich program. Eating history, he is not an emotional eater. Does not like sweets. He likes big portions and carbohydrates. He likes chicken and not steak. He currently weighs 312 pounds. Ideal body weight would be 170 pounds. He is 142 pounds overweight. If ,he lost 60% of his excess body weight that would be 84 pounds and he should weigh about 228.,REVIEW OF SYSTEMS: ,Negative for head, neck, heart, lungs, GI, GU, orthopedic, and skin. 
Specifically denies chest pain, heart attack, coronary artery disease, congestive heart failure, arrhythmia, atrial fibrillation, pacemaker, high cholesterol, pulmonary embolism, high blood pressure, CVA, venous insufficiency, thrombophlebitis, asthma, shortness of breath, COPD, emphysema, sleep apnea, diabetes, leg and foot swelling, osteoarthritis, rheumatoid arthritis, hiatal hernia, peptic ulcer disease, gallstones, infected gallbladder, pancreatitis, fatty liver, hepatitis, hemorrhoids, rectal bleeding, polyps, incontinence of stool, urinary stress incontinence, or cancer. Denies cellulitis, pseudotumor cerebri, meningitis, or encephalitis.,PHYSICAL EXAMINATION:, He is alert and oriented x 3. Cranial nerves II-XII are intact. Afebrile. Vital Signs are stable.
#> 3 HISTORY OF PRESENT ILLNESS: , I have seen ABC today. He is a very pleasant gentleman who is 42 years old, 344 pounds. He is 5'9". He has a BMI of 51. He has been overweight for ten years since the age of 33, at his highest he was 358 pounds, at his lowest 260. He is pursuing surgical attempts of weight loss to feel good, get healthy, and begin to exercise again. He wants to be able to exercise and play volleyball. Physically, he is sluggish. He gets tired quickly. He does not go out often. When he loses weight he always regains it and he gains back more than he lost. His biggest weight loss is 25 pounds and it was three months before he gained it back. He did six months of not drinking alcohol and not taking in many calories. He has been on multiple commercial weight loss programs including Slim Fast for one month one year ago and Atkin's Diet for one month two years ago.,PAST MEDICAL HISTORY: , He has difficulty climbing stairs, difficulty with airline seats, tying shoes, used to public seating, difficulty walking, high cholesterol, and high blood pressure. He has asthma and difficulty walking two blocks or going eight to ten steps. He has sleep apnea and snoring. He is a diabetic, on medication. He has joint pain, knee pain, back pain, foot and ankle pain, leg and foot swelling. He has hemorrhoids.,PAST SURGICAL HISTORY: , Includes orthopedic or knee surgery.,SOCIAL HISTORY: , He is currently single. He drinks alcohol ten to twelve drinks a week, but does not drink five days a week and then will binge drink. He smokes one and a half pack a day for 15 years, but he has recently stopped smoking for the past two weeks.,FAMILY HISTORY: , Obesity, heart disease, and diabetes. Family history is negative for hypertension and stroke.,CURRENT MEDICATIONS:, Include Diovan, Crestor, and Tricor.,MISCELLANEOUS/EATING HISTORY: ,He says a couple of friends of his have had heart attacks and have had died. He used to drink everyday, but stopped two years ago. 
He now only drinks on weekends. He is on his second week of Chantix, which is a medication to come off smoking completely. Eating, he eats bad food. He is single. He eats things like bacon, eggs, and cheese, cheeseburgers, fast food, eats four times a day, seven in the morning, at noon, 9 p.m., and 2 a.m. He currently weighs 344 pounds and 5'9". His ideal body weight is 160 pounds. He is 184 pounds overweight. If he lost 70% of his excess body weight that would be 129 pounds and that would get him down to 215.,REVIEW OF SYSTEMS: , Negative for head, neck, heart, lungs, GI, GU, orthopedic, or skin. He also is positive for gout. He denies chest pain, heart attack, coronary artery disease, congestive heart failure, arrhythmia, atrial fibrillation, pacemaker, pulmonary embolism, or CVA. He denies venous insufficiency or thrombophlebitis. Denies shortness of breath, COPD, or emphysema. Denies thyroid problems, hip pain, osteoarthritis, rheumatoid arthritis, GERD, hiatal hernia, peptic ulcer disease, gallstones, infected gallbladder, pancreatitis, fatty liver, hepatitis, rectal bleeding, polyps, incontinence of stool, urinary stress incontinence, or cancer. He denies cellulitis, pseudotumor cerebri, meningitis, or encephalitis.,PHYSICAL EXAMINATION: ,He is alert and oriented x 3. Cranial nerves II-XII are intact. Neck is soft and supple. Lungs: He has positive wheezing bilaterally. Heart is regular rhythm and rate. His abdomen is soft. Extremities: He has 1+ pitting edema.,IMPRESSION/PLAN:, I have explained to him the risks and potential complications of laparoscopic gastric bypass in detail and these include bleeding, infection, deep venous thrombosis, pulmonary embolism, leakage from the gastrojejuno-anastomosis, jejunojejuno-anastomosis, and possible bowel obstruction among other potential complications. He understands. He wants to proceed with workup and evaluation for laparoscopic Roux-en-Y gastric bypass. He will need to get a letter of approval from Dr. XYZ. 
He will need to see a nutritionist and mental health worker. He will need an upper endoscopy by either Dr. XYZ. He will need to go to Dr. XYZ as he previously had a sleep study. We will need another sleep study. He will need H. pylori testing, thyroid function tests, LFTs, glycosylated hemoglobin, and fasting blood sugar. After this is performed, we will submit him for insurance approval.
#> 4 2-D M-MODE: , ,1. Left atrial enlargement with left atrial diameter of 4.7 cm.,2. Normal size right and left ventricle.,3. Normal LV systolic function with left ventricular ejection fraction of 51%.,4. Normal LV diastolic function.,5. No pericardial effusion.,6. Normal morphology of aortic valve, mitral valve, tricuspid valve, and pulmonary valve.,7. PA systolic pressure is 36 mmHg.,DOPPLER: , ,1. Mild mitral and tricuspid regurgitation.,2. Trace aortic and pulmonary regurgitation.
#> 5 1. The left ventricular cavity size and wall thickness appear normal. The wall motion and left ventricular systolic function appears hyperdynamic with estimated ejection fraction of 70% to 75%. There is near-cavity obliteration seen. There also appears to be increased left ventricular outflow tract gradient at the mid cavity level consistent with hyperdynamic left ventricular systolic function. There is abnormal left ventricular relaxation pattern seen as well as elevated left atrial pressures seen by Doppler examination.,2. The left atrium appears mildly dilated.,3. The right atrium and right ventricle appear normal.,4. The aortic root appears normal.,5. The aortic valve appears calcified with mild aortic valve stenosis, calculated aortic valve area is 1.3 cm square with a maximum instantaneous gradient of 34 and a mean gradient of 19 mm.,6. There is mitral annular calcification extending to leaflets and supportive structures with thickening of mitral valve leaflets with mild mitral regurgitation.,7. The tricuspid valve appears normal with trace tricuspid regurgitation with moderate pulmonary artery hypertension. Estimated pulmonary artery systolic pressure is 49 mmHg. Estimated right atrial pressure of 10 mmHg.,8. The pulmonary valve appears normal with trace pulmonary insufficiency.,9. There is no pericardial effusion or intracardiac mass seen.,10. There is a color Doppler suggestive of a patent foramen ovale with lipomatous hypertrophy of the interatrial septum.,11. The study was somewhat technically limited and hence subtle abnormalities could be missed from the study.,
#> keywords
#> 1 allergy / immunology, allergic rhinitis, allergies, asthma, nasal sprays, rhinitis, nasal, erythematous, allegra, sprays, allergic,
#> 2 bariatrics, laparoscopic gastric bypass, weight loss programs, gastric bypass, atkin's diet, weight watcher's, body weight, laparoscopic gastric, weight loss, pounds, months, weight, laparoscopic, band, loss, diets, overweight, lost
#> 3 bariatrics, laparoscopic gastric bypass, heart attacks, body weight, pulmonary embolism, potential complications, sleep study, weight loss, gastric bypass, anastomosis, loss, sleep, laparoscopic, gastric, bypass, heart, pounds, weight,
#> 4 cardiovascular / pulmonary, 2-d m-mode, doppler, aortic valve, atrial enlargement, diastolic function, ejection fraction, mitral, mitral valve, pericardial effusion, pulmonary valve, regurgitation, systolic function, tricuspid, tricuspid valve, normal lv
#> 5 cardiovascular / pulmonary, 2-d, doppler, echocardiogram, annular, aortic root, aortic valve, atrial, atrium, calcification, cavity, ejection fraction, mitral, obliteration, outflow, regurgitation, relaxation pattern, stenosis, systolic function, tricuspid, valve, ventricular, ventricular cavity, wall motion, pulmonary artery
clinspacy_output_file =
  mtsamples[1:5, 1:2] %>%
  clinspacy(df_col = 'description',
            verbose = FALSE,
            output_file = file.path(rappdirs::user_data_dir('clinspacy'),
                                    'output.csv'),
            overwrite = TRUE)
clinspacy_output_file
#> [1] "C:\\Users\\kdpsingh\\AppData\\Local\\clinspacy\\clinspacy/output.csv"
Binding named entities to a data frame (without the UMLS linker)
Negated concepts, as identified by the medspacy cycontext flag, are ignored by default and do not count towards the frequencies. You can change the subsetting criteria with the subset argument.
Note that you need to provide the original dataset to the
bind_clinspacy() function.
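The idea of dropping negated entities before counting can be sketched in base R. This is only a conceptual illustration, not the implementation of bind_clinspacy(): the rows are copied from the character-vector example output earlier in this README (a subset of its columns), and the variable names entities, kept, and wide are hypothetical.

```r
# Rows copied from the character-vector example output (subset of columns).
entities <- data.frame(
  clinspacy_id = c(1, 1, 2, 2, 2),
  lemma        = c("ckd", "htn", "pt", "ckd", "htn"),
  is_negated   = c(FALSE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Drop negated entities, mirroring the default behavior described above.
kept <- subset(entities, !is_negated)

# Count each remaining lemma per document:
# one row per clinspacy_id, one column per lemma.
wide <- as.data.frame.matrix(table(kept$clinspacy_id, kept$lemma))
wide
```

In this sketch the second document gets a count of 0 for htn because its only HTN mention is negated ("no HTN"), while the first document keeps a count of 1.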
mtsamples[1:5, 1:2] %>%
  clinspacy(df_col = 'description', verbose = FALSE) %>%
  bind_clinspacy(mtsamples[1:5, 1:2])
#> clinspacy_id note_id description 2-d 2-d m-mode allergy complaint consult
#> 1 1 1 A 23-year-old white female presents with complaint of allergies. 0 0 1 1 0
#> 2 2 2 Consult for laparoscopic gastric bypass. 0 0 0 0 1
#> 3 3 3 Consult for laparoscopic gastric bypass. 0 0 0 0 1
#> 4 4 4 2-D M-Mode. Doppler. 0 1 0 0 0
#> 5 5 5 2-D Echocardiogram 1 0 0 0 0
#> doppler echocardiogram laparoscopic gastric bypass white female
#> 1 0 0 0 1
#> 2 0 0 1 0
#> 3 0 0 1 0
#> 4 1 0 0 0
#> 5 0 1 0 0
We can also store the intermediate result so that bind_clinspacy() does not need to re-process the text.
clinspacy_output_data =
  mtsamples[1:5, 1:2] %>%
  clinspacy(df_col = 'description', verbose = FALSE)
clinspacy_output_data %>%
  bind_clinspacy(mtsamples[1:5, 1:2])
#> clinspacy_id note_id description 2-d 2-d m-mode allergy complaint consult
#> 1 1 1 A 23-year-old white female presents with complaint of allergies. 0 0 1 1 0
#> 2 2 2 Consult for laparoscopic gastric bypass. 0 0 0 0 1
#> 3 3 3 Consult for laparoscopic gastric bypass. 0 0 0 0 1
#> 4 4 4 2-D M-Mode. Doppler. 0 1 0 0 0
#> 5 5 5 2-D Echocardiogram 1 0 0 0 0
#> doppler echocardiogram laparoscopic gastric bypass white female
#> 1 0 0 0 1
#> 2 0 0 1 0
#> 3 0 0 1 0
#> 4 1 0 0 0
#> 5 0 1 0 0
clinspacy_output_data %>%
  bind_clinspacy(mtsamples[1:5, 1:2],
                 cs_col = 'entity')
#> clinspacy_id note_id description 2-D 2-D M-Mode Consult Doppler
#> 1 1 1 A 23-year-old white female presents with complaint of allergies. 0 0 0 0
#> 2 2 2 Consult for laparoscopic gastric bypass. 0 0 1 0
#> 3 3 3 Consult for laparoscopic gastric bypass. 0 0 1 0
#> 4 4 4 2-D M-Mode. Doppler. 0 1 0 1
#> 5 5 5 2-D Echocardiogram 1 0 0 0
#> Echocardiogram allergies complaint laparoscopic gastric bypass white female
#> 1 0 1 1 0 1
#> 2 0 0 0 1 0
#> 3 0 0 0 1 0
#> 4 0 0 0 0 0
#> 5 1 0 0 0 0
clinspacy_output_data %>%
  bind_clinspacy(mtsamples[1:5, 1:2],
                 subset = 'is_uncertain == FALSE & is_negated == FALSE')
#> clinspacy_id note_id description 2-d 2-d m-mode allergy complaint consult
#> 1 1 1 A 23-year-old white female presents with complaint of allergies. 0 0 1 1 0
#> 2 2 2 Consult for laparoscopic gastric bypass. 0 0 0 0 1
#> 3 3 3 Consult for laparoscopic gastric bypass. 0 0 0 0 1
#> 4 4 4 2-D M-Mode. Doppler. 0 1 0 0 0
#> 5 5 5 2-D Echocardiogram 1 0 0 0 0
#> doppler echocardiogram laparoscopic gastric bypass white female
#> 1 0 0 0 1
#> 2 0 0 1 0
#> 3 0 0 1 0
#> 4 1 0 0 0
#> 5 0 1 0 0
We can also re-use the output file we had created earlier and pipe this directly into bind_clinspacy().
clinspacy_output_file
#> [1] "C:\\Users\\kdpsingh\\AppData\\Local\\clinspacy\\clinspacy/output.csv"
clinspacy_output_file %>%
  bind_clinspacy(mtsamples[1:5, 1:2])
#> clinspacy_id note_id description 2-d 2-d m-mode allergy complaint consult
#> 1 1 1 A 23-year-old white female presents with complaint of allergies. 0 0 1 1 0
#> 2 2 2 Consult for laparoscopic gastric bypass. 0 0 0 0 1
#> 3 3 3 Consult for laparoscopic gastric bypass. 0 0 0 0 1
#> 4 4 4 2-D M-Mode. Doppler. 0 1 0 0 0
#> 5 5 5 2-D Echocardiogram 1 0 0 0 0
#> doppler echocardiogram laparoscopic gastric bypass white female
#> 1 0 0 0 1
#> 2 0 0 1 0
#> 3 0 0 1 0
#> 4 1 0 0 0
#> 5 0 1 0 0
clinspacy_output_file %>%
  bind_clinspacy(mtsamples[1:5, 1:2],
                 cs_col = 'entity')
#> clinspacy_id note_id description 2-D 2-D M-Mode Consult Doppler
#> 1 1 1 A 23-year-old white female presents with complaint of allergies. 0 0 0 0
#> 2 2 2 Consult for laparoscopic gastric bypass. 0 0 1 0
#> 3 3 3 Consult for laparoscopic gastric bypass. 0 0 1 0
#> 4 4 4 2-D M-Mode. Doppler. 0 1 0 1
#> 5 5 5 2-D Echocardiogram 1 0 0 0
#> Echocardiogram allergies complaint laparoscopic gastric bypass white female
#> 1 0 1 1 0 1
#> 2 0 0 0 1 0
#> 3 0 0 0 1 0
#> 4 0 0 0 0 0
#> 5 1 0 0 0 0
clinspacy_output_file %>%
  bind_clinspacy(mtsamples[1:5, 1:2],
                 subset = 'is_uncertain == FALSE & is_negated == FALSE')
#> clinspacy_id note_id description 2-d 2-d m-mode allergy complaint consult
#> 1 1 1 A 23-year-old white female presents with complaint of allergies. 0 0 1 1 0
#> 2 2 2 Consult for laparoscopic gastric bypass. 0 0 0 0 1
#> 3 3 3 Consult for laparoscopic gastric bypass. 0 0 0 0 1
#> 4 4 4 2-D M-Mode. Doppler. 0 1 0 0 0
#> 5 5 5 2-D Echocardiogram 1 0 0 0 0
#> doppler echocardiogram laparoscopic gastric bypass white female
#> 1 0 0 0 1
#> 2 0 0 1 0
#> 3 0 0 1 0
#> 4 1 0 0 0
#> 5 0 1 0 0
Binding entity embeddings to a data frame (without the UMLS linker)
With the UMLS linker disabled, 200-dimensional entity embeddings can be
extracted from the scispacy Python package. For this to work, you must
set return_scispacy_embeddings to TRUE when running clinspacy().
It’s also a good idea to write the output directly to file because the
embeddings can be quite large.
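Once embedding columns are bound to a data frame, one possible downstream use is comparing notes by cosine similarity. The base R sketch below is an illustration under stated assumptions, not part of clinspacy: it uses only the first three embedding columns (emb_001 to emb_003) for the first four notes, copied from the output shown later in this section, whereas the real output has 200 columns. The names emb, m, unit, and cos_sim are hypothetical.

```r
# First three embedding columns for the first four notes, copied from the
# README output; the real output has 200 columns (emb_001 to emb_200).
emb <- data.frame(
  note_id = 1:4,
  emb_001 = c(-0.1959790, -0.1115363, -0.1115363, -0.3077586),
  emb_002 = c(0.28813400, 0.01725144, 0.01725144, 0.25928350),
  emb_003 = c(0.09685702, -0.13519235, -0.13519235, -0.37220851)
)

# Select the embedding columns by their emb_ prefix and convert to a matrix.
m <- as.matrix(emb[, grepl("^emb_", names(emb))])

# Cosine similarity: scale each row to unit length, then take dot products.
unit <- m / sqrt(rowSums(m^2))
cos_sim <- unit %*% t(unit)
round(cos_sim, 3)
```

Notes 2 and 3 have identical descriptions and therefore identical embeddings, so their cosine similarity is 1. With only three of the 200 dimensions, the remaining values are not meaningful and serve only to show the mechanics.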
clinspacy_output_file =
  mtsamples[1:5, 1:2] %>%
  clinspacy(df_col = 'description',
            return_scispacy_embeddings = TRUE,
            verbose = FALSE,
            output_file = file.path(rappdirs::user_data_dir('clinspacy'),
                                    'output.csv'),
            overwrite = TRUE)
clinspacy_output_file %>%
  bind_clinspacy_embeddings(mtsamples[1:5, 1:2])
#> clinspacy_id note_id description emb_001 emb_002 emb_003
#> 1 1 1 A 23-year-old white female presents with complaint of allergies. -0.1959790 0.28813400 0.09685702
#> 2 2 2 Consult for laparoscopic gastric bypass. -0.1115363 0.01725144 -0.13519235
#> 3 3 3 Consult for laparoscopic gastric bypass. -0.1115363 0.01725144 -0.13519235
#> 4 4 4 2-D M-Mode. Doppler. -0.3077586 0.25928350 -0.37220851
#> emb_004 emb_005 emb_006 emb_007 emb_008 emb_009 emb_010 emb_011 emb_012 emb_013 emb_014
#> 1 -0.20641684 -0.1554238 -0.01624470 0.027011001 0.05331314 -0.1006668 0.3682853 0.0581439 -0.29079599 0.1611375 -0.1118952
#> 2 -0.05496463 0.1488807 -0.19577999 0.052658666 -0.10433200 -0.0763495 0.1199215 -0.1860092 0.05465447 0.1267057 -0.2041533
#> 3 -0.05496463 0.1488807 -0.19577999 0.052658666 -0.10433200 -0.0763495 0.1199215 -0.1860092 0.05465447 0.1267057 -0.2041533
#> 4 -0.06021732 0.0386426 -0.07756314 -0.002676249 0.22511028 0.3279995 -0.2274373 -0.1656060 -0.30020200 0.5237787 -0.1472114
#> emb_015 emb_016 emb_017 emb_018 emb_019 emb_020 emb_021 emb_022 emb_023 emb_024 emb_025
#> 1 -0.03922822 0.06888010 -0.1862742 -0.1454458 0.04115367 0.049065500 0.39795328 0.05879098 0.05246135 -0.19981400 -0.03346085
#> 2 0.01984984 -0.01107489 0.1080266 0.1128684 0.23062316 -0.005933613 0.06126638 0.05048515 0.12351524 -0.02489970 -0.26744565
#> 3 0.01984984 -0.01107489 0.1080266 0.1128684 0.23062316 -0.005933613 0.06126638 0.05048515 0.12351524 -0.02489970 -0.26744565
#> 4 -0.02312062 -0.11272645 -0.3415540 -0.2255931 0.02385290 0.074861225 0.12910485 0.02176433 -0.21616454 0.08218845 0.33230226
#> emb_026 emb_027 emb_028 emb_029 emb_030 emb_031 emb_032 emb_033 emb_034 emb_035
#> 1 0.1395520 0.01792375 -0.06969561 -0.04942485 0.06613978 0.08035761 -0.12418544 -0.11839510 0.04266573 -0.04319873
#> 2 0.3418240 -0.12783451 0.38420413 -0.20168215 -0.06550949 0.26997083 -0.07201438 0.13039007 -0.13608095 0.10342984
#> 3 0.3418240 -0.12783451 0.38420413 -0.20168215 -0.06550949 0.26997083 -0.07201438 0.13039007 -0.13608095 0.10342984
#> 4 0.2420833 0.08455360 0.22111987 -0.57962301 0.32054099 -0.26178523 -0.46501200 0.05091595 -0.22430425 -0.07319695
#> emb_036 emb_037 emb_038 emb_039 emb_040 emb_041 emb_042 emb_043 emb_044 emb_045
#> 1 0.06394462 0.02425202 -0.2158322 -0.1064802 0.005398401 0.01459978 -0.03936125 -0.216860471 0.01146569 -0.01707370
#> 2 0.03349850 -0.06359592 -0.2497478 -0.1312915 -0.068015995 0.12897950 0.20849532 -0.001854315 0.02034700 0.04105476
#> 3 0.03349850 -0.06359592 -0.2497478 -0.1312915 -0.068015995 0.12897950 0.20849532 -0.001854315 0.02034700 0.04105476
#> 4 -0.19518739 -0.21279503 -0.1980325 -0.3900315 0.214830723 -0.03985715 0.32672650 -0.067201529 0.43131340 -0.10445137
#> emb_046 emb_047 emb_048 emb_049 emb_050 emb_051 emb_052 emb_053 emb_054 emb_055
#> 1 -0.08789315 -0.48977432 0.11840488 -0.24063642 -0.23959090 0.1258371 -0.0001312072 -0.15632193 0.2063196 -0.02019964
#> 2 -0.26218344 0.05762917 -0.08367021 -0.01368977 0.02369371 0.1266086 -0.1197809521 0.04324770 -0.2046735 -0.21317951
#> 3 -0.26218344 0.05762917 -0.08367021 -0.01368977 0.02369371 0.1266086 -0.1197809521 0.04324770 -0.2046735 -0.21317951
#> 4 -0.36873272 0.39958726 0.03923560 0.06519943 -0.12042060 0.1947917 0.5587487221 0.02909975 -0.1112386 -0.29085600
#> emb_056 emb_057 emb_058 emb_059 emb_060 emb_061 emb_062 emb_063 emb_064 emb_065
#> 1 -0.002069766 -0.14390510 -0.112056380 -0.12671516 -0.3076788 0.01722672 -0.04037631 0.14633203 0.072336150 0.04734538
#> 2 0.029707700 -0.04107177 -0.003977332 0.03327019 0.1377243 0.18907296 -0.26335296 0.01884718 -0.009265006 -0.16859459
#> 3 0.029707700 -0.04107177 -0.003977332 0.03327019 0.1377243 0.18907296 -0.26335296 0.01884718 -0.009265006 -0.16859459
#> 4 0.051582206 0.03322158 -0.090760550 -0.01738100 0.4675597 -0.29520441 0.62886798 -0.14435785 0.002738898 -0.03027805
#> emb_066 emb_067 emb_068 emb_069 emb_070 emb_071 emb_072 emb_073 emb_074 emb_075 emb_076
#> 1 0.2444712 0.005439494 0.07232769 0.1972760 0.007281476 -0.03698583 -0.07433472 -0.0170116 0.15559705 -0.0142159 0.03095377
#> 2 -0.2767420 0.048937336 -0.35522249 0.1164578 0.345116988 -0.03482347 -0.09575927 -0.1530600 -0.08885341 0.1138750 0.24408367
#> 3 -0.2767420 0.048937336 -0.35522249 0.1164578 0.345116988 -0.03482347 -0.09575927 -0.1530600 -0.08885341 0.1138750 0.24408367
#> 4 -0.4466182 0.080596073 0.29857932 0.2307856 0.032678135 -0.02464749 -0.05315572 0.2278580 0.05121428 0.3368990 0.12042545
#> emb_077 emb_078 emb_079 emb_080 emb_081 emb_082 emb_083 emb_084 emb_085 emb_086 emb_087
#> 1 0.14973202 -0.07275485 -0.1265165 0.0756736 -0.1064746 -0.04138183 0.1262948 -0.07008250 -0.0581785 -0.08323197 -0.1252120
#> 2 0.01405296 -0.00684475 -0.1356777 -0.1306460 0.2395754 -0.24276201 0.1975068 -0.03769429 -0.2019527 0.09356334 -0.2311737
#> 3 0.01405296 -0.00684475 -0.1356777 -0.1306460 0.2395754 -0.24276201 0.1975068 -0.03769429 -0.2019527 0.09356334 -0.2311737
#> 4 0.05976460 0.20906300 -0.3898960 -0.2403080 -0.2094990 -0.43718034 -0.2580445 -0.36398449 -0.1863167 -0.38763523 0.1124806
#> emb_088 emb_089 emb_090 emb_091 emb_092 emb_093 emb_094 emb_095 emb_096 emb_097
#> 1 0.10060352 -0.01839051 -0.24945817 0.2108233 0.2314818 -0.07174893 0.03378552 0.002213914 0.22163883 0.30331765
#> 2 0.01929579 -0.18456985 0.16967812 -0.3636869 -0.1134262 0.07241845 0.29899751 0.111884147 -0.04911397 0.05792167
#> 3 0.01929579 -0.18456985 0.16967812 -0.3636869 -0.1134262 0.07241845 0.29899751 0.111884147 -0.04911397 0.05792167
#> 4 -0.25680842 -0.21670937 -0.02249805 0.2278338 -0.1409704 0.17529125 -0.05521812 -0.186143875 0.54336450 0.13775243
#> emb_098 emb_099 emb_100 emb_101 emb_102 emb_103 emb_104 emb_105 emb_106 emb_107
#> 1 0.009472401 -0.14205784 0.12607630 -0.19062089 -0.08417289 -0.0868922 0.08520973 0.1095840322 0.0911104 -0.11639215
#> 2 -0.125230156 -0.27682150 -0.03230023 0.09556636 -0.01811487 0.2020687 -0.28405397 -0.2379808277 0.0503400 0.07255385
#> 3 -0.125230156 -0.27682150 -0.03230023 0.09556636 -0.01811487 0.2020687 -0.28405397 -0.2379808277 0.0503400 0.07255385
#> 4 -0.269951746 0.01101355 0.12618919 0.24217032 0.19674813 0.1094553 -0.02718710 -0.0006717525 0.1023474 0.30398776
#> emb_108 emb_109 emb_110 emb_111 emb_112 emb_113 emb_114 emb_115 emb_116 emb_117
#> 1 -0.1988509 -0.02318672 -0.03355397 0.06281934 0.09064088 -0.1812218 -0.08294683 0.09746995 0.16949679 0.001256246
#> 2 -0.3391048 0.29906577 -0.28191616 0.04745353 -0.04532966 -0.1529041 0.04579017 0.02364063 -0.31116034 0.160783665
#> 3 -0.3391048 0.29906577 -0.28191616 0.04745353 -0.04532966 -0.1529041 0.04579017 0.02364063 -0.31116034 0.160783665
#> 4 0.0299391 0.38101604 -0.07525725 -0.19109026 -0.09757482 -0.3430861 0.07392349 -0.34514988 -0.05409198 0.021575954
#> emb_118 emb_119 emb_120 emb_121 emb_122 emb_123 emb_124 emb_125 emb_126 emb_127 emb_128
#> 1 -0.09206300 -0.27094193 0.1914412 0.10522338 0.01736773 -0.1658078 -0.24409867 -0.20621473 -0.35578349 0.19991713 -0.1075110
#> 2 -0.07702465 -0.02175729 -0.1156647 0.01362599 -0.20085029 0.3362202 -0.03874875 -0.02545092 0.21585878 -0.04820869 0.1341518
#> 3 -0.07702465 -0.02175729 -0.1156647 0.01362599 -0.20085029 0.3362202 -0.03874875 -0.02545092 0.21585878 -0.04820869 0.1341518
#> 4 0.24660901 -0.25714830 -0.3096262 0.14711675 -0.09584628 -0.2465328 0.02228437 -0.05287175 0.04758008 0.13082074 -0.4366458
#> emb_129 emb_130 emb_131 emb_132 emb_133 emb_134 emb_135 emb_136 emb_137 emb_138 emb_139
#> 1 0.050961102 0.08590268 -0.07344585 -0.11005830 0.2082962 -0.03440777 -0.15951183 0.04417117 -0.1002716 -0.07090355 -0.09013366
#> 2 0.084913827 0.21485816 -0.26201880 -0.04661880 0.1594945 0.24577541 -0.04687785 0.02120483 -0.2707188 -0.05038439 -0.21531074
#> 3 0.084913827 0.21485816 -0.26201880 -0.04661880 0.1594945 0.24577541 -0.04687785 0.02120483 -0.2707188 -0.05038439 -0.21531074
#> 4 0.002557264 0.30628723 -0.24981013 -0.01674807 -0.3169997 0.12056302 -0.09506032 -0.01222125 -0.4409042 0.23120450 0.01691840
#> emb_140 emb_141 emb_142 emb_143 emb_144 emb_145 emb_146 emb_147 emb_148 emb_149 emb_150
#> 1 0.004567102 -0.04074124 -0.09970398 -0.07412403 0.08118367 0.04151318 0.01023637 -0.02712608 0.1120797 0.07420963 0.2022959
#> 2 -0.214246295 0.12730155 0.04358483 -0.04084410 0.08556246 0.37193301 -0.23297635 0.16786779 -0.1552295 0.13361997 0.4047717
#> 3 -0.214246295 0.12730155 0.04358483 -0.04084410 0.08556246 0.37193301 -0.23297635 0.16786779 -0.1552295 0.13361997 0.4047717
#> 4 0.127434801 0.19368662 0.02984041 -0.14155845 -0.15326020 0.02936405 0.05187999 0.06006772 0.0758267 0.04905358 -0.0133047
#> emb_151 emb_152 emb_153 emb_154 emb_155 emb_156 emb_157 emb_158 emb_159 emb_160 emb_161
#> 1 -0.02539130 -0.1542052 0.09878749 0.11210436 0.190853971 -0.2355878 0.1032905 -0.21532827 0.09456767 -0.1445503 -0.33522494
#> 2 -0.07385027 0.2168649 0.08279617 0.02853568 0.007983398 -0.2673024 -0.3518553 0.07097678 0.08358909 -0.1986835 -0.29901644
#> 3 -0.07385027 0.2168649 0.08279617 0.02853568 0.007983398 -0.2673024 -0.3518553 0.07097678 0.08358909 -0.1986835 -0.29901644
#> 4 0.25728051 0.2761333 -0.10433040 -0.02122432 0.066375951 -0.3625118 -0.2547615 0.13501658 -0.28645951 -0.1917117 -0.01892012
#> emb_162 emb_163 emb_164 emb_165 emb_166 emb_167 emb_168 emb_169 emb_170 emb_171 emb_172
#> 1 0.15268593 -0.001686232 0.2152747 -0.10312133 0.1135696 -0.02624894 0.1098730 0.09047928 0.12684340 -0.0694985 -0.11949543
#> 2 -0.01896982 -0.052200415 0.1262764 0.10607937 0.0321700 -0.25643115 -0.1073976 0.26462262 0.03679075 -0.2173935 0.07656907
#> 3 -0.01896982 -0.052200415 0.1262764 0.10607937 0.0321700 -0.25643115 -0.1073976 0.26462262 0.03679075 -0.2173935 0.07656907
#> 4 -0.02507000 -0.031375002 -0.2519416 0.08888888 0.3796148 -0.25476800 -0.1437821 -0.15589955 0.23368900 0.1311810 0.52442150
#> emb_173 emb_174 emb_175 emb_176 emb_177 emb_178 emb_179 emb_180 emb_181 emb_182 emb_183
#> 1 0.2164041 -0.29396720 -0.16588253 -0.1348005 -0.1148055 -0.08968537 0.05097483 0.09355133 0.008875800 0.1106400 -0.1088511
#> 2 -0.1012526 -0.02410151 -0.02048860 -0.1179298 0.2362113 0.30876314 -0.22625668 0.07487945 0.008851715 -0.1024263 -0.2249113
#> 3 -0.1012526 -0.02410151 -0.02048860 -0.1179298 0.2362113 0.30876314 -0.22625668 0.07487945 0.008851715 -0.1024263 -0.2249113
#> 4 -0.0487657 0.25153150 0.02299049 -0.1953604 -0.1572996 0.29195935 -0.05653973 -0.12341889 -0.312314242 -0.1885454 -0.2873893
#> emb_184 emb_185 emb_186 emb_187 emb_188 emb_189 emb_190 emb_191 emb_192 emb_193
#> 1 -0.02326688 0.17733055 -0.07351807 0.0222525 -0.12066887 -0.179350998 0.01909462 0.13228424 0.024832169 0.05002003
#> 2 -0.06455390 0.07631866 0.01623236 -0.1098196 -0.04689731 -0.033685058 0.16270872 -0.05825762 0.069446986 -0.05563271
#> 3 -0.06455390 0.07631866 0.01623236 -0.1098196 -0.04689731 -0.033685058 0.16270872 -0.05825762 0.069446986 -0.05563271
#> 4 -0.02149600 -0.16462975 0.14877875 0.2350687 0.36260483 0.004200405 0.20571376 0.09558415 -0.006550124 -0.30820300
#> emb_194 emb_195 emb_196 emb_197 emb_198 emb_199 emb_200
#> 1 -0.20531311 -0.00853500 0.0639337 0.29886368 0.01618892 -0.08192083 -0.37027851
#> 2 -0.17479033 -0.13635058 0.1291080 -0.09743453 -0.09941812 -0.05773153 -0.09702638
#> 3 -0.17479033 -0.13635058 0.1291080 -0.09743453 -0.09941812 -0.05773153 -0.09702638
#> 4 0.01686265 -0.05414012 -0.1694009 -0.13313706 -0.15822850 0.14830773 -0.34555282
#> [ reached 'max' / getOption("max.print") -- omitted 1 rows ]
Adding the UMLS linker
The UMLS linker can be turned on (and off) even if clinspacy_init()
has already been called. The first time you turn it on, it takes a while
because the linker needs to be loaded into memory. Subsequent removals
and additions are much faster because the linker is only removed from or
added to the pipeline and does not need to be reloaded into memory.
clinspacy_init(use_linker = TRUE)
Named entity recognition (with the UMLS linker)
With the UMLS linker turned on, you can restrict the results by semantic
type using the semantic_types argument. In general, restricting the
results in clinspacy() itself is not recommended, because you can always
subset them later within bind_clinspacy() and bind_clinspacy_embeddings().
clinspacy('This patient has diabetes and CKD stage 3 but no HTN.')
#> | | | 0% | |=========================================================================================================================| 100%
#> clinspacy_id cui entity lemma semantic_type definition is_family is_historical
#> 1 1 C0030705 patient patient Patient or Disabled Group Patients FALSE FALSE
#> 2 1 C1578481 patient patient Idea or Concept Mail Claim Party - Patient FALSE FALSE
#> 3 1 C1578484 patient patient Idea or Concept Relationship modifier - Patient FALSE FALSE
#> 4 1 C1578485 patient patient Intellectual Product Specimen Source Codes - Patient FALSE FALSE
#> 5 1 C1578486 patient patient Intellectual Product Disabled Person Code - Patient FALSE FALSE
#> 6 1 C0011847 diabetes diabetes Disease or Syndrome Diabetes FALSE FALSE
#> 7 1 C0011849 diabetes diabetes Disease or Syndrome Diabetes Mellitus FALSE FALSE
#> 8 1 C2316787 CKD stage 3 ckd stage 3 Disease or Syndrome Chronic kidney disease stage 3 FALSE FALSE
#> 9 1 C0020538 HTN htn Disease or Syndrome Hypertensive disease FALSE FALSE
#> is_hypothetical is_negated is_uncertain section_category
#> 1 FALSE FALSE FALSE <NA>
#> 2 FALSE FALSE FALSE <NA>
#> 3 FALSE FALSE FALSE <NA>
#> 4 FALSE FALSE FALSE <NA>
#> 5 FALSE FALSE FALSE <NA>
#> 6 FALSE FALSE FALSE <NA>
#> 7 FALSE FALSE FALSE <NA>
#> 8 FALSE FALSE FALSE <NA>
#> 9 FALSE TRUE FALSE <NA>
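Because the unrestricted output keeps the semantic_type and is_negated columns, it can be narrowed down after the fact rather than at extraction time. The following is a minimal sketch in base R only, not the package's own subsetting mechanism: the data frame `res` is a hand-copied, hypothetical stand-in for part of the output above (a few rows and columns), so that the example runs without clinspacy or the linker loaded.

```r
# Hypothetical stand-in for part of the clinspacy() output shown above.
# Values are copied by hand from that output; only some rows/columns are kept.
res <- data.frame(
  clinspacy_id  = 1,
  cui           = c("C0030705", "C1578481", "C0011847",
                    "C0011849", "C2316787", "C0020538"),
  entity        = c("patient", "patient", "diabetes",
                    "diabetes", "CKD stage 3", "HTN"),
  semantic_type = c("Patient or Disabled Group", "Idea or Concept",
                    "Disease or Syndrome", "Disease or Syndrome",
                    "Disease or Syndrome", "Disease or Syndrome"),
  is_negated    = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only one semantic type after extraction, instead of passing
# semantic_types to clinspacy() up front.
diseases <- res[res$semantic_type == "Disease or Syndrome", ]

# The negation flag survives the subset, so negated findings
# (here, HTN) can be dropped in a second step.
present_diseases <- diseases[!diseases$is_negated, ]

unique(present_diseases$entity)
```

Keeping the full result and filtering later means the same extraction can be reused for several semantic types without rerunning the pipeline.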
clinspacy('This patient with diabetes is taking omeprazole, aspirin, and lisinopril 10 mg but is not taking albuterol anymore as his asthma has resolved.',
          semantic_types = 'Pharmacologic Substance')
#> | | | 0% | |=========================================================================================================================| 100%
#> clinspacy_id cui entity lemma semantic_type definition is_family is_historical is_hypothetical
#> 1 1 C0028978 omeprazole omeprazole Pharmacologic Substance Omeprazole FALSE FALSE FALSE
#> 2 1 C0004057 aspirin aspirin Pharmacologic Substance Aspirin FALSE FALSE FALSE
#> 3 1 C0065374 lisinopril lisinopril Pharmacologic Substance Lisinopril FALSE FALSE FALSE
#> 4 1 C0001927 albuterol albuterol Pharmacologic Substance Albuterol FALSE FALSE FALSE
#> is_negated is_uncertain section_category
#> 1 FALSE FALSE <NA>
#> 2 FALSE FALSE <NA>
#> 3 FALSE FALSE <NA>
#> 4 TRUE FALSE <NA>
clinspacy('This patient with diabetes is taking omeprazole, aspirin, and lisinopril 10 mg but is not taking albuterol anymore as his asthma has resolved.',
          semantic_types = 'Disease or Syndrome')
#> | | | 0% | |==========================================