dataset_cui2vec_embeddings dataset included with this package. The embeddings are derived from Andrew Beam's cui2vec R package.
bind_clinspacy_embeddings(
  clinspacy_output,
  df,
  type = "scispacy",
  df_id = NULL,
  subset = "is_negated == FALSE"
)
clinspacy_output: A data.frame or file name containing the output from
clinspacy. In order for scispacy embeddings to be available
to bind_clinspacy_embeddings, you must set
return_scispacy_embeddings to TRUE when running
clinspacy so that the embeddings are included within
clinspacy_output.
df: The data.frame to which you would like to bind the output of
clinspacy.
type: The type of embeddings to return. One of scispacy and
cui2vec. Whereas cui2vec embeddings require the UMLS linker
to be enabled, the scispacy embeddings do not. Defaults to
scispacy.
subset: Logical criteria represented as a string by which the
clinspacy_output will be subsetted prior to building the output data
frame. Defaults to "is_negated == FALSE", which removes negated
concepts prior to generating the output. Any column in
clinspacy_output may be referenced here. To avoid any subsetting,
set this to NULL.
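The subset argument is a logical expression stored as a string. The following is a minimal base-R sketch of how such a string can filter a data frame. It uses a toy stand-in for clinspacy output; the rows shown and the apply_subset helper are hypothetical, and clinspacy's own implementation may differ.

```r
# Toy stand-in for clinspacy output (illustrative values only).
toy_output <- data.frame(
  clinspacy_id = c(1, 1, 2),
  entity = c("pneumonia", "fever", "cough"),
  is_negated = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: evaluate a string criterion against the columns
# of df, or return df unchanged when subset is NULL.
apply_subset <- function(df, subset = "is_negated == FALSE") {
  if (is.null(subset)) {
    return(df)
  }
  keep <- eval(parse(text = subset), envir = df)
  df[keep, , drop = FALSE]
}

kept <- apply_subset(toy_output)                     # drops negated rows
all_rows <- apply_subset(toy_output, subset = NULL)  # no filtering
```

Because the criterion is evaluated against the data frame's columns, any column can appear in the string, which mirrors the statement above that any column in clinspacy_output may be referenced.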
A data frame containing the original data frame as well as the concept embeddings. For scispacy embeddings, this returns 200 columns of embeddings. For cui2vec embeddings, this returns 500 columns of embeddings. The resulting data frame can be used to train a machine learning model.
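As a rough illustration of the last point, the sketch below fits a logistic regression on a simulated data frame shaped like the return value. Everything in it is an assumption made for illustration: the data are random, only three embedding columns stand in for the 200 or 500 real ones, and the emb_ column prefix, note_id, and outcome names are hypothetical rather than taken from the package.

```r
set.seed(1)
n <- 40

# Simulated stand-in for the bound data frame: original columns plus
# a few embedding columns (names are assumptions for this sketch).
bound <- data.frame(
  note_id = seq_len(n),
  outcome = rbinom(n, 1, 0.5),
  emb_001 = rnorm(n),
  emb_002 = rnorm(n),
  emb_003 = rnorm(n)
)

# Select the embedding columns by prefix and build a model formula.
emb_cols <- grep("^emb_", names(bound), value = TRUE)
model_formula <- reformulate(emb_cols, response = "outcome")

# Fit a logistic regression and obtain predicted probabilities.
fit <- glm(model_formula, data = bound, family = binomial())
predicted <- predict(fit, type = "response")
```

Selecting predictors by prefix keeps the modelling code the same whether the data frame carries 200 or 500 embedding columns.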
Citation
Beam, A.L., Kompa, B., Schmaltz, A., Fried, I., Weber, G., Palmer, N.P., Shi, X., Cai, T., and Kohane, I.S., 2019. Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data. arXiv preprint arXiv:1804.01486.
License
The cui2vec data is made available under a CC BY 4.0 license. The only change made to the original dataset is the renaming of columns.
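# NOT RUN {
library(clinspacy)
# Load the mtsamples sample dataset
mtsamples <- dataset_mtsamples()
# Extract entities from the first five descriptions, keeping the scispacy
# embeddings, then bind the embeddings back onto those same five rows
mtsamples[1:5, ] %>%
  clinspacy(df_col = 'description', return_scispacy_embeddings = TRUE) %>%
  bind_clinspacy_embeddings(mtsamples[1:5, ])
# }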