bind_clinspacy_embeddings: This function binds columns containing entity or concept embeddings to a data frame. The entity embeddings are derived from the scispacy package, and the concept embeddings are derived from the `dataset_cui2vec_embeddings` dataset included with this package.

Description

The cui2vec embeddings are derived from Andrew Beam's cui2vec R package.

Usage

bind_clinspacy_embeddings(
  clinspacy_output,
  df,
  type = "scispacy",
  df_id = NULL,
  subset = "is_negated == FALSE"
)

Arguments

clinspacy_output

A data.frame or file name containing the output from clinspacy. In order for scispacy embeddings to be available to bind_clinspacy_embeddings, you must set return_scispacy_embeddings to TRUE when running clinspacy so that the embeddings are included within clinspacy_output.

df

The data.frame to which you would like to bind the output of clinspacy.

type

The type of embeddings to return. Either scispacy or cui2vec. The cui2vec embeddings require the UMLS linker to be enabled; the scispacy embeddings do not. Defaults to scispacy.

df_id

The name of the id column in df that will be joined to the id column in clinspacy_output. If you supplied a df_id in clinspacy, then you must also supply it here. If you did not supply it in clinspacy, it defaults to the row number, as it does in clinspacy.

subset

Logical criteria, given as a string, by which clinspacy_output is subset before the output data frame is built. Defaults to "is_negated == FALSE", which removes negated concepts prior to generating the output. Any column in clinspacy_output may be referenced here. To avoid any subsetting, set this to NULL.
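
The effect of a string-valued subset criterion can be illustrated with a minimal base R sketch. This is not the package's internal implementation: apply_subset is a hypothetical helper written only for this illustration, and toy_output is a made-up stand-in for clinspacy_output in which only the is_negated column is taken from the documentation above.

```r
# Toy stand-in for clinspacy_output (illustrative columns; only is_negated
# is named in the documentation).
toy_output <- data.frame(
  id = c(1, 1, 2),
  entity = c("diabetes", "fever", "asthma"),
  is_negated = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: evaluate the criterion string against the columns of
# the data frame and keep the rows where it is TRUE. NULL skips subsetting.
apply_subset <- function(output, subset = "is_negated == FALSE") {
  if (is.null(subset)) {
    return(output)
  }
  keep <- eval(parse(text = subset), envir = output)
  output[keep, , drop = FALSE]
}

filtered <- apply_subset(toy_output)                   # negated rows dropped
unfiltered <- apply_subset(toy_output, subset = NULL)  # all rows kept
```

With the default criterion the negated "fever" row is dropped, while subset = NULL leaves every row in place.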

Value

A data frame containing the original data frame as well as the embeddings. For scispacy embeddings, this returns 200 columns of embeddings. For cui2vec embeddings, this returns 500 columns of embeddings. The resulting data frame can be used to train a machine learning model.

Details

Citation

Beam, A.L., Kompa, B., Schmaltz, A., Fried, I., Griffin, W., Palmer, N.P., Shi, X., Cai, T., and Kohane, I.S., 2019. Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data. arXiv preprint arXiv:1804.01486.

License

The cui2vec data is made available under a CC BY 4.0 license. The only change made to the original dataset is the renaming of columns.

Examples

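# NOT RUN {
library(clinspacy)
library(magrittr)  # provides the %>% pipe
# Load the mtsamples sample data
mtsamples <- dataset_mtsamples()
# Extract entities from the first five rows with scispacy embeddings included,
# then bind the embeddings to those rows (type defaults to "scispacy")
mtsamples[1:5,] %>%
  clinspacy(df_col = 'description', return_scispacy_embeddings = TRUE) %>%
  bind_clinspacy_embeddings(mtsamples[1:5,])
# }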