compare_mir_terms_scatter: Compare shared terms associated with a miRNA name

Description

Compare shared terms associated with a miRNA name over two topics.

Usage

compare_mir_terms_scatter(
  df,
  mir,
  top = 1000,
  token = "words",
  ...,
  topic = NULL,
  stopwords = stopwords_miretrieve,
  stopwords_ngram = TRUE,
  html = TRUE,
  colour.point = "red",
  colour.term = "black",
  col.mir = miRNA,
  col.abstract = Abstract,
  col.topic = Topic,
  col.pmid = PMID,
  title = NULL
)

Arguments

df

Data frame containing miRNA names, abstracts, topics, and PubMed-IDs.

mir

String. miRNA name of interest.

top

Integer. Number of top terms to plot.

token

String. Specifies how abstracts shall be split up. Taken from unnest_tokens() in the tidytext package: "Unit for tokenizing, or a custom tokenizing function. Built-in options are "words" (default), "characters", "character_shingles", "ngrams", "skip_ngrams", "sentences", "lines", "paragraphs", "regex", (...), and "ptb" (Penn Treebank). If a function, should take a character vector and return a list of character vectors of the same length."

...

Additional arguments for tokenization, if necessary.

topic

Character vector. Optional. Specifies which topics to plot. Must have length two. If topic = NULL, all topics in df are plotted.

stopwords

Data frame containing stop words.

stopwords_ngram

Boolean. Specifies if stop words shall be removed from abstracts when using ngrams. Only applied when token = "ngrams".

html

Boolean. Specifies if the plot is returned as an interactive HTML-widget (TRUE) or as a static plot (FALSE).

colour.point

String. Colour of points for scatter plot.

colour.term

String. Colour of terms for scatter plot.

col.mir

Symbol. Column containing miRNAs.

col.abstract

Symbol. Column containing abstracts.

col.topic

Symbol. Column containing topic names.

col.pmid

Symbol. Column containing PubMed-IDs.

title

String. Plot title.

Value

Scatter plot comparing shared terms of a miRNA between two topics.

Details

Compare shared terms associated with a miRNA name over two topics. These terms are displayed as a scatter plot, which is either interactive as an HTML-widget, or static. This is regulated via the html argument. miRNA names and topics must be in a data frame df, while terms are taken from abstracts contained in df. Number of top terms to choose is regulated by top. Terms are evaluated as their raw count and plotted on a log10-scale.

compare_mir_terms_scatter() is based on the tools available in the tidytext package. The term-plot is greatly inspired by “tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” by Silge and Robinson.
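The counting step described above can be illustrated with a minimal base R sketch. It is not the package's implementation, which builds on tidytext. The toy data frame abstracts_toy, the helper count_terms(), and the simple regular-expression tokenizer are assumptions made for illustration only. Stop word removal and the top cut-off are left out.

```r
# Toy data (made up); column names follow the function's defaults.
abstracts_toy <- data.frame(
  miRNA = "miR-21",
  Abstract = c(
    "miR-21 promotes tumor growth and tumor invasion",
    "miR-21 drives fibrosis and tumor growth",
    "miR-21 promotes cardiac fibrosis",
    "miR-21 regulates cardiac growth"
  ),
  Topic = c("Cancer", "Cancer", "Heart", "Heart"),
  PMID = 1:4,
  stringsAsFactors = FALSE
)

# Hypothetical helper: raw term counts over a set of abstracts.
# Splits on anything that is not a letter, digit, or hyphen.
count_terms <- function(abstracts) {
  words <- unlist(strsplit(tolower(abstracts), "[^a-z0-9-]+"))
  table(words[nzchar(words)])
}

counts_cancer <- count_terms(abstracts_toy$Abstract[abstracts_toy$Topic == "Cancer"])
counts_heart  <- count_terms(abstracts_toy$Abstract[abstracts_toy$Topic == "Heart"])

# Keep only terms that occur in both topics.
shared <- intersect(names(counts_cancer), names(counts_heart))

shared_counts <- data.frame(
  term   = shared,
  Cancer = as.integer(counts_cancer[shared]),
  Heart  = as.integer(counts_heart[shared]),
  stringsAsFactors = FALSE
)

# Coordinates on a log10 scale, one point per shared term.
shared_counts$log10_cancer <- log10(shared_counts$Cancer)
shared_counts$log10_heart  <- log10(shared_counts$Heart)

print(shared_counts)
```

Each row of shared_counts corresponds to one point in the scatter plot: the raw count in the first topic against the raw count in the second. Terms that appear in only one topic, such as "tumor" or "cardiac" in this toy data, drop out.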

References

Silge, Julia, and David Robinson. 2016. “tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” JOSS 1 (3). The Open Journal. https://doi.org/10.21105/joss.00037.
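
Examples

A minimal sketch of a call, using only the arguments listed under Usage. The data frame df_toy and its contents are hypothetical. Whether such a small data set yields a useful plot has not been verified here, so a real data frame of abstracts covering two topics should be substituted. The call is only attempted if the miRetrieve package is installed.

```r
# Hypothetical toy data; column names match the defaults of
# col.mir, col.abstract, col.topic, and col.pmid.
df_toy <- data.frame(
  miRNA = "miR-21",
  Abstract = c(
    "miR-21 promotes tumor growth and tumor invasion",
    "miR-21 drives fibrosis and tumor growth",
    "miR-21 promotes cardiac fibrosis",
    "miR-21 regulates cardiac growth"
  ),
  Topic = c("Cancer", "Cancer", "Heart", "Heart"),
  PMID = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Skip the call if miRetrieve is not available.
if (requireNamespace("miRetrieve", quietly = TRUE)) {
  p <- miRetrieve::compare_mir_terms_scatter(
    df_toy,
    mir = "miR-21",
    topic = c("Cancer", "Heart"),  # the two topics to compare
    html = FALSE,                  # assumed to request the static plot
    title = "miR-21: shared terms in Cancer and Heart"
  )
}
```

Leaving html at its default of TRUE returns the plot as an HTML-widget instead.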