compare_mir_terms_log2: Compare log2-frequency count of terms associated with a miRNA name

Description

Compare log2-frequency count of terms associated with a miRNA name over two topics.

Usage

compare_mir_terms_log2(
  df,
  mir,
  top = 20,
  token = "words",
  ...,
  topic = NULL,
  shared = TRUE,
  normalize = TRUE,
  stopwords = stopwords_miretrieve,
  stopwords_ngram = TRUE,
  col.mir = miRNA,
  col.abstract = Abstract,
  col.topic = Topic,
  col.pmid = PMID,
  title = NULL
)

Arguments

df

Data frame containing miRNA names, abstracts, topics, and PubMed-IDs.

mir

String. miRNA name of interest.

top

Integer. Number of top terms to plot.

token

String. Specifies how abstracts shall be split up. Taken from unnest_tokens() in the tidytext package: "Unit for tokenizing, or a custom tokenizing function. Built-in options are "words" (default), "characters", "character_shingles", "ngrams", "skip_ngrams", "sentences", "lines", "paragraphs", "regex", (...), and "ptb" (Penn Treebank). If a function, should take a character vector and return a list of character vectors of the same length."

...

Additional arguments for tokenization, if necessary.

topic

Character vector. Optional. Specifies which topics to plot. Must have length two. If topic = NULL, all topics in df are plotted.

shared

Boolean. If shared = TRUE, only terms that are shared between the two topics are plotted.

normalize

Boolean. If normalize = TRUE, normalizes the number of abstracts to the total number of abstracts in a topic.

stopwords

Data frame containing stop words.

stopwords_ngram

Boolean. Specifies if stop words shall be removed from abstracts when using ngrams. Only applied when token = 'ngrams'.

col.mir

Symbol. Column containing miRNA names.

col.abstract

Symbol. Column containing abstracts.

col.topic

Symbol. Column containing topic names.

col.pmid

Symbol. Column containing PubMed-IDs.

title

String. Plot title.

Value

List containing bar plot comparing the log2-frequency of terms associated with a miRNA over two topics and its corresponding data frame.

Details

Compare the log2-frequency count of terms associated with a miRNA name over two topics by plotting the log2-ratio of the term counts in the two topics. miRNA names and topics must be in a data frame df, while terms are taken from the abstracts contained in df. The number of top terms to plot is set by top.

Terms can be evaluated either as their raw count, i.e. in how many abstracts they are mentioned in conjunction with the miRNA name, or as their relative count, i.e. in how many abstracts containing the miRNA they are mentioned compared to all abstracts containing the miRNA.

compare_mir_terms_log2() is based on the tools available in the tidytext package. The log2-plot is greatly inspired by the work of Silge and Robinson, "tidytext: Text Mining and Analysis Using Tidy Data Principles in R."
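The calculation described above can be sketched in base R. This is a minimal illustration on made-up data, not the function's actual implementation: the toy data frame abstracts, the topic labels "A" and "B", the simple word splitting, and the direction of the ratio (topic A over topic B) are all assumptions chosen for the example.

```r
# Hypothetical toy data: five abstracts mentioning one miRNA in two topics.
abstracts <- data.frame(
  PMID = 1:5,
  Topic = c("A", "A", "A", "B", "B"),
  Abstract = c(
    "tumor growth invasion",
    "tumor invasion",
    "tumor apoptosis",
    "tumor apoptosis",
    "apoptosis inflammation"
  ),
  stringsAsFactors = FALSE
)

# Split each abstract into unique lower-case words, so that a term is
# counted at most once per abstract.
tokens <- lapply(strsplit(tolower(abstracts$Abstract), "[^a-z0-9]+"), unique)
long <- data.frame(
  Topic = rep(abstracts$Topic, lengths(tokens)),
  term = unlist(tokens),
  stringsAsFactors = FALSE
)

# Raw count: number of abstracts per term and topic
# (rows are terms, columns are topics).
counts <- unclass(table(long$term, long$Topic))

# Relative count: divide by the number of abstracts in each topic.
n_abstracts <- vapply(split(abstracts$PMID, abstracts$Topic), length, integer(1))
rel <- sweep(counts, 2, n_abstracts[colnames(counts)], "/")

# Keep only terms found in both topics, then take the log2-ratio A over B.
shared <- rowSums(counts > 0) == 2
log2_ratio <- sort(log2(rel[shared, "A"] / rel[shared, "B"]), decreasing = TRUE)
log2_ratio
```

In this sketch, positive values mark terms that are relatively more frequent in topic A and negative values mark terms more frequent in topic B. Replacing rel with counts in the last step gives the ratio of raw counts instead of relative counts.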

References

Silge, Julia, and David Robinson. 2016. "tidytext: Text Mining and Analysis Using Tidy Data Principles in R." JOSS 1 (3). The Open Journal. https://doi.org/10.21105/joss.00037.