plot_mir_terms: Plot count of top terms associated with a miRNA name

Description

Plot count of top terms associated with a miRNA name.

Usage

plot_mir_terms(
  df,
  mir,
  top = 20,
  tf.idf = FALSE,
  token = "words",
  ...,
  stopwords = stopwords_miretrieve,
  stopwords_ngram = TRUE,
  normalize = TRUE,
  colour = "steelblue3",
  col.mir = miRNA,
  col.abstract = Abstract,
  col.pmid = PMID,
  title = NULL
)
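
A minimal call might look like the sketch below. It assumes that plot_mir_terms() is exported by the miRetrieve package (the package name is inferred from the stopwords_miretrieve default, not stated on this page) and that a data frame with the default column names miRNA, Abstract, and PMID is acceptable input. The toy abstracts are invented for illustration only.

```r
# Toy input (invented): one row per abstract/miRNA pair, using the default
# column names expected by col.mir, col.abstract, and col.pmid.
abstracts <- data.frame(
  PMID     = 1:3,
  miRNA    = c("miR-21", "miR-21", "miR-155"),
  Abstract = c("miR-21 promotes tumor growth and invasion.",
               "miR-21 inhibits apoptosis in tumor cells.",
               "miR-155 regulates inflammation."),
  stringsAsFactors = FALSE
)

# Assumption: the function lives in the miRetrieve package. The call is
# skipped if that package is not installed.
if (requireNamespace("miRetrieve", quietly = TRUE)) {
  print(miRetrieve::plot_mir_terms(
    abstracts,
    mir   = "miR-21",
    top   = 5,
    title = "Top terms for miR-21"
  ))
}
```

All other arguments are left at the defaults listed in the signature above.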

Arguments

df

Data frame containing miRNA names, abstracts, and PubMed-IDs.

mir

String. miRNA name of interest.

top

Integer. Number of top terms to plot.

tf.idf

Boolean. If tf.idf = TRUE, terms are weighted in a tf-idf fashion: miRNA names are considered as separate documents, and terms often associated with one miRNA but not with other miRNAs get more weight.

token

String. Specifies how abstracts shall be split up. Taken from unnest_tokens() in the tidytext package: "Unit for tokenizing, or a custom tokenizing function. Built-in options are "words" (default), "characters", "character_shingles", "ngrams", "skip_ngrams", "sentences", "lines", "paragraphs", "regex", (...), and "ptb" (Penn Treebank). If a function, should take a character vector and return a list of character vectors of the same length."

...

Additional arguments for tokenization, if necessary.

stopwords

Data frame containing stop words.

stopwords_ngram

Boolean. Specifies if stop words shall be removed from abstracts when using ngrams. Only applied when token = 'ngrams'.

normalize

Boolean. If normalize = TRUE, the number of abstracts mentioning a term is normalized to the total number of abstracts containing the miRNA name. Cannot be applied with tf.idf = TRUE.

colour

String. Colour of bar plot.

col.mir

Symbol. Column containing miRNA names.

col.abstract

Symbol. Column containing abstracts.

col.pmid

Symbol. Column containing PubMed-IDs.

title

String. Plot title.

Value

Bar plot displaying the count of the top terms associated with a miRNA name.

Details

Plot count of top terms associated with a miRNA name. The terms are taken from the abstracts in df that are associated with mir. The number of top terms to plot is set via the top argument.

Terms can be evaluated either by their count or in a tf-idf fashion. If terms are evaluated by their count, this is either the raw count, i.e. the number of abstracts in which a term is mentioned in conjunction with the miRNA name, or the relative count, i.e. that number divided by the number of all abstracts containing the miRNA. If terms are evaluated in a tf-idf fashion, miRNA names are considered as separate documents, and terms often associated with one miRNA but not with other miRNAs get more weight.

plot_mir_terms() is based on the tools available in the tidytext package.
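
The three ways of evaluating terms can be illustrated with a small base R sketch. This is not the package's implementation: the toy data are invented, whitespace splitting stands in for proper tokenization, stop words are not removed, and the tf-idf formula (term share per miRNA times log of the inverse document frequency) is an assumption about the usual definition.

```r
# Toy data (invented for illustration): one row per abstract/miRNA pair.
abstracts <- data.frame(
  PMID     = 1:4,
  miRNA    = c("miR-21", "miR-21", "miR-21", "miR-155"),
  Abstract = c("tumor growth", "tumor invasion tumor", "apoptosis",
               "inflammation tumor"),
  stringsAsFactors = FALSE
)

# Split each abstract into lowercase words (a stand-in for real tokenization).
words <- strsplit(tolower(abstracts$Abstract), "\\s+")

# Count-based evaluation for one miRNA.
is_mir <- abstracts$miRNA == "miR-21"
# unique() makes a term count at most once per abstract.
raw_count      <- table(unlist(lapply(words[is_mir], unique)))
relative_count <- raw_count / sum(is_mir)

# tf-idf evaluation, treating each miRNA as one document.
tokens <- data.frame(
  miRNA = rep(abstracts$miRNA, lengths(words)),
  term  = unlist(words),
  stringsAsFactors = FALSE
)
counts <- table(tokens$term, tokens$miRNA)         # rows: terms, columns: miRNAs
tf     <- sweep(counts, 2, colSums(counts), "/")   # term share within each miRNA
idf    <- log(ncol(counts) / rowSums(counts > 0))  # larger for terms tied to few miRNAs
tf_idf <- tf * idf                                 # idf[i] scales row i (column-major recycling)
```

In this toy data, "tumor" appears in two of the three miR-21 abstracts, so its raw count is 2 and its relative count is 2/3. Because "tumor" also occurs with miR-155, its idf is log(2/2) = 0 and its tf-idf weight is zero, while terms seen only with miR-21 keep a positive weight.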
