Plot count of top terms associated with a miRNA name.
plot_mir_terms(
  df,
  mir,
  top = 20,
  tf.idf = FALSE,
  token = "words",
  ...,
  stopwords = stopwords_miretrieve,
  stopwords_ngram = TRUE,
  normalize = TRUE,
  colour = "steelblue3",
  col.mir = miRNA,
  col.abstract = Abstract,
  col.pmid = PMID,
  title = NULL
)
df: Data frame containing miRNA names, abstracts, and PubMed-IDs.
mir: String. miRNA name of interest.
top: Integer. Number of top terms to plot.
tf.idf: Boolean. If tf.idf = TRUE, terms are weighted in a tf-idf fashion. miRNA names are considered as separate documents, and terms often associated with one miRNA, but not with other miRNAs, get more weight.
token: String. Specifies how abstracts shall be split up. Taken from unnest_tokens() in the tidytext package: "Unit for tokenizing, or a custom tokenizing function. Built-in options are "words" (default), "characters", "character_shingles", "ngrams", "skip_ngrams", "sentences", "lines", "paragraphs", "regex", (...), and "ptb" (Penn Treebank). If a function, should take a character vector and return a list of character vectors of the same length."
...: Additional arguments for tokenization, if necessary.
stopwords: Data frame containing stop words.
stopwords_ngram: Boolean. Specifies if stop words shall be removed from abstracts when using ngrams. Only applied when token = 'ngrams'.
normalize: Boolean. If normalize = TRUE, normalizes the number of abstracts to the total number of abstracts with a miRNA name in a topic. Cannot be applied with tf.idf = TRUE.
colour: String. Colour of bar plot.
col.mir: Symbol. Column containing miRNA names.
col.abstract: Symbol. Column containing abstracts.
col.pmid: Symbol. Column containing PubMed-IDs.
title: String. Plot title.
Bar plot displaying the count of the top terms associated with a miRNA name.
df must contain the abstracts associated with mir; the top terms are drawn from these abstracts.
The number of top terms to plot is set via the top argument.
Terms can be evaluated either as their count or in a tf-idf fashion.
If terms are evaluated as their count, they can be evaluated either as their raw count, i.e. in how many abstracts they are mentioned in conjunction with the miRNA name, or as their relative count, i.e. in how many abstracts containing the miRNA they are mentioned compared to all abstracts containing the miRNA.
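The difference between raw and relative count can be sketched in base R. This is only an illustration of the idea described above, not the package's internal code; the toy data frame and the simple word splitting are assumptions made for the example.

```r
# Toy data (hypothetical): one row per miRNA-abstract pair.
toy <- data.frame(
  miRNA    = c("miR-21", "miR-21", "miR-21", "miR-155"),
  Abstract = c("miR-21 promotes tumor growth",
               "tumor invasion is driven by miR-21",
               "miR-21 regulates apoptosis",
               "miR-155 modulates inflammation"),
  PMID     = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Keep only the abstracts associated with the miRNA of interest.
mir_abstracts <- toy$Abstract[toy$miRNA == "miR-21"]

# Split each abstract into lower-case words and keep every word once per
# abstract, so that a term is counted per abstract, not per occurrence.
# (Simplified tokenizer, assumed for this sketch only.)
words_per_abstract <- lapply(
  strsplit(tolower(mir_abstracts), "[^a-z0-9-]+"),
  unique
)

# Raw count: number of abstracts mentioning each term.
raw_count <- table(unlist(words_per_abstract))

# Relative count: raw count divided by all abstracts containing the miRNA.
relative_count <- raw_count / length(mir_abstracts)
```

In this sketch, a term that appears in two of the three miR-21 abstracts has a raw count of 2 and a relative count of 2/3.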
If terms are evaluated in a tf-idf fashion, miRNA names are considered as separate documents, and terms often associated with one miRNA, but not with other miRNAs, get more weight.
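As a rough illustration of this weighting, the following base R sketch computes tf-idf for a small term-by-miRNA count matrix, treating each miRNA as one document. The counts are made up, and the formula is the usual definition (term frequency times the natural log of the inverse document frequency); whether the package computes it in exactly this way is an assumption.

```r
# Hypothetical term counts: rows are terms, columns are miRNAs ("documents").
counts <- matrix(
  c(4, 4,   # "cancer": mentioned with both miRNAs
    3, 0,   # "apoptosis": mentioned with miR-21 only
    0, 2),  # "inflammation": mentioned with miR-155 only
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cancer", "apoptosis", "inflammation"),
                  c("miR-21", "miR-155"))
)

# Term frequency: share of each term among all terms of one miRNA.
tf <- sweep(counts, 2, colSums(counts), "/")

# Inverse document frequency:
# ln(number of miRNAs / number of miRNAs mentioned with the term).
idf <- log(ncol(counts) / rowSums(counts > 0))

# tf-idf: each row of tf (one term) is scaled by that term's idf.
tf_idf <- tf * idf
```

A term shared by every miRNA ("cancer" here) gets an idf of zero and therefore no weight, while a term seen with only one miRNA keeps a positive weight.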
plot_mir_terms() is based on the tools available in the tidytext package.
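To show what the token argument controls, here is a minimal sketch that calls tidytext::unnest_tokens() directly on a toy data frame. The data and the output column name term are made up for the example, and it is an assumption that additional tokenizer arguments such as n reach unnest_tokens() through ... in plot_mir_terms().

```r
# Toy abstracts (hypothetical).
toy <- data.frame(
  PMID     = c(1, 2),
  Abstract = c("promotes tumor cell growth",
               "regulates apoptosis in tumor cells"),
  stringsAsFactors = FALSE
)

# Only run the tokenization if tidytext is installed.
if (requireNamespace("tidytext", quietly = TRUE)) {
  # token = "words": one row per word in each abstract.
  words <- tidytext::unnest_tokens(
    toy, output = term, input = Abstract, token = "words"
  )

  # token = "ngrams" with n = 2: one row per pair of adjacent words.
  bigrams <- tidytext::unnest_tokens(
    toy, output = term, input = Abstract, token = "ngrams", n = 2
  )
}
```

Single words suit broad overviews, while 2-grams can keep short phrases together that single words would split apart.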
plot_wordcloud(), tidytext::unnest_tokens()
Other miR term functions:
plot_wordcloud()
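A minimal usage sketch follows. The toy data frame is made up; its column names match the defaults of col.mir, col.abstract, and col.pmid in the signature above. The miRNA naming ("miR-21") and the assumption that such a small data frame is enough to produce a plot are not confirmed by this page.

```r
# Toy data (hypothetical) with the default column names miRNA, Abstract, PMID.
toy <- data.frame(
  miRNA    = c("miR-21", "miR-21", "miR-155"),
  Abstract = c("miR-21 promotes tumor growth and invasion",
               "miR-21 regulates apoptosis in tumor cells",
               "miR-155 modulates inflammation in macrophages"),
  PMID     = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Only run the plotting calls if miRetrieve is installed.
if (requireNamespace("miRetrieve", quietly = TRUE)) {
  library(miRetrieve)

  # Raw count of the top 5 terms associated with miR-21.
  p_raw <- plot_mir_terms(toy, mir = "miR-21", top = 5, normalize = FALSE)

  # Relative count (normalize = TRUE is the default), with a custom title.
  p_rel <- plot_mir_terms(toy, mir = "miR-21", top = 5,
                          title = "Top terms for miR-21")
}
```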