Create wordcloud of terms associated with a miRNA name.
plot_wordcloud(
  df,
  mir,
  min.freq = 1,
  max.terms = 20,
  tf.idf = FALSE,
  token = "words",
  ...,
  stopwords = stopwords_miretrieve,
  stopwords_ngram = TRUE,
  colours = "black",
  random.colour = TRUE,
  ordered.colour = FALSE,
  col.mir = miRNA,
  col.abstract = Abstract,
  col.pmid = PMID
)
df: Data frame containing miRNA names, abstracts, and PubMed-IDs.
mir: String. miRNA name of interest.
min.freq: Integer. Specifies the minimum number of times a term must be associated with mir to be plotted.
max.terms: Integer. Maximum number of terms to plot.
tf.idf: Boolean. If tf.idf = TRUE, terms are weighted in a tf-idf fashion: miRNA names are considered as separate documents, and terms often associated with one miRNA but not with other miRNAs get more weight. Cannot be used together with normalize = TRUE; if both tf.idf = TRUE and normalize = TRUE are set, tf.idf = TRUE is ignored.
token: String. Specifies how abstracts shall be split up. Taken from unnest_tokens() in the tidytext package:
"Unit for tokenizing, or a custom tokenizing function. Built-in options are "words" (default), "characters", "character_shingles", "ngrams", "skip_ngrams", "sentences", "lines", "paragraphs", "regex", (...), and "ptb" (Penn Treebank). If a function, should take a character vector and return a list of character vectors of the same length."
...: Additional arguments for tokenization, if necessary.
stopwords: Data frame containing stop words.
stopwords_ngram: Boolean. Specifies if stop words shall be removed from abstracts when using n-grams. Only applied when token = 'ngrams'.
colours: Vector of strings. Colours for the wordcloud.
random.colour: Boolean. Taken from wordcloud() in the wordcloud package: "Choose colours randomly from colours. If false, the colour is chosen based on the frequency."
ordered.colour: Boolean. Taken from wordcloud() in the wordcloud package: "If true, then colours are assigned to words in order."
col.mir: Symbol. Column containing miRNA names.
col.abstract: Symbol. Column containing abstracts.
col.pmid: Symbol. Column containing PubMed-IDs.
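The stopwords_ngram description suggests that stop words are dropped from the abstract text before n-grams are built. A minimal base-R sketch of that reading, for a single abstract, follows. The helper ngrams_from_abstract() is hypothetical, its regex tokenizer is only a rough stand-in for tidytext::unnest_tokens() with token = "ngrams", and it takes stop words as a plain character vector rather than the data frame that stopwords expects. Whether miRetrieve removes stop words before or after forming n-grams is an assumption.

```r
# Hypothetical helper (not part of miRetrieve): build word n-grams from ONE
# abstract, optionally removing stop words from the text first.
ngrams_from_abstract <- function(abstract, n = 2, stopwords = character(0),
                                 remove_stopwords = TRUE) {
  # Crude tokenizer (assumption): lower-case, then split on anything that is
  # not a letter, digit or hyphen, so "miR-21" stays a single token.
  words <- unlist(strsplit(tolower(abstract), "[^a-z0-9-]+"))
  words <- words[nzchar(words)]
  # Drop stop words from the abstract before the n-grams are formed.
  if (remove_stopwords) {
    words <- words[!(words %in% stopwords)]
  }
  if (length(words) < n) {
    return(character(0))
  }
  # Slide a window of n consecutive words over the token sequence.
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

ab <- "Expression of miR-21 in the tumour"
sw <- c("of", "in", "the")
ngrams_from_abstract(ab, stopwords = sw)
# returns c("expression mir-21", "mir-21 tumour")
ngrams_from_abstract(ab, stopwords = sw, remove_stopwords = FALSE)
# returns c("expression of", "of mir-21", "mir-21 in", "in the", "the tumour")
```

Removing stop words first lets content words that were separated by "of" or "the" end up in the same bigram, instead of producing bigrams such as "in the" that carry little meaning.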
Value: a wordcloud of terms associated with a miRNA name.
miRNA names must be in a data frame df, while terms are taken from abstracts contained in df.
The number of terms to plot is regulated by max.terms, while min.freq regulates the minimum number of times a term must be mentioned to be plotted.
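A minimal base-R sketch of how min.freq and max.terms could interact on raw counts. It assumes the default column names miRNA and Abstract, counts every occurrence of a term across the abstracts of the chosen miRNA (the documentation does not say whether occurrences or abstracts are counted), and uses a crude regex tokenizer; top_terms() is a hypothetical helper, not part of miRetrieve.

```r
# Hypothetical sketch (not miRetrieve's code): raw term counts for one miRNA,
# filtered by min.freq and capped at max.terms.
top_terms <- function(df, mir, min.freq = 1, max.terms = 20,
                      stopwords = character(0)) {
  # Assumes the default column names "miRNA" and "Abstract".
  abstracts <- df$Abstract[df$miRNA == mir]
  # Crude tokenizer (assumption), standing in for tidytext::unnest_tokens().
  words <- unlist(strsplit(tolower(abstracts), "[^a-z0-9-]+"))
  words <- words[nzchar(words) & !(words %in% stopwords)]
  # Count every occurrence of each term, most frequent first.
  counts <- sort(table(words), decreasing = TRUE)
  # Keep terms seen at least min.freq times, then at most max.terms of them.
  counts <- counts[counts >= min.freq]
  head(counts, max.terms)
}

df <- data.frame(
  miRNA = c("miR-21", "miR-21", "miR-155"),
  Abstract = c("miR-21 promotes tumour growth",
               "Tumour growth and invasion",
               "miR-155 in inflammation"),
  PMID = c(1, 2, 3)
)
top_terms(df, "miR-21", min.freq = 2, stopwords = "and")
top_terms(df, "miR-21", max.terms = 3, stopwords = "and")
```

With min.freq = 2 only "growth" and "tumour" (two mentions each) remain; with the default min.freq = 1 and max.terms = 3, the three most frequent terms are kept.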
Terms can be evaluated either by their raw count, i.e. how often they are mentioned in conjunction with the miRNA of interest, or weighted in a tf-idf fashion. If tf.idf = TRUE, miRNA names are considered as separate documents, and terms often associated with one miRNA but not with other miRNAs get more weight.
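The tf-idf weighting described above can be sketched in base R by pooling all abstracts of each miRNA into one "document". The sketch uses tf = count of the term for that miRNA divided by that miRNA's total term count, and idf = log(number of miRNAs / number of miRNAs whose abstracts mention the term), which is, to our understanding, the definition documented for tidytext::bind_tf_idf(). That plot_wordcloud() uses exactly this formula is an assumption; mir_tf_idf() and its regex tokenizer are hypothetical.

```r
# Hypothetical sketch (not miRetrieve's code): tf-idf with each miRNA as a
# "document", tf = n / total terms of that miRNA,
# idf = log(number of miRNAs / number of miRNAs mentioning the term).
mir_tf_idf <- function(df, stopwords = character(0)) {
  # Pool all abstracts of each miRNA into one token vector.
  tokens <- lapply(split(df$Abstract, df$miRNA), function(txt) {
    w <- unlist(strsplit(tolower(txt), "[^a-z0-9-]+"))
    w[nzchar(w) & !(w %in% stopwords)]
  })
  counts <- lapply(tokens, table)
  n_docs <- length(counts)
  # Document frequency: in how many miRNA "documents" each term occurs.
  doc_freq <- table(unlist(lapply(counts, names)))
  rows <- lapply(names(counts), function(m) {
    n <- counts[[m]]
    if (length(n) == 0) return(NULL)  # miRNA with no usable terms
    tf <- as.numeric(n) / sum(n)
    idf <- log(n_docs / as.numeric(doc_freq[names(n)]))
    data.frame(miRNA = m, term = names(n), n = as.integer(n),
               tf = tf, idf = idf, tf_idf = tf * idf)
  })
  do.call(rbind, rows)
}

df <- data.frame(
  miRNA = c("miR-21", "miR-21", "miR-155"),
  Abstract = c("miR-21 promotes tumour growth",
               "Tumour growth and invasion",
               "miR-155 in inflammation and tumour"),
  PMID = c(1, 2, 3)
)
scores <- mir_tf_idf(df, stopwords = c("and", "in"))
scores[order(-scores$tf_idf), ]
```

Here "tumour" is mentioned twice for miR-21 but also appears for miR-155, so its idf is log(2 / 2) = 0 and its tf-idf is 0. "growth", equally frequent but specific to miR-21, keeps a weight of 2/7 * log(2).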
plot_wordcloud() is based on the tools available in the wordcloud package.
See also: plot_mir_terms(), wordcloud::wordcloud(), tidytext::unnest_tokens()
Other miR term functions: plot_mir_terms()