Compare count of top terms associated with a miRNA name over various topics.
compare_mir_terms(
  df,
  mir,
  top = 20,
  token = "words",
  ...,
  topic = NULL,
  shared = TRUE,
  normalize = TRUE,
  stopwords = stopwords_miretrieve,
  stopwords_ngram = TRUE,
  position = "dodge",
  col.mir = miRNA,
  col.abstract = Abstract,
  col.topic = Topic,
  col.pmid = PMID,
  title = NULL
)
df: Data frame containing miRNA names, abstracts, topics, and PubMed-IDs.
mir: String. miRNA name of interest.
top: Integer. Number of top terms to plot.
token: String. Specifies how abstracts are split into tokens. Taken from unnest_tokens() in the tidytext package: "Unit for tokenizing, or a custom tokenizing function. Built-in options are "words" (default), "characters", "character_shingles", "ngrams", "skip_ngrams", "sentences", "lines", "paragraphs", "regex", (...), and "ptb" (Penn Treebank). If a function, should take a character vector and return a list of character vectors of the same length."
...: Additional arguments for tokenization, if necessary.
topic: Character vector. Optional. Specifies topics to plot. If topic = NULL, all topics in df are plotted.
shared: Boolean. If shared = TRUE, only terms that are shared between all topics are plotted.
normalize: Boolean. If normalize = TRUE, the number of abstracts mentioning a term is normalized to the total number of abstracts with the miRNA name in that topic.
stopwords: Data frame containing stop words.
stopwords_ngram: Boolean. Specifies whether stop words are removed from abstracts when using ngrams. Only applied when token = 'ngrams'.
position: Character vector. Either "dodge" or "facet". Determines whether bar plots are placed on top of or next to each other.
col.mir: Symbol. Column containing miRNA names.
col.abstract: Symbol. Column containing abstracts.
col.topic: Symbol. Column containing topic names.
col.pmid: Symbol. Column containing PubMed-IDs.
title: String. Plot title.
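To illustrate what the token argument and the additional tokenization arguments control, the following is a minimal sketch that calls tidytext's unnest_tokens() directly. The toy data frame and the output column name term are assumptions made for illustration; this is not the code compare_mir_terms() runs internally.

```r
library(tidytext)

# Toy input (hypothetical): a single abstract with a PubMed-ID.
abstracts <- data.frame(
  PMID = 1L,
  Abstract = "tumor growth and invasion",
  stringsAsFactors = FALSE
)

# token = "words": one row per word.
words <- unnest_tokens(abstracts, term, Abstract, token = "words")

# token = "ngrams" with n = 2: one row per pair of consecutive words.
# n is an example of an additional tokenization argument.
bigrams <- unnest_tokens(abstracts, term, Abstract, token = "ngrams", n = 2)

words$term
bigrams$term
```

With token = "ngrams", stop words such as "and" can end up inside a term, which is presumably the case stopwords_ngram is meant to handle.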
Bar plot comparing the count of terms associated with a miRNA name across topics.
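The two layouts named by position can be pictured with plain ggplot2. The sketch below assumes that "dodge" corresponds to side-by-side bars within one panel and "facet" to one panel per topic; the data frame counts and its column share are hypothetical, and the plot returned by compare_mir_terms() may be styled differently.

```r
library(ggplot2)

# Hypothetical per-topic term shares for one miRNA.
counts <- data.frame(
  Topic = rep(c("CRC", "Pancreatic"), each = 2),
  term = rep(c("growth", "tumor"), times = 2),
  share = c(0.5, 1, 2 / 3, 1 / 3)
)

# "dodge"-style layout: bars of both topics next to each other per term.
p_dodge <- ggplot(counts, aes(x = term, y = share, fill = Topic)) +
  geom_col(position = "dodge") +
  coord_flip()

# "facet"-style layout: one panel per topic.
p_facet <- ggplot(counts, aes(x = term, y = share, fill = Topic)) +
  geom_col() +
  facet_wrap(~Topic) +
  coord_flip()
```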
Compare count of top terms associated with a miRNA name over various topics. miRNA names and topics must be in a data frame df, while terms are taken from abstracts contained in df.
The number of top terms to plot is set by top. Terms can be evaluated either by their raw count, i.e. the number of abstracts in which they are mentioned in conjunction with the miRNA name, or by their relative count, i.e. the number of abstracts containing the miRNA in which they are mentioned, relative to all abstracts containing the miRNA.
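The difference between raw and relative counts, and the effect of keeping only shared terms, can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the toy data frame is hypothetical, it is assumed to be already filtered to abstracts mentioning the miRNA, and splitting on whitespace stands in for real tokenization.

```r
# Toy data (hypothetical): five abstracts mentioning miR-21 in two topics.
df <- data.frame(
  miRNA = "miR-21",
  Topic = c("CRC", "CRC", "Pancreatic", "Pancreatic", "Pancreatic"),
  PMID = 1:5,
  Abstract = c("tumor growth", "tumor invasion",
               "tumor growth", "cell growth", "cell apoptosis"),
  stringsAsFactors = FALSE
)

# One row per abstract and distinct term, so each abstract counts a term once.
tokens <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  data.frame(
    Topic = df$Topic[i],
    PMID = df$PMID[i],
    term = unique(strsplit(tolower(df$Abstract[i]), "\\s+")[[1]]),
    stringsAsFactors = FALSE
  )
}))

# Raw count: number of abstracts per topic that mention each term.
counts <- as.data.frame(table(Topic = tokens$Topic, term = tokens$term),
                        responseName = "n", stringsAsFactors = FALSE)

# Relative count: raw count divided by the number of abstracts in the topic.
n_abstracts <- table(df$Topic)
counts$share <- counts$n / as.numeric(n_abstracts[counts$Topic])

# Shared terms: terms mentioned at least once in every topic.
shared_terms <- names(which(tapply(counts$n > 0, counts$term, all)))

counts[counts$term %in% shared_terms, ]
```

In this toy data, "tumor" appears in both CRC abstracts but in only one of three pancreatic abstracts, so its raw counts (2 versus 1) and its relative counts (1 versus 1/3) tell different stories once topic sizes differ.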
compare_mir_terms() is based on the tools available in the tidytext package.
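A call might look like the following sketch. The toy data frame is hypothetical and uses the default column names from the usage above (miRNA, Abstract, Topic, PMID); the call is guarded so that it only runs when miRetrieve is installed, and it has not been checked against any particular package version. Real input would typically contain many more abstracts per topic.

```r
# Toy data (hypothetical) with the default column names.
df <- data.frame(
  miRNA = "miR-21",
  Abstract = c("tumor growth", "tumor invasion",
               "tumor growth", "cell growth", "cell apoptosis"),
  Topic = c("CRC", "CRC", "Pancreatic", "Pancreatic", "Pancreatic"),
  PMID = 1:5,
  stringsAsFactors = FALSE
)

if (requireNamespace("miRetrieve", quietly = TRUE)) {
  # Compare the top 5 terms for miR-21 across both topics,
  # using the "facet" layout and a custom title.
  p <- miRetrieve::compare_mir_terms(
    df,
    mir = "miR-21",
    top = 5,
    position = "facet",
    title = "miR-21 terms by topic"
  )
}
```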
compare_mir_terms_log2(), compare_mir_terms_scatter()
Other compare functions: compare_mir_count_log2(), compare_mir_count_unique(), compare_mir_count(), compare_mir_terms_log2(), compare_mir_terms_scatter(), compare_mir_terms_unique()