Compare log2-frequency count of terms associated with a miRNA name over two topics.
compare_mir_terms_log2(
df,
mir,
top = 20,
token = "words",
...,
topic = NULL,
shared = TRUE,
normalize = TRUE,
stopwords = stopwords_miretrieve,
stopwords_ngram = TRUE,
col.mir = miRNA,
col.abstract = Abstract,
col.topic = Topic,
col.pmid = PMID,
title = NULL
)
df: Data frame containing miRNA names, abstracts, topics, and PubMed-IDs.
mir: String. miRNA name of interest.
top: Integer. Number of top terms to plot.
token: String. Specifies how abstracts shall be split up. Taken from unnest_tokens() in the tidytext package:
"Unit for tokenizing, or a custom tokenizing function. Built-in options are
"words" (default), "characters", "character_shingles", "ngrams", "skip_ngrams",
"sentences", "lines", "paragraphs", "regex", (...),
and "ptb" (Penn Treebank). If a function, should take a character vector and
return a list of character vectors of the same length."
...: Additional arguments for tokenization, if necessary.
topic: Character vector. Optional. Specifies which topics to plot. Must have length two. If topic = NULL, all topics in df are plotted.
shared: Boolean. If shared = TRUE, only terms that are shared between the two topics are plotted.
normalize: Boolean. If normalize = TRUE, normalizes the number of abstracts to the total number of abstracts in a topic.
stopwords: Data frame containing stop words.
stopwords_ngram: Boolean. Specifies if stop words shall be removed from abstracts when using ngrams. Only applied when token = 'ngrams'.
col.mir: Symbol. Column containing miRNA names.
col.abstract: Symbol. Column containing abstracts.
col.topic: Symbol. Column containing topic names.
col.pmid: Symbol. Column containing PubMed-IDs.
title: String. Plot title.
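The interaction between token = 'ngrams' and stopwords_ngram can be pictured with a small base R sketch. It is an illustration only: make_ngrams() is a hypothetical helper, not a miRetrieve or tidytext function, the stop word vector is invented, and removing stop words before building the n-grams is an assumption about the order of operations rather than a description of the package internals.

```r
# Hypothetical helper (not part of miRetrieve): builds word n-grams from a
# single abstract, optionally dropping stop words before the n-grams are formed.
make_ngrams <- function(text, n = 2, stop_words = character(0), remove_stop = TRUE) {
  # Lower-case and split on anything that is not a letter, digit, or hyphen.
  words <- strsplit(tolower(text), "[^a-z0-9-]+")[[1]]
  words <- words[nzchar(words)]
  if (remove_stop) {
    words <- words[!words %in% stop_words]
  }
  if (length(words) < n) {
    return(character(0))
  }
  # Slide a window of width n over the remaining words.
  starts <- seq_len(length(words) - n + 1)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

# Invented example abstract and stop words.
abstract <- "miR-21 promotes the invasion of tumor cells"
toy_stop_words <- c("the", "of")

# With stop word removal (assumed analogue of stopwords_ngram = TRUE).
print(make_ngrams(abstract, n = 2, stop_words = toy_stop_words, remove_stop = TRUE))

# Without stop word removal (assumed analogue of stopwords_ngram = FALSE).
print(make_ngrams(abstract, n = 2, stop_words = toy_stop_words, remove_stop = FALSE))
```

With removal, 2-grams such as "promotes invasion" bridge the gap left by the dropped stop word; without it, 2-grams such as "promotes the" remain in the result.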
List containing bar plot comparing the log2-frequency of terms associated with a miRNA over two topics and its corresponding data frame.
Compare the terms associated with a miRNA name over two topics by
plotting the log2-ratio of each term's count in the two topics.
miRNA names and topics must be in a data frame df, while terms are taken
from abstracts contained in df.
The number of top terms to plot is set by top. Terms can be evaluated
either by their raw count, i.e. the number of abstracts in which they are
mentioned in conjunction with the miRNA name, or by their relative count, i.e.
the number of abstracts containing the miRNA in which they are mentioned,
relative to all abstracts containing the miRNA.
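The counting and log2-ratio steps described above can be sketched in base R. This is a minimal illustration, not the miRetrieve implementation: the toy abstracts are invented, they are assumed to be pre-filtered to abstracts mentioning the miRNA of interest, and the direction of the ratio (topic "B" over topic "A") is an assumption.

```r
# Invented toy data: every row is an abstract that mentions the miRNA of interest.
abstracts <- data.frame(
  PMID = 1:5,
  Topic = c("A", "A", "A", "B", "B"),
  Abstract = c(
    "apoptosis invasion",
    "apoptosis migration",
    "apoptosis",
    "apoptosis invasion",
    "invasion fibrosis"
  ),
  stringsAsFactors = FALSE
)

# Split abstracts into unique words so a term counts at most once per abstract.
tokens <- lapply(strsplit(tolower(abstracts$Abstract), "[^a-z0-9-]+"), unique)
long <- data.frame(
  Topic = rep(abstracts$Topic, lengths(tokens)),
  term = unlist(tokens),
  stringsAsFactors = FALSE
)

# Raw count: number of abstracts per topic that mention each term.
counts <- unclass(table(long$term, long$Topic))

# Relative count: raw count divided by the number of abstracts in each topic
# (the analogue of normalize = TRUE in this sketch).
n_abstracts <- vapply(split(abstracts$PMID, abstracts$Topic), length, integer(1))
rel <- sweep(counts, 2, n_abstracts[colnames(counts)], "/")

# Keep only terms found in both topics (the analogue of shared = TRUE).
shared <- counts[, "A"] > 0 & counts[, "B"] > 0

# log2-ratio of relative counts; positive values lean towards topic "B".
log2_ratio <- log2(rel[shared, "B"] / rel[shared, "A"])

# Keep the terms with the largest absolute log2-ratio (the analogue of top).
top_n <- 20
top_terms <- head(log2_ratio[order(abs(log2_ratio), decreasing = TRUE)], top_n)
print(top_terms)
```

In this toy data, "apoptosis" appears in all three abstracts of topic "A" but in only one of the two abstracts of topic "B", so its log2-ratio is log2(0.5 / 1) = -1. Terms found in only one topic, such as "fibrosis", are dropped by the shared filter, which also avoids taking the logarithm of zero.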
compare_mir_terms_log2() is based on the tools available in the
tidytext package.
The log2-plot is greatly inspired by the book
"tidytext: Text Mining and Analysis Using Tidy Data Principles in R." by
Silge and Robinson.
Silge, Julia, and David Robinson. 2016. "tidytext: Text Mining and Analysis Using Tidy Data Principles in R." JOSS 1 (3). The Open Journal. https://doi.org/10.21105/joss.00037.
compare_mir_terms(), compare_mir_terms_scatter()
Other compare functions: compare_mir_count_log2(), compare_mir_count_unique(), compare_mir_count(), compare_mir_terms_scatter(), compare_mir_terms_unique(), compare_mir_terms()