Compare shared terms associated with a miRNA name over two topics.
compare_mir_terms_scatter(
df,
mir,
top = 1000,
token = "words",
...,
topic = NULL,
stopwords = stopwords_miretrieve,
stopwords_ngram = TRUE,
html = TRUE,
colour.point = "red",
colour.term = "black",
col.mir = miRNA,
col.abstract = Abstract,
col.topic = Topic,
col.pmid = PMID,
title = NULL
)
df: Data frame containing miRNA names, abstracts, topics, and PubMed-IDs.
mir: String. miRNA name of interest.
top: Integer. Number of top terms to plot.
token: String. Specifies how abstracts shall be split up. Taken from
  unnest_tokens() in the tidytext package:
  "Unit for tokenizing, or a custom tokenizing function. Built-in options are
  "words" (default), "characters", "character_shingles", "ngrams", "skip_ngrams",
  "sentences", "lines", "paragraphs", "regex",
  (...),
  and "ptb" (Penn Treebank). If a function, should take a character vector and
  return a list of character vectors of the same length."
...: Additional arguments for tokenization, if necessary.
topic: Character vector. Optional. Specifies which topics to plot.
  Must have length two.
  If topic = NULL, all topics in df are plotted.
stopwords: Data frame containing stop words.
stopwords_ngram: Boolean. Specifies if stop words shall be removed
  from abstracts when using ngrams. Only applied when token = 'ngrams'.
html: Boolean. Specifies whether the plot is returned as an HTML-widget or as a static plot.
colour.point: String. Colour of points for scatter plot.
colour.term: String. Colour of terms for scatter plot.
col.mir: Symbol. Column containing miRNAs.
col.abstract: Symbol. Column containing abstracts.
col.topic: Symbol. Column containing topic names.
col.pmid: Symbol. Column containing PubMed-IDs.
title: String. Plot title.
Scatter plot comparing shared terms of a miRNA between two topics.
Compare shared terms associated with a miRNA name over two topics. These terms are displayed
as a scatter plot, which is either interactive as an HTML-widget, or static. This
is regulated via the html argument.
miRNA names and topics must be in a data frame df, while terms are taken
from abstracts contained in df.
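As a hedged illustration of the expected input, the sketch below builds a small toy data frame whose column names match the defaults (miRNA, Abstract, Topic, PMID) and then calls compare_mir_terms_scatter() for two topics. The abstracts, PMIDs, topic names, and the object name df_toy are invented for illustration. The call is guarded so that it only runs when the miRetrieve package is installed, and whether such a tiny data set yields a meaningful plot is an assumption, not something guaranteed here.

```r
# Toy input (invented data); column names match the function's defaults.
df_toy <- data.frame(
  miRNA    = c("miR-21", "miR-21", "miR-21", "miR-155"),
  Abstract = c("miR-21 promotes tumor growth and tumor invasion",
               "tumor growth depends on miR-21",
               "miR-21 drives fibrosis and growth of scar tissue",
               "miR-155 regulates inflammation"),
  Topic    = c("Cancer", "Cancer", "Fibrosis", "Cancer"),
  PMID     = c(1L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
)

# Only attempt the call if miRetrieve is available.
if (requireNamespace("miRetrieve", quietly = TRUE)) {
  p <- miRetrieve::compare_mir_terms_scatter(
    df_toy,
    mir   = "miR-21",
    topic = c("Cancer", "Fibrosis"),  # must have length two
    html  = FALSE                     # request the static plot
  )
}
```

If the columns carry other names, they would be passed through col.mir, col.abstract, col.topic, and col.pmid instead of renaming the data frame.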
The number of top terms to choose is regulated by top. Terms are
evaluated by their raw count and plotted on a log10 scale.
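To make "shared terms", "raw count", and "log10 scale" concrete, here is a minimal base R sketch of the idea. It is not the package's implementation: the helper count_terms(), the toy data, and the simple regular-expression tokenizer are assumptions made for illustration, and stop word removal and the top cut-off are left out.

```r
# Toy data (invented); one miRNA discussed in two topics.
toy <- data.frame(
  miRNA    = c("miR-21", "miR-21", "miR-21"),
  Abstract = c("miR-21 promotes tumor growth and tumor invasion",
               "tumor growth depends on miR-21",
               "miR-21 drives fibrosis and growth of scar tissue"),
  Topic    = c("Cancer", "Cancer", "Fibrosis"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: raw counts of lower-cased word tokens for one
# miRNA within one topic.
count_terms <- function(data, mir, topic) {
  abstracts <- data$Abstract[data$miRNA == mir & data$Topic == topic]
  words <- unlist(strsplit(tolower(abstracts), "[^a-z0-9-]+"))
  words <- words[nzchar(words)]          # drop empty strings from splitting
  counts <- table(words)
  data.frame(term = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

cancer   <- count_terms(toy, "miR-21", "Cancer")
fibrosis <- count_terms(toy, "miR-21", "Fibrosis")

# Shared terms: inner join keeps only terms that occur in both topics.
shared <- merge(cancer, fibrosis, by = "term",
                suffixes = c("_cancer", "_fibrosis"))

# Raw counts transformed to log10, as they would appear on the plot axes.
shared$log10_cancer   <- log10(shared$n_cancer)
shared$log10_fibrosis <- log10(shared$n_fibrosis)

print(shared)
```

Each row of shared corresponds to one point in such a scatter plot, with one topic per axis. Terms that appear in only one topic drop out at the join.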
compare_mir_terms_scatter() is based on the tools available in the
tidytext package.
The term-plot is greatly inspired by
“tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” by
Silge and Robinson.
Silge, Julia, and David Robinson. 2016. “tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” JOSS 1 (3). The Open Journal. https://doi.org/10.21105/joss.00037.
compare_mir_terms(), compare_mir_terms_log2()
Other compare functions: compare_mir_count_log2(), compare_mir_count_unique(),
compare_mir_count(), compare_mir_terms_log2(), compare_mir_terms_unique(),
compare_mir_terms()