miRetrieve

miRetrieve is designed for microRNA text mining in abstracts. By extracting, counting, and analyzing miRNA names from literature, miRetrieve aims to provide biological insights from large amounts of text in a short time.

Getting Started

An online version offering the most important functions of miRetrieve is available at https://miretrieve.shinyapps.io/miRetrieve/.

To install miRetrieve from CRAN, run

install.packages("miRetrieve")

Alternatively, install miRetrieve from GitHub by running

install.packages("devtools")

devtools::install_github("JulFriedrich/miRetrieve",
        dependencies = TRUE,
        repos = "https://cran.r-project.org/")

miRetrieve is built around the idea of using field-specific abstracts from PubMed to characterize and analyze microRNAs in disease-related fields (e.g. "miRNAs in diabetes").

To get started, download microRNA-related abstracts from PubMed via Save - Format: PubMed - Create file and load the resulting file into R using

df <- miRetrieve::read_pubmed("PubMed_file.txt")

and subsequently extract all microRNAs with

df <- miRetrieve::extract_mir_df(df)
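To give a rough idea of what extracting and counting miRNA names from abstracts involves, the following base R sketch applies a simplified pattern to three made-up abstracts. It is not miRetrieve's implementation: the pattern `mir_pattern`, the example sentences, and all variable names are assumptions chosen for illustration, and the package's own extraction is more thorough (see the vignette).

```r
# Illustrative sketch only: NOT miRetrieve's actual extraction logic.
# The abstracts below are invented example sentences.
abstracts <- c(
  "miR-21 and miR-155 were upregulated, while let-7a was reduced.",
  "Overexpression of miR-21 suppressed PTEN.",
  "No microRNA names are mentioned in this abstract."
)

# Hypothetical, simplified pattern: "miR-" or "let-" followed by digits
# and an optional lowercase letter (e.g. miR-21, let-7a).
mir_pattern <- "(miR|let)-[0-9]+[a-z]?"

# gregexpr() finds every match per abstract; regmatches() returns them
# as a list with one character vector per abstract.
matches <- regmatches(abstracts, gregexpr(mir_pattern, abstracts))

# Count each miRNA name at most once per abstract, then tally across abstracts.
mir_per_abstract <- lapply(matches, unique)
mir_counts <- sort(table(unlist(mir_per_abstract)), decreasing = TRUE)

print(mir_counts)
```

In this sketch a miRNA name is counted once per abstract, so the tally reflects how many abstracts mention a name rather than how often it is repeated within one abstract.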

A detailed vignette covering the underlying mechanism, the functions, and a complete workflow is available at

https://julfriedrich.github.io/miRetrieve/articles/miRetrieve.html

Authors

Julian Friedrich, Hans-Peter Hammes, Guido Krenning

License

miRetrieve is published under the GPL-3 license.

Publication

miRetrieve and its functions are presented in a manuscript, currently under review.

Supplementary Files referenced in the manuscript are located in a different repository, freely available under

https://github.com/JulFriedrich/miRetrieve-paper

Acknowledgments

  • join_mirtarbase is based on the latest miRTarBase version 8.0 (http://miRTarBase.cuhk.edu.cn/). If you use miRetrieve to visualize miRNA-mRNA interactions based on miRTarBase, please make sure to cite Hsi-Yuan Huang, Yang-Chi-Dung Lin, Jing Li, et al., miRTarBase 2020: updates to the experimentally validated microRNA–target interaction database, Nucleic Acids Research, Volume 48, Issue D1, 08 January 2020, Pages D148–D154, https://doi.org/10.1093/nar/gkz896.

  • compare_mir_terms_log2(), compare_mir_count_log2(), and compare_mir_terms_scatter() are greatly inspired by “tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” by Silge and Robinson (https://www.tidytextmining.com/). In addition, "tidytext" is a valuable resource for general text mining in R.

  • Key packages for miRetrieve are tidytext, topicmodels, and the packages included in the tidyverse (see Vignette).

Version: 1.3.4

License: GPL-3

Maintainer: Julian Friedrich

Last Published: September 18th, 2021

Functions in miRetrieve (1.3.4)

  • biomarker_keywords: Keywords - biomarkers.
  • assign_topic: Assign topics based on precalculated scores
  • add_col_topic: Add topic column to data frame
  • calculate_score_biomarker: Calculate biomarker scores for abstracts
  • assign_topic_lda: Assign topics based on LDA model
  • calculate_score_animals: Calculate animal model scores for abstracts
  • calculate_score_topic: Calculate scores of a self-chosen topic
  • calculate_score_patients: Calculate patients scores for abstracts
  • animal_keywords: Keywords - animals.
  • combine_df: Combine data frames into one data frame
  • compare_mir_terms_unique: Compare terms uniquely associated with a miRNA name
  • compare_mir_count_log2: Compare log2-frequency count of miRNA names between two topics
  • extract_snp: Extract SNPs from abstracts in data frame
  • extract_mir_string: Extract miRNA names from string
  • compare_mir_count: Compare count of miRNA names between different topics
  • count_mir: Count miRNA names in a data frame
  • get_pmid: Get PubMed-IDs of a data frame
  • get_mir: Get miRNA names from a data frame
  • get_shared_mir_df: Get top miRNA names in common between two topics of a data frame
  • get_shared_mir_vec: Get miRNA names in common between two vectors
  • combine_stopwords: Combine data frames containing stop words
  • df_crc: Dataset of PubMed data of miRNAs in Colorectal Cancer
  • get_distinct_mir_df: Identify top miRNA names distinct for one topic compared to another topic
  • combine_mir: Combine miRNA vectors into one
  • count_target: Count targets in data frame
  • df_panc: Dataset of PubMed data of miRNAs in Pancreatic Cancer
  • get_snp: Get SNPs from a data frame
  • get_distinct_mir_vec: Identify miRNA names distinct for one vector compared to another vector
  • plot_mir_development: Plot development of miRNA name mentioning over time
  • df_mirtarbase: miRTarBase version 8.0
  • compare_mir_terms_log2: Compare log2-frequency count of terms associated with a miRNA name
  • compare_mir_count_unique: Compare top count of unique miRNA names per topic
  • join_targets: Add miRNA targets from an xlsx-file to a data frame
  • indicate_mir: Indicate if a miRNA name is contained in an abstract
  • ngram_stopwords: Stop words for n-grams
  • count_mir_threshold: Count occurrence of miRNA names above threshold
  • compare_mir_terms: Compare count of terms associated with a miRNA name over various topics
  • count_snp: Count SNPs in a data frame
  • plot_mir_count_threshold: Plot occurrence count of miRNA names over different thresholds
  • plot_wordcloud: Create wordcloud of terms associated with a miRNA name
  • plot_mir_count: Plot count of most frequently mentioned miRNA names
  • plot_mir_new: Plot number of newly mentioned miRNA names/year
  • subset_df: Subset data frame for a term
  • read_pubmed: Convert PubMed-file from PubMed into a data frame
  • plot_lda_term: Plot terms associated with LDA-fitted topics
  • plot_score_patients: Plot frequency of patient scores in abstracts
  • patients_keywords: Keywords - patients.
  • read_pubmed_jats: Convert JATS-file from PubMed into a data frame
  • plot_mir_terms: Plot count of top terms associated with a miRNA name
  • plot_perplexity: Plot perplexity score of various LDA models
  • subset_mir: Subset data frame for specific miRNA names
  • df_test: Test dataset of PubMed abstracts
  • compare_mir_terms_scatter: Compare shared terms associated with a miRNA name
  • extract_mir_df: Extract miRNA names from abstracts in data frame
  • plot_score_topic: Plot frequency of self-chosen topic scores in abstracts
  • generate_stopwords: Generate data frame containing stop words
  • plot_score_biomarker: Plot frequency of biomarker scores in abstracts
  • join_mirtarbase: Add miRNA targets from miRTarBase version 8.0
  • plot_score_animals: Plot frequency of animal model scores in abstracts
  • fit_lda: Fit LDA-model
  • indicate_term: Indicate if a term is contained in abstracts
  • plot_target_count: Plot count of miRNA targets
  • stopwords_2gram: Stop words for text mining with common PubMed 2-grams
  • save_plot: Save the last generated figure
  • plot_target_mir_scatter: Plot targets and corresponding miRNAs as a scatter plot
  • subset_year: Subset data frame for abstracts published in a specific period
  • save_excel: Save data frame(s) as xlsx-file
  • subset_review: Subset data frame for abstracts of review articles
  • subset_snp: Subset data frame for specific SNPs
  • subset_research: Subset data frame for abstracts of research articles
  • subset_mir_threshold: Subset data frame for miRNA names exceeding a threshold
  • stopwords_miretrieve: Stop words for text mining with miRetrieve
  • stopwords_pubmed: Stop words for text mining from PubMed abstracts