rpubmed

Tools for extracting and processing records from Pubmed and Pubmed Central.

This project is still very much in development... Please contact me with any questions, suggestions or bug reports.

I have built in support for searching and processing MeSH headings, making this package of particular use for biomedical researchers conducting systematic reviews and meta-analyses. Two of these functions (mesh_assoc_table and keyword_assoc_table) produce association matrices which can be fed into graph packages such as igraph to visualise the associations between different search terms.
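To illustrate what an association matrix of search terms looks like, here is a minimal base-R sketch. It is not the package's `mesh_assoc_table` or `keyword_assoc_table` implementation; the toy `abstracts` vector and the `keywords` are made-up assumptions, and the co-occurrence count via `crossprod` is just one plausible way to build such a matrix.

```r
# Hypothetical toy corpus: each element stands in for one abstract.
abstracts <- c(
  "Diabetes and obesity in primary care",
  "Obesity and hypertension cohort study",
  "Diabetes, hypertension and obesity outcomes"
)
keywords <- c("diabetes", "obesity", "hypertension")

# Incidence matrix: one row per abstract, one column per keyword,
# 1 where the keyword appears in the abstract (case-insensitive).
incidence <- sapply(keywords, function(k) {
  as.integer(grepl(k, abstracts, ignore.case = TRUE))
})

# Co-occurrence matrix: entry [i, j] counts abstracts mentioning both terms.
assoc <- crossprod(incidence)
diag(assoc) <- 0  # drop self-associations before treating it as a graph

print(assoc)
```

A symmetric matrix like this could then be passed to, for example, `igraph::graph_from_adjacency_matrix(assoc, mode = "undirected", weighted = TRUE)` to draw the term network.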

Available functions:

  • fetch - Tools for bulk downloading of Pubmed records
    • fetch_in_chunks(ids, chunk_size = 500, delay = 0, ...)
    • pubmed_fetch(ids, file_format = "xml", as_r_object = TRUE, ...)
  • textsearch - Tools for text-mining of abstracts and metadata from downloaded records
    • get_articles_by_terms(corpus, term_list, where, case_sensitive = FALSE, ...)
    • record_counts_by_year(corpus)
  • io - Saving records to disk and printing summaries of abstract lists to file or stdout
    • write_JSON_file(x, file)
    • write_record_list(articles, out_file = "", abstract_p = FALSE, markdown_p = FALSE, linestart = "* ")
  • locations - Geocoding functions for finding the coordinates of departments affiliated with Pubmed articles
    • geocode_addresses(addresses, sleeper = 0.33, depth = 3)
    • get_article_location_data(abstracts)
    • geocode_address(address, depth = 3)
  • mesh - Tools for processing and exploring associations between MeSH headings and other keywords
    • mesh_assoc_table(corpus)
    • keyword_assoc_table(corpus, keyword_list, keyword_names, ...)
    • get_mesh_headings(article)
    • mesh_heading_frequency(corpus)
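
The `chunk_size` and `delay` arguments of `fetch_in_chunks` suggest a split-then-loop pattern for bulk downloads. The sketch below shows that pattern in base R only. `split_into_chunks`, `fake_fetch` and `fetch_chunked` are hypothetical names invented for this example, `fake_fetch` makes no network request, and none of this is the package's actual code.

```r
# Hypothetical helper, not the package's chunker(): split a vector into
# consecutive chunks of at most chunk_size elements.
split_into_chunks <- function(v, chunk_size) {
  unname(split(v, ceiling(seq_along(v) / chunk_size)))
}

# Hypothetical stand-in for a network call; it only echoes the IDs it was given.
fake_fetch <- function(ids) {
  paste0("record_", ids)
}

# Process each chunk in turn, pausing `delay` seconds before each request.
fetch_chunked <- function(ids, chunk_size = 500, delay = 0) {
  chunks <- split_into_chunks(ids, chunk_size)
  results <- lapply(chunks, function(chunk) {
    Sys.sleep(delay)
    fake_fetch(chunk)
  })
  unlist(results)
}

ids <- 1:1200
records <- fetch_chunked(ids, chunk_size = 500)

length(split_into_chunks(ids, 500))  # number of chunks
length(records)                      # number of records returned
```

Keeping the chunking separate from the fetching makes it easy to swap in a real download function and to raise `delay` if the remote service rate-limits requests.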

Version

0.1

License

GPL-3

Maintainer

David Springate

Last Published

February 15th, 2017

Functions in rpubmed (0.1)

fetch_in_chunks

Downloads abstracts and metadata from Pubmed, storing them as R objects
geocode_addresses

Returns a data frame of geocoded addresses with longitudes and latitudes. Uses the Google Maps geocode API.
entrez_email

Set global variables
get_article_location_data

Extracts addresses of affiliated departments from Pubmed metadata. Email addresses are cleaned out.
abstract_to_text

concatenates abstract list to a single string
geocode

Helper function for geocode_address
chunker

Helper function to split a vector v into list of chunks of chunk_size
get_mesh_headings

Returns a list of MeSH headings for an article
geocode_address

Function to get coordinates from a supplied address. If no match is found, it recursively calls itself on the address minus the first line of the address.
get_articles_by_terms

Returns a list of articles matching the term list. Items in the term list can be strings or character vectors; a character vector is concatenated into an "or" regex, e.g. list(c("gprd", "diabetes")) returns all articles mentioning either gprd or diabetes. Different items in the list recursively filter the results, e.g. list("gprd", "diabetes") returns only articles mentioning both gprd and diabetes.
mesh_assoc_table

Builds an association matrix for all MeSH terms in an article corpus
in_record_text_p

predicate function for searching in title and abstract
mesh_to_text

concatenates a list of MeSH headings to a single string
mesh_heading_frequency

Returns a data frame of all MeSH headings in a corpus, with frequencies for each
mesh_table

helper function for mesh_assoc_table
keyword_assoc_table

Builds an association table for a character vector of search terms in a corpus. This can then be fed into, e.g., igraph to generate an adjacency graph of terms. Different column names can be set for the association matrix, e.g. if complex regex terms are used for the keyword_list.
in_mesh_abstract_p

predicate function for searching abstracts and MeSH headings
in_abstract_p

predicate function for searching abstracts
pubmed_fetch

Download data from Pubmed
in_mesh_headings_p

predicate function for searching MeSH headings
read_article_json

Redundant wrapper around fromJSON
term_in_text_p

predicate function for presence of a term in an article text
record_counts_by_year

Gives a breakdown of records per year in a corpus of Pubmed Records
title_to_text

concatenates a list of MeSH headings to a single string
write_record_list

Writes article title and citation data to file or stdout. Can optionally also output abstracts and output as Markdown, with customisable line starts, e.g. for unordered lists
write_JSON_file

Writes a list of records (e.g. Pubmed records from rpubmed_fetch_in_chunks) to a JSON file
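
The term-list semantics described for `get_articles_by_terms` (terms within one character vector are combined with "or", separate list items with "and") can be illustrated with a small base-R sketch. The `articles` vector and the `filter_by_terms` helper are hypothetical and operate on plain strings rather than Pubmed records. The description above speaks of recursive filtering, while this sketch uses a simple loop.

```r
# Hypothetical toy corpus of article texts.
articles <- c(
  a1 = "GPRD cohort study of diabetes",
  a2 = "Diabetes prevalence in Europe",
  a3 = "Validation of GPRD diagnoses",
  a4 = "Asthma in children"
)

# Hypothetical helper: each list item narrows the remaining texts ("and");
# terms inside one item are joined into a single "or" regex.
filter_by_terms <- function(texts, term_list, case_sensitive = FALSE) {
  for (terms in term_list) {
    pattern <- paste(terms, collapse = "|")
    keep <- grepl(pattern, texts, ignore.case = !case_sensitive)
    texts <- texts[keep]
  }
  texts
}

# One item with two terms: articles mentioning either term.
either <- filter_by_terms(articles, list(c("gprd", "diabetes")))

# Two items with one term each: articles mentioning both terms.
both <- filter_by_terms(articles, list("gprd", "diabetes"))

names(either)
names(both)
```

With this toy data, `either` keeps a1, a2 and a3, while `both` keeps only a1.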