
rPDBapi: A Comprehensive R Package Interface for Accessing the Protein Data Bank

Introduction

rPDBapi is an R package that provides access to the RCSB Protein Data Bank (PDB) from within R. It simplifies retrieving and analyzing 3D structural data for large biological molecules, which is central to bioinformatics and structural biology research. The package wraps the RCSB PDB web APIs (JSON search queries and GraphQL data requests) to support custom queries, data retrieval, and advanced search within the R programming environment.

Features

  • User-Friendly Interface: Simplifies access to PDB data for the R community.
  • Custom Queries: Streamlines the process of crafting custom queries for efficient data retrieval.
  • Advanced Search Capabilities: Includes specialized search functions for PubMed IDs, organisms, experimental methods, protein structure similarities, and more.
  • Data Retrieval: Facilitates downloading of PDB files in various formats and extraction of FASTA sequences.
  • Integration with R: Provides functions for data manipulation and analysis directly within R, enhancing research workflows.

Installation

You can install the stable version of rPDBapi from CRAN:

install.packages("rPDBapi", repos = "http://cran.us.r-project.org")

To install the development version from GitHub:

devtools::install_github("selcukorkmaz/rPDBapi")

Usage

Loading the Package

library(rPDBapi)

Retrieving PDB IDs

Retrieve PDB IDs related to a specific term, such as "hemoglobin":

pdbs <- query_search(search_term = "hemoglobin")
head(pdbs)

Advanced Searches

Search by PubMed ID:

pdbs <- query_search(search_term = 32453425, query_type = "PubmedIdQuery")
pdbs

Search by source organism:

pdbs <- query_search(search_term = '7227', query_type = 'TreeEntityQuery')
head(pdbs)

Search by experimental method:

pdbs <- query_search(search_term = 'SOLID-STATE NMR', query_type='ExpTypeQuery')
head(pdbs)

Data Retrieval

Fetch data based on user-defined IDs and properties:

properties <- list(rcsb_entry_info = c("molecular_weight"), exptl = "method", rcsb_accession_info = "deposit_date")
ids <- query_search("CRISPR")
df <- data_fetcher(id = ids, data_type = "ENTRY", properties = properties, return_as_dataframe = TRUE)
df

Describing Chemical Compounds

Retrieve comprehensive descriptions of chemical compounds:

chem_desc <- describe_chemical('ATP')
chem_desc$rcsb_chem_comp_descriptor$smiles

Retrieving PDB Files

Download PDB files in various formats:

pdb_file <- get_pdb_file(pdb_id = "4HHB", filetype = "cif")
head(pdb_file$atom)

Additional Functions

  • get_info: Retrieve detailed information about a specific PDB entry.
  • get_fasta_from_rcsb_entry: Fetch FASTA sequences for specified PDB entry IDs.

Documentation

For more detailed examples and usage, please refer to the package documentation.

Output Contracts

Core functions return typed objects to make downstream behavior explicit:

  • query_search()
    • return_type = "entry": character vector with class rPDBapi_query_ids
    • otherwise: parsed payload with class rPDBapi_query_response
  • perform_search()
    • default ID output: class rPDBapi_search_ids
    • return_with_scores = TRUE: class rPDBapi_search_scores
    • return_raw_json_dict = TRUE: class rPDBapi_search_raw_response
  • fetch_data()
    • validated payload with class rPDBapi_fetch_response
  • data_fetcher()
    • return_as_dataframe = TRUE: data frame with class rPDBapi_dataframe
    • return_as_dataframe = FALSE: class rPDBapi_fetch_response
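
Because each return shape carries its own class, downstream code can branch with inherits() rather than inspecting the object's structure. The sketch below runs offline: it builds stand-in objects with base R structure() instead of calling the API, and the helper describe_result() is hypothetical, not part of rPDBapi. Real objects may carry additional classes or attributes.

```r
# Stand-in objects that mimic the documented class names (no network call).
# Assumption: the real rPDBapi objects may carry extra classes or attributes.
ids <- structure(c("4HHB", "1A3N"), class = c("rPDBapi_query_ids", "character"))
payload <- structure(list(), class = c("rPDBapi_query_response", "list"))

# Hypothetical helper: branch on the documented class names.
describe_result <- function(x) {
  if (inherits(x, "rPDBapi_query_ids")) {
    sprintf("%d entry IDs", length(x))
  } else if (inherits(x, "rPDBapi_query_response")) {
    "parsed response payload"
  } else {
    "unrecognized result"
  }
}

describe_result(ids)      # "2 entry IDs"
describe_result(payload)  # "parsed response payload"
```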

Error signaling uses typed conditions (e.g., rPDBapi_error_malformed_response, rPDBapi_error_unsupported_mapping) for reliable programmatic handling.
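
Typed conditions let a caller handle one failure mode without catching every error. The following is a minimal offline sketch: it raises a simulated condition that carries the class name mentioned above, then catches it by class with tryCatch(). The function simulate_malformed() and its message are illustrative assumptions, not rPDBapi internals.

```r
# Simulate a typed error using base R condition objects.
# The class name comes from the text above; the message and this helper
# are illustrative and not taken from rPDBapi itself.
simulate_malformed <- function() {
  cond <- structure(
    list(message = "response is missing expected fields", call = NULL),
    class = c("rPDBapi_error_malformed_response", "error", "condition")
  )
  stop(cond)
}

outcome <- tryCatch(
  simulate_malformed(),
  rPDBapi_error_malformed_response = function(e) {
    # Reached only for this specific condition class.
    paste("malformed:", conditionMessage(e))
  },
  error = function(e) {
    # Fallback for any other error.
    paste("other error:", conditionMessage(e))
  }
)
outcome  # "malformed: response is missing expected fields"
```

Handlers are tried in the order written, so the class-specific handler should come before the generic error handler.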

Backward-Compatible Aliases

  • Search return types:
    • NONPOLYMER_ENTITY and NON_POLYMER_ENTITY map to the same API return type.
    • CHEMICAL_COMPONENT maps to MOL_DEFINITION.
  • Citation fields:
    • citation and rcsb_primary_citation are resolved compatibly in find_results() and find_papers().
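
One way to picture alias handling is a lookup table that normalizes accepted spellings before a request is built. The sketch below is an assumption about how such a mapping could look, not the package's actual implementation: the table return_type_aliases, the helper resolve_return_type(), and the choice of NON_POLYMER_ENTITY as the canonical spelling are all hypothetical.

```r
# Hypothetical alias table: names are accepted spellings, values are the
# canonical spellings used in this sketch. Not rPDBapi's internal table.
return_type_aliases <- c(
  NONPOLYMER_ENTITY  = "NON_POLYMER_ENTITY",
  CHEMICAL_COMPONENT = "MOL_DEFINITION"
)

# Hypothetical helper: resolve an alias; pass other values through unchanged.
# Expects a single character string.
resolve_return_type <- function(x) {
  key <- toupper(x)
  if (key %in% names(return_type_aliases)) {
    return_type_aliases[[key]]
  } else {
    key
  }
}

resolve_return_type("NONPOLYMER_ENTITY")   # "NON_POLYMER_ENTITY"
resolve_return_type("CHEMICAL_COMPONENT")  # "MOL_DEFINITION"
resolve_return_type("ENTRY")               # "ENTRY"
```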

Testing

By default, the test suite runs only deterministic unit tests (no network calls):

Sys.setenv(RPDBAPI_RUN_LIVE = "false")
testthat::test_dir("tests/testthat")

Live API integration tests are opt-in:

Sys.setenv(RPDBAPI_RUN_LIVE = "true", NOT_CRAN = "true")
testthat::test_dir("tests/testthat")
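
An opt-in gate like this is usually implemented as a small predicate on the environment variable. The sketch below shows one possible form; live_tests_enabled() is a hypothetical helper, and the package's own test helpers may be written differently.

```r
# Hypothetical helper: TRUE only when RPDBAPI_RUN_LIVE is set to "true"
# (case-insensitive). An unset variable is treated as "false".
live_tests_enabled <- function() {
  identical(tolower(Sys.getenv("RPDBAPI_RUN_LIVE", unset = "false")), "true")
}

Sys.setenv(RPDBAPI_RUN_LIVE = "true")
live_tests_enabled()   # TRUE

Sys.setenv(RPDBAPI_RUN_LIVE = "false")
live_tests_enabled()   # FALSE
```

Inside a testthat file, a network-dependent test could then start with testthat::skip_if_not(live_tests_enabled(), "live API tests disabled"), so it is reported as skipped rather than failed when the gate is off.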

Authors

  • Selcuk Korkmaz - Trakya University, Department of Biostatistics
  • Bilge Eren Yamasan - Trakya University, Department of Biophysics

License

This package is licensed under the MIT License.


  • Version: 3.0.0
  • License: MIT + file LICENSE
  • Maintainer: Selcuk Korkmaz
  • Last Published: March 7th, 2026
  • Monthly Downloads: 234

Functions in rPDBapi (3.0.0)

  • RangeOperator: Create a Range Search Operator
  • SequenceOperator: Create a Sequence Operator for Sequence-Based Searches
  • add_property: Add or Merge Properties for RCSB PDB Data Fetching
  • ScoredResult: Create a Scored Result Object for PDB Searches
  • as_rpdb_chemical_component: Convert Data to an rPDBapi Chemical Component Object
  • build_instance_id: Build an Instance Identifier
  • as_rpdb_assembly: Convert Data to an rPDBapi Assembly Object
  • describe_chemical: Describe Chemical Compound from RCSB PDB
  • build_entity_id: Build an Entity Identifier
  • data_fetcher_batch: Batch Fetch RCSB PDB Data with Optional Retry and Cache
  • build_entry_id: Build an Entry Identifier
  • find_results: Retrieve Specific Fields for Search Results from RCSB PDB
  • data_fetcher: Fetch RCSB PDB Data Based on Specified Criteria
  • clear_rpdbapi_cache: Clear rPDBapi Cache Directory
  • find_papers: Search for and Retrieve Paper Titles from PDB
  • cache_info: Inspect rPDBapi Cache Contents
  • infer_search_service: Infer the Appropriate Search Service for RCSB PDB Queries
  • as_rpdb_structure: Convert Data to an rPDBapi Structure Object
  • send_api_request: Send API Request to a Specified URL
  • generate_json_query: Generate a JSON Query for RCSB PDB Data Retrieval
  • parse_fasta_text_to_list: Helper Function: Parse FASTA Text to List Grouped by Header
  • parse_rcsb_id: Parse an RCSB Identifier
  • get_fasta_from_rcsb_entry: Retrieve FASTA Sequence from PDB Entry or Specific Chain
  • autoresolve_sequence_type: Automatically Determine the Sequence Type
  • search_rcsb_fields: Search Known RCSB Fields by Pattern
  • as_rpdb_polymer_entity: Convert Data to an rPDBapi Polymer Entity Object
  • infer_id_type: Infer RCSB Identifier Type
  • build_assembly_id: Build an Assembly Identifier
  • query_search: Search Query Function
  • extract_ligand_table: Extract Ligand Table
  • extract_calpha_coordinates: Extract C-alpha Coordinates
  • rPDBapi-package: rPDBapi: A Comprehensive Interface for Accessing the Protein Data Bank
  • summarize_assemblies: Summarize Assembly-Level Data
  • join_structure_sequence: Join Structure and Sequence Summaries
  • summarize_entries: Summarize Entry-Level Data
  • list_rcsb_fields: List Known RCSB Fields by Data Type
  • parse_response: Parse API Response
  • perform_search: Perform a Search in the RCSB PDB
  • get_pdb_api_url: Generate a PDB API URL
  • get_info: Retrieve Information for a Given PDB ID
  • search_graphql: Perform a GraphQL Query to RCSB PDB
  • fetch_data: Fetch Data from RCSB PDB Using a JSON Query
  • extract_taxonomy_table: Extract Taxonomy Table
  • return_data_as_dataframe: Convert RCSB PDB Response Data into a Dataframe
  • validate_properties: Validate Property Specification Against Known Field Registry
  • handle_api_errors: Handle API Errors
  • get_pdb_file: Download and Process PDB Files from the RCSB Database
  • walk_nested_dict: Recursively Walk Through a Nested Dictionary
  • ExactMatchOperator: Create an Exact Match Search Operator
  • InOperator: Create an Inclusion Search Operator
  • ContainsWordsOperator: Create a Contains Words Search Operator
  • QueryGroup: Create a Grouped Query Object for RCSB PDB Searches
  • DefaultOperator: Create a Default Search Operator
  • QueryNode: Create a Query Node for RCSB PDB Searches
  • ExistsOperator: Create an Existence Search Operator
  • ContainsPhraseOperator: Create a Contains Phrase Search Operator
  • ComparisonOperator: Create a Comparison Search Operator
  • ChemicalOperator: Create a Chemical Search Operator for SMILES/InChI Descriptors
  • StructureOperator: Create a Structure Operator for Structure-Based Searches
  • SeqMotifOperator: Create a Sequence Motif Operator for RCSB PDB Searches
  • RequestOptions: Define Request Options for RCSB PDB Search Queries
  • as_rpdb_entry: Convert Data to an rPDBapi Entry Object