Pbase

Manipulating and exploring protein and proteomics data.

Installation

It is advised to install Pbase from Bioconductor:

library("BiocInstaller")
biocLite("Pbase")

Note, however, that you will need the devel version of Bioconductor for this. See ?useDevel for details.

Alternatively, install from GitHub using devtools::install_github:

library("devtools")
install_github("ComputationalProteomicsUnit/Pbase")

Dependencies

See the DESCRIPTION file for a complete list.

Getting started

Currently, the best way to get started is to read ?Proteins and the Pbase-data vignette. More documentation is on its way.

Development

Pbase is under heavy development and is likely to change considerably in the near future. Suggestions and bug reports are welcome and can be filed as GitHub issues.

If you would like to contribute, please send pull requests directly for minor contributions and typos. For major contributions, we suggest first getting in touch with the package maintainers.

Ideas

Protein coverage

A proteinCoverage function that returns a summary (S4 class) of a protein's coverage given an experimental set of observed peptides (via an MSnExp, an MSnSet with an fcol, or an mzID object) or a theoretical set of peptides after in silico digestion (see also the next point). The return value would have its own plot method using Pviz.

setGeneric("proteinCoverage", function(x, y, ...) standardGeneric("proteinCoverage"))
## uses internal IRanges or, if absent, first cleaves the protein
setMethod("proteinCoverage", c("Proteins", "missing"), function(x, y) { ... } )
## later
## setMethod("proteinCoverage", c("Proteins", "MSnSet"), function(x, y, fcol = "pepseq") { ... } )
## setMethod("proteinCoverage", c("Proteins", "MSnExp"), function(x, y, fcol = "pepseq") { ... } )
## setMethod("proteinCoverage", c("Proteins", "mzID"), function(x, y) { ... } )
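The coverage idea can be illustrated without any of the planned S4 machinery. The following is only a minimal base R sketch: the function name coverage_fraction and the toy sequence are hypothetical, and plain string matching stands in for the IRanges-based approach mentioned above.

```r
## Fraction of residues in `protein` covered by at least one peptide.
## `coverage_fraction` is a hypothetical helper, not part of Pbase.
coverage_fraction <- function(protein, peptides) {
  covered <- logical(nchar(protein))
  for (pep in unique(peptides)) {
    ## gregexpr() returns -1 when there is no match; it reports
    ## non-overlapping occurrences only
    hits <- gregexpr(pep, protein, fixed = TRUE)[[1]]
    for (s in hits[hits > 0]) {
      covered[s:(s + nchar(pep) - 1L)] <- TRUE
    }
  }
  mean(covered)
}

## Toy sequence, made up for illustration
coverage_fraction("MAAAKCDEFGHIKLLR", c("MAAAK", "LLR", "WWW"))
```

A real implementation would more likely store the peptide positions as ranges, so that they can also be plotted along the protein.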

Assessing the redundancy of a protein fasta database

Given a protein fasta file, what is the maximal sensitivity that can be expected from a mass spectrometry experiment with 0, 1, ... missed cleavages? This should probably also include a filtering step for peptide flyability.
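One possible reading of this question is sketched below in base R. Everything in it is an assumption made for illustration: the helper digest, the simplified trypsin rule (cleave after K or R unless followed by P), the toy sequences, the minimum peptide length of 5 as a crude stand-in for a flyability filter, and the definition of maximal sensitivity as the fraction of proteins that have at least one peptide unique to them.

```r
## Hypothetical in silico tryptic digestion: cleave after K or R,
## except when the next residue is P.
digest <- function(sequence, missed = 0L) {
  aa <- strsplit(sequence, "", fixed = TRUE)[[1]]
  n <- length(aa)
  nxt <- c(aa[-1], "")
  sites <- which(aa %in% c("K", "R") & nxt != "P")
  sites <- sites[sites < n]          # no cleavage after the last residue
  starts <- c(1L, sites + 1L)
  ends <- c(sites, n)
  peptides <- character()
  for (m in missed) {                # m = number of missed cleavages
    k <- length(starts) - m
    if (k < 1L) next
    i <- seq_len(k)
    peptides <- c(peptides, substring(sequence, starts[i], ends[i + m]))
  }
  peptides
}

## Toy sequences, made up for illustration (not real proteins)
proteins <- c(P1 = "MAAAKCDEFGHIKLLR",
              P2 = "MGGGKCDEFGHIKVVR",
              P3 = "CDEFGHIK")

peptides <- lapply(proteins, function(s) unique(digest(s, missed = 0:1)))

## Number of proteins in which each peptide occurs
n_proteins <- table(unlist(peptides))
proteotypic <- names(n_proteins)[as.vector(n_proteins) == 1L]

## A protein counts as identifiable if it has a unique peptide that
## passes the (assumed) length filter
min_length <- 5L
identifiable <- vapply(peptides, function(p) {
  any(p %in% proteotypic & nchar(p) >= min_length)
}, logical(1))

max_sensitivity <- mean(identifiable)
```

In this toy database, P3 consists only of a peptide shared with P1 and P2, so it can never be identified unambiguously.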

Flyability/Detectability

Some literature about estimating detectability:

Liu et al. 2011:

Requirements for in-silico created peptides: missedCleavages = 0:2, length(peptides) >= 6, mass(peptides) < 6000 (Da)

Logistic Regression based on Hydrophobicity, Isoelectric point, length, molecular weight, average hydrophobicity, average isoelectric point

Webb-Robertson et al. 2007:

Requirements for in-silico created peptides: missedCleavages = 0:2, length(peptides) >= 6, mass(peptides) < 6000 (Da)

35 features: length, weight, # of (non-)polar, # of (un)charged, # of pos./neg. charged residues, hydrophobicity (different models), polarity (different models), bulkiness, AA singlet counts

Sanders et al. 2007

Requirements for in-silico created peptides: length(peptides) >= 6

Features: Length, Charge, Isoelectric Point, Molecular Weight, Hydropathicity, Counts of each AA (20 Features), Percent composition of each AA (20 Features), Percent of polar, positive, negative, hydrophobic AA

take-home-message: a model of one species/dataset could not be transferred to another dataset (without dramatically decreasing the performance)
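Several of the sequence-derived features listed above are straightforward to compute. The base R sketch below covers only length, per-residue counts, percent composition, and the percentage of positively and negatively charged residues. The function name peptide_features is hypothetical, and grouping K, R, H as positive and D, E as negative is an assumption, not necessarily the grouping used in the cited paper.

```r
## Hypothetical feature extractor for a single peptide sequence.
peptide_features <- function(peptide) {
  amino_acids <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                   "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
  aa <- strsplit(peptide, "", fixed = TRUE)[[1]]
  ## counts of each of the 20 standard amino acids (zero if absent)
  counts <- as.vector(table(factor(aa, levels = amino_acids)))
  names(counts) <- paste0("count_", amino_acids)
  ## percent composition
  pct <- 100 * counts / length(aa)
  names(pct) <- paste0("pct_", amino_acids)
  c(length = length(aa), counts, pct,
    pct_positive = 100 * mean(aa %in% c("K", "R", "H")),  # assumed grouping
    pct_negative = 100 * mean(aa %in% c("D", "E")))       # assumed grouping
}

f <- peptide_features("AKKD")
f[c("length", "count_K", "pct_K", "pct_positive", "pct_negative")]
```

The result is a named numeric vector, so the features of many peptides can be stacked into a matrix with rbind or sapply before model fitting.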

Mallick et al. 2007

~1000 Features.

Some of the most discriminating properties: Total/Average net/positive charge, hydrophobic moment, isoelectric point, Histidine composition

take-home-message: The model of one species is comparable to another if the evolutionary distance is small (e.g. yeast and human) but you can't compare different devices/datasets (e.g. MALDI vs ESI)

Simple Rules

Mass: 500:4500

http://www.nature.com/nbt/journal/v25/n1/extref/nbt1275-S5.pdf

http://ieeexplore.ieee.org/ielx5/5779756/5779971/5780167/html/img/5780167-fig-1-large.gif

Length: 5:40

http://www.nature.com/nbt/journal/v25/n1/extref/nbt1275-S6.pdf

http://ieeexplore.ieee.org/ielx5/5779756/5779971/5780167/html/img/5780167-fig-1-large.gif

95% of all peptides are of length 5:30: http://www.nature.com/nbt/journal/v25/n1/extref/nbt1275-S24.pdf

Average Isoelectric point: seq(0, 1.4) http://ieeexplore.ieee.org/ielx5/5779756/5779971/5780167/html/img/5780167-fig-1-large.gif
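The mass and length rules above could be applied as a simple filter. The base R sketch below makes several assumptions: the function names are hypothetical, the mass range is taken to be in Da, and masses are computed as monoisotopic masses of unmodified peptides (the sources above may use a different mass definition). The isoelectric point rule is left out.

```r
## Monoisotopic residue masses (Da) of the 20 standard amino acids
residue_mass <- c(G = 57.02146, A = 71.03711, S = 87.03203, P = 97.05276,
                  V = 99.06841, T = 101.04768, C = 103.00919, L = 113.08406,
                  I = 113.08406, N = 114.04293, D = 115.02694, Q = 128.05858,
                  K = 128.09496, E = 129.04259, M = 131.04049, H = 137.05891,
                  F = 147.06841, R = 156.10111, Y = 163.06333, W = 186.07931)
water <- 18.01056

## Mass of an unmodified peptide: sum of residues plus one water
peptide_mass <- function(peptide) {
  aa <- strsplit(peptide, "", fixed = TRUE)[[1]]
  sum(residue_mass[aa]) + water
}

## TRUE if the peptide falls within both the mass and the length range
passes_simple_rules <- function(peptide,
                                mass_range = c(500, 4500),
                                length_range = c(5, 40)) {
  m <- peptide_mass(peptide)
  n <- nchar(peptide)
  m >= mass_range[1] && m <= mass_range[2] &&
    n >= length_range[1] && n <= length_range[2]
}

passes_simple_rules("AK")        # too short and too light
passes_simple_rules("CDEFGHIK")
```

Sequences containing non-standard residue codes yield an NA mass in this sketch and would need separate handling.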

Hydropathy/Hydrophobicity

http://web.expasy.org/tools/protparam/protparam-doc.html

http://web.expasy.org/compute_pi/pi_tool-doc.html

Kyte, Jack, and Russell F. Doolittle. "A simple method for displaying the hydropathic character of a protein." Journal of molecular biology 157.1 (1982): 105-132.
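As an illustration of the Kyte-Doolittle scale, the average hydropathy of a peptide (often called GRAVY) is the mean of the per-residue hydropathy values. The function name gravy below is hypothetical and the sketch is not part of Pbase.

```r
## Kyte-Doolittle hydropathy values for the 20 standard amino acids
kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5,
        Q = -3.5, E = -3.5, G = -0.4, H = -3.2, I = 4.5,
        L = 3.8, K = -3.9, M = 1.9, F = 2.8, P = -1.6,
        S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

## Average hydropathy: mean of the per-residue values
gravy <- function(peptide) {
  aa <- strsplit(peptide, "", fixed = TRUE)[[1]]
  mean(kd[aa])
}

gravy("AK")    # (1.8 - 3.9) / 2
gravy("IIII")  # fully hydrophobic toy peptide
```

Positive values indicate a more hydrophobic peptide, negative values a more hydrophilic one.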

Selection of optimal heavy peptides for absolute quantitation

See Pavel's idea.

Protein domains

Possibly add support for the annotation or detection of protein domains.

Mapping a Protein Sequence to a Genome Sequence

See the mapping vignette.

See GenomicRanges::mapCoords for a method for translating ranges. Ideally, we want to follow that API.
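The underlying coordinate arithmetic can be sketched in base R. This is not the GenomicRanges API: the function protein_to_genome is hypothetical and assumes a plus-strand transcript whose exons are given in transcript order and contain only coding sequence, so that amino acid p corresponds to coding nucleotides 3p - 2 to 3p.

```r
## Map an amino acid range to genomic ranges (plus strand only).
## exon_start/exon_end: genomic coordinates of the coding exons,
## in transcript order (an assumption of this sketch).
protein_to_genome <- function(aa_start, aa_end, exon_start, exon_end) {
  ## positions within the coding sequence (1-based)
  cds_start <- 3 * aa_start - 2
  cds_end <- 3 * aa_end
  exon_width <- exon_end - exon_start + 1
  cum_end <- cumsum(exon_width)           # CDS position of each exon's last base
  cum_start <- cum_end - exon_width + 1   # CDS position of each exon's first base
  ## exons overlapping the requested CDS interval
  hit <- which(cum_start <= cds_end & cum_end >= cds_start)
  data.frame(
    start = exon_start[hit] + pmax(cds_start, cum_start[hit]) - cum_start[hit],
    end   = exon_start[hit] + pmin(cds_end, cum_end[hit]) - cum_start[hit]
  )
}

## Toy gene with two coding exons: 101-109 and 201-212
protein_to_genome(3, 4, exon_start = c(101, 201), exon_end = c(109, 212))
```

In the toy example, residues 3 to 4 span the exon junction, so the result has two genomic ranges. Minus-strand transcripts and untranslated regions would need additional handling.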

Interoperability

The package makes it easy to interact with AAString and AAStringSet instances, protein databases such as UniProt (and possibly biomaRt in the future) using protein identifiers, protein identification results (mzID or (devel) mzR packages) and possibly also MSnExp and MSnSet instances.

Functions in Pbase (0.4.0)