prozor

  • Determine minimal protein set explaining peptide spectrum matches.
  • Utility functions for creating fasta amino acid databases with decoys and contaminants.
  • Peptide false discovery rate estimation for target-decoy search results at the PSM, precursor, peptide, and protein levels.
  • Computing dynamic SWATH window sizes based on MS1 or MS2 signal distributions.
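
The first item, finding a minimal protein set, is a set-cover problem, and the function list describes a greedy approach. The following is a minimal base-R sketch of that idea on toy data. The function name greedy_protein_set and the example matrix are assumptions made for illustration; this is not the package's implementation and does not call prozor.

```r
# Greedy sketch: repeatedly pick the protein that explains the most
# still-unexplained peptides. Toy data; NOT prozor's own code.
greedy_protein_set <- function(pep_prot) {
  # pep_prot: logical matrix, rows = peptides, columns = proteins
  unexplained <- rep(TRUE, nrow(pep_prot))
  chosen <- character(0)
  while (any(unexplained)) {
    # number of unexplained peptides each protein would cover
    gain <- colSums(pep_prot[unexplained, , drop = FALSE])
    if (max(gain) == 0) break  # remaining peptides match no protein
    best <- names(gain)[which.max(gain)]
    chosen <- c(chosen, best)
    unexplained <- unexplained & !pep_prot[, best]
  }
  chosen
}

# Hypothetical peptide-by-protein assignment matrix
pep_prot <- matrix(
  c(TRUE,  TRUE,  FALSE,
    TRUE,  FALSE, FALSE,
    TRUE,  FALSE, TRUE,
    FALSE, FALSE, TRUE),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("pep", 1:4), c("protA", "protB", "protC"))
)
minimal_set <- greedy_protein_set(pep_prot)
print(minimal_set)
```

Here protB is dropped because its only peptide is already explained by protA. A greedy choice like this is fast but is not guaranteed to find the smallest possible set in every case.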

An HTML version of the package documentation can be found here:

https://protviz.github.io/prozor

How to install:

To install the CRAN version (not the newest; the GitHub version is preferred):

install.packages("prozor")

To install the GitHub version:

install.packages("remotes")
remotes::install_github("protviz/prozor")

For developers

Download the git repo. Use roxygen2 to document new functions. Then run these 2 commands to update the NAMESPACE and Rd files:

library(devtools)
document()

Example for creating a fasta file with the fgcz_create_fasta.R script

Go to fgcz-r-035.uzh.ch

ls ./fasta_db/p3071_Chlorella
more ./fasta_db/p3071_Chlorella/annotation.txt
more ./fasta_db/p3071_Chlorella/uniprot-taxonomy_3071.fasta
clear
/usr/local/lib/R/site-library/prozor/script/fgcz_create_fasta.R -h

/usr/local/lib/R/site-library/prozor/script/fgcz_create_fasta.R nodecoy ./fasta_db/p3071_Chlorella -o /srv/www/htdocs/FASTA/
/usr/local/lib/R/site-library/prozor/script/fgcz_create_fasta.R ./fasta_db/p3071_Chlorella -o /srv/www/htdocs/FASTA/

cat p3071_Chlorella_d.txt
cat p3071_Chlorella_d.txt | bfabric_save_fasta.py 3071  /srv/www/htdocs/FASTA/fgcz_3071_Chlorella_d_20200604.fasta
# 3071 is the bfabric project name
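
The function list describes the decoy part of such a database as reversed ("rev") sequences. A minimal base-R sketch of that idea is shown below, under stated assumptions: the helper names reverse_sequence and make_decoys, the "REV_" prefix, and the two toy sequences are all hypothetical and are not taken from the script or the package.

```r
# Sketch of reversed-sequence decoys on a toy named list of sequences.
# make_decoys() and the "REV_" prefix are illustrative assumptions.
reverse_sequence <- function(s) {
  # split into single characters, reverse, and glue back together
  paste(rev(strsplit(s, "", fixed = TRUE)[[1]]), collapse = "")
}

make_decoys <- function(targets, prefix = "REV_") {
  decoys <- lapply(targets, reverse_sequence)
  names(decoys) <- paste0(prefix, names(targets))
  # concatenated database: targets first, then their decoys
  c(targets, decoys)
}

targets <- list(protA = "MKTAYIAK", protB = "MSEQNNTE")
db <- make_decoys(targets)
str(db)
```

Reversing keeps the amino acid composition and length of each entry, which is why reversed sequences are a common choice of decoy.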

Monthly Downloads: 205

Version: 0.3.1

License: GPL-3

Maintainer: Witold Wolski

Last Published: December 7th, 2021

Functions in prozor (0.3.1)

  • greedyRes2Matrix: converts the result of the greedy function to a matrix with 3 columns (peptide, charge and protein)
  • Cdsw-class: Compute dynamic swath windows
  • computeFDR: Compute FDR given a score
  • fdrSample: Data frame with score and proteinID
  • annotatePeptides: Annotate peptides with protein ids
  • create_fgcz_fasta_db: create fasta db from one or more fasta files
  • greedy: given a matrix (columns: proteins, rows: peptides), compute the minimal protein set using a greedy algorithm
  • computeFDRwithID: Compute FDR given a score
  • annotateAHO: annotate peptides using AhoCorasickTrie
  • loadContaminantsFasta2021: load list of contaminant sequences FGCZ 2021
  • createDecoyDB: Create db with decoys and contaminants
  • loadContaminantsFasta2019: load list of contaminant sequences FGCZ 2019
  • hardconstrain: tests hard constraints
  • makeIDUnip: make id for chain compatible with uniprot
  • reverseSeq: create reversed (rev) sequences for a fasta list
  • masses: MS masses, a dataset containing approx. 150000 MS1 precursor masses
  • writeFasta: write fasta lists into file
  • objectiveMS1Function: compute the deviation from the optimum (equal number of MS1 per bin)
  • readPeptideFasta: wrapper setting the correct parameters of seqinr::read.fasta for reading peptide sequences
  • prozor: Minimal Protein Set Explaining Peptides
  • pepprot: Table containing peptide information
  • plotFDR: plot FDR
  • predictScoreFDR: Predict score given FDR
  • prepareMatrix: given a table of peptide protein assignments, generate a matrix
  • makeID: make id for chain in format sp|P30443|1A01_HUMANs25
  • protpepmetashort: Small version of pepprot dataset to speed up computation
  • removeSignalPeptide: remove signal peptides from main chain
  • readjustWindows: Readjust windows so that boundaries fall in regions with few peaks.
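
Several of the functions listed above deal with FDR estimation from target-decoy search results. The general technique can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's estimator: the function name estimate_fdr, the toy scores, and the choice of decoys/targets as the FDR estimate (with larger scores being better) are all assumptions, and the exact formula prozor uses may differ.

```r
# Toy target-decoy FDR sketch; NOT prozor's own estimator.
# Assumes larger score = better match.
estimate_fdr <- function(score, is_decoy) {
  ord <- order(score, decreasing = TRUE)
  decoy_sorted <- is_decoy[ord]
  n_decoy  <- cumsum(decoy_sorted)    # decoys at or above each score
  n_target <- cumsum(!decoy_sorted)   # targets at or above each score
  fdr <- n_decoy / pmax(n_target, 1)  # guard against division by zero
  # q-value: smallest FDR at this score or any more lenient cutoff
  qvalue <- rev(cummin(rev(fdr)))
  data.frame(score = score[ord], decoy = decoy_sorted,
             fdr = fdr, qvalue = qvalue)
}

# Hypothetical search scores with decoy flags
score    <- c(9.1, 8.7, 8.2, 7.5, 6.9, 6.0)
is_decoy <- c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
res <- estimate_fdr(score, is_decoy)
print(res)
```

The q-value column makes the estimate monotone in the score, so a score cutoff for a chosen FDR level can be read off directly from the table.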