cisPath (version 1.12.0)

formatSTRINGPPI: Format PPI file downloaded from the STRING database

Description

This method formats a protein-protein interaction (PPI) file downloaded from the STRING database.

Usage

formatSTRINGPPI(input, mappingFile, taxonId, output, minScore=700)
## S4 method for signature 'character,character,character,character'
formatSTRINGPPI(input, mappingFile, taxonId, output, minScore=700)

Arguments

input
File downloaded from the STRING database (character(1)).
mappingFile
Identifier mapping file (character(1)). Generate this file with method getMappingFile.
taxonId
NCBI taxonomy species identifier (character(1)). Only data for this species are processed. Examples:
  • 9606: Homo sapiens
  • 4932: Saccharomyces cerevisiae
  • 6239: Caenorhabditis elegans
  • 7227: Drosophila melanogaster
  • 10090: Mus musculus
  • 10116: Rattus norvegicus
output
Output file (character(1)).
minScore
Minimum STRING score (integer(1)). PPI records with a STRING score below this value are filtered out. The default, 700, keeps only high-confidence interactions.
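
To illustrate what taxonId and minScore select, here is a minimal base-R sketch on toy data. It is not the package's implementation. It assumes the STRING links layout of three whitespace-separated columns (protein1, protein2, combined_score) with identifiers prefixed by the taxon ID; the rows themselves are made-up placeholders.

```r
# Toy data in the assumed layout of a STRING protein.links file.
# The identifiers and scores are illustrative placeholders, not real records.
links_text <- "protein1 protein2 combined_score
9606.ENSP00000000233 9606.ENSP00000253401 850
9606.ENSP00000000233 9606.ENSP00000401445 420
10090.ENSMUSP00000000001 10090.ENSMUSP00000027991 910"

links <- read.table(text = links_text, header = TRUE, stringsAsFactors = FALSE)

taxonId  <- "9606"
minScore <- 700

# Keep rows where both proteins belong to the requested species
# and the score is at least minScore.
prefix <- paste0(taxonId, ".")
keep <- startsWith(links$protein1, prefix) &
  startsWith(links$protein2, prefix) &
  links$combined_score >= minScore

filtered <- links[keep, ]
print(filtered)
```

Only the first row survives: the second falls below minScore and the third belongs to a different species.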

Value

  • Each line of the output file contains the Swiss-Prot accession numbers and gene names of two interacting proteins. An edge value is estimated for each link between two interacting proteins. This value is defined as max(1, log(1000 - STRING_SCORE, 100)). It can be treated as the "cost" when determining the shortest paths between proteins. Advanced users can edit the file and change this value for each edge.
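
The edge value formula can be tried out directly in R. The sketch below assumes that the second argument of log in the formula is the base, as in R's log(x, base); edge_cost is a hypothetical helper name used only for illustration and is not part of cisPath.

```r
# Edge cost as given in the Value section: max(1, log(1000 - STRING_SCORE, 100)).
# Assumption: the second argument of log() is the base, as in R's log(x, base).
# edge_cost is a hypothetical helper, not a cisPath function.
edge_cost <- function(score) {
  pmax(1, log(1000 - score, base = 100))
}

scores <- c(700, 800, 900, 950)
costs <- data.frame(score = scores, cost = round(edge_cost(scores), 3))
print(costs)
```

Under this reading, higher STRING scores give lower costs, so shortest-path searches favor high-confidence interactions. A score of 700 gives a cost of about 1.24, and any score of 900 or more is clamped to the minimum cost of 1.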

Details

The input file is downloaded from the STRING database (http://string-db.org/). The URL of this file is http://string-db.org/newstring_download/protein.links.v9.1.txt.gz. Access http://string-db.org/newstring_download/species.v9.1.txt to determine the parameter taxonId. Access http://string-db.org/newstring_cgi/show_download_page.pl for more details. If you make use of this file, please cite the STRING database.

References

Szklarczyk, D. et al. (2011) The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res, 39, D561-D568.
Franceschini, A. et al. (2013) STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res, 41, D808-D815.
UniProt Consortium (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res, 40, D71-D75.

See Also

cisPath, getMappingFile, formatPINAPPI, formatSIFfile, formatiRefIndex, combinePPI.

Examples

library(cisPath)

# Generate the identifier mapping file
input <- system.file("extdata", "uniprot_sprot_human10.dat", package="cisPath")
mappingFile <- file.path(tempdir(), "mappingFile.txt")
getMappingFile(input, output=mappingFile, taxonId="9606")

# Format the file downloaded from the STRING database
output <- file.path(tempdir(), "STRINGPPI.txt")
fileFromSTRING <- system.file("extdata", "protein.links.txt", package="cisPath")
formatSTRINGPPI(fileFromSTRING, mappingFile, "9606", output, 700)

# The steps below download large files and need network access.
# R.utils provides gunzip(); it is available from CRAN.
if (!requireNamespace("R.utils", quietly=TRUE)) install.packages("R.utils")
library(R.utils)

outputDir <- file.path(getwd(), "cisPath_test")
dir.create(outputDir, showWarnings=FALSE, recursive=TRUE)

# Generate the identifier mapping file
# (assumes uniprot_sprot_human.dat is already present in outputDir)
fileFromUniProt <- file.path(outputDir, "uniprot_sprot_human.dat")
mappingFile <- file.path(outputDir, "mappingFile.txt")
getMappingFile(fileFromUniProt, output=mappingFile)

# Download STRING PPI for Homo sapiens (compressed: ~27M, decompressed: ~213M)
destfile <- file.path(outputDir, "9606.protein.links.v9.1.txt.gz")
cat("Downloading...\n")
download.file("http://string-db.org/newstring_download/protein.links.v9.1/9606.protein.links.v9.1.txt.gz", destfile)
cat("Uncompressing...\n")
gunzip(destfile, overwrite=TRUE, remove=FALSE)

# Format STRING PPI
fileFromSTRING <- file.path(outputDir, "9606.protein.links.v9.1.txt")
STRINGPPI <- file.path(outputDir, "STRINGPPI.txt")
formatSTRINGPPI(fileFromSTRING, mappingFile, "9606", output=STRINGPPI, minScore=700)
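
For a quick look at a downloaded file before decompressing it, base R can read gzip-compressed text through gzfile(). The sketch below is for inspection only and uses a small toy file written to tempdir(). The file name and its rows are placeholders, and nothing here implies that formatSTRINGPPI itself accepts compressed input.

```r
# Write a small gzip-compressed stand-in for a STRING links file (toy data).
gz_path <- file.path(tempdir(), "toy.protein.links.txt.gz")
con <- gzfile(gz_path, "w")
writeLines(c("protein1 protein2 combined_score",
             "9606.ENSP00000000233 9606.ENSP00000253401 850"), con)
close(con)

# read.table() opens the unopened gzfile connection, reads from it,
# and closes it again when it returns.
head_rows <- read.table(gzfile(gz_path), header = TRUE, nrows = 5,
                        stringsAsFactors = FALSE)
print(head_rows)
```

Limiting nrows keeps the check cheap even for the full file, which is roughly 213M once decompressed.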