getGOInfo: A function that parses through the GO database; it agreps for the term "complex" and greps suffixes "-ase" and "-some" and returns nodes whose description contains such terms.

Description

This function parses through the Cellular Component ontology for the GO nodes and searchs for the term "complex" or the suffix "-ase" (e.g. RNA Polymerase) or "-some" (e.g. ribosome) and (or) other user defined phrases in the description of these nodes.

Usage

getGOInfo(wantDefault = TRUE, toGrep = NULL,
parseType=NULL, eCode = NULL, wantAllComplexes = TRUE,
includedGOTerms=NULL, not2BeIncluded=NULL)

Arguments

wantDefault

A logical. If TRUE, the default parameters ("complex", "\Base\b" and "\Bsome\b") are used.

toGrep

A character vector of the phrases (with Perl regular expressions) for which the function will parse through the GO database and search.

parseType

A character vector.This vector is in one to one correspondence with toGrep; it takes in the parse type such as "grep", "agrep", etc.

eCode

A character vector of evidence codes (see "http://www.geneontology.org/GO.evidence.shtml" for details). The function will disallow any protein inclusion in the protein complexes if they are not indexed by evidence code other than those found in eCode.

wantAllComplexes

A logical. If TRUE, the function will incorporate all GO children of the nodes found by term searches. In addition, children of node GO:0043234 (the protein complex node) will also be incorporated.

includedGOTerms

A character vector of GO terms that will be parsed regardless of the default and parseType parameters. In essence, these GO terms forced to be included.

not2BeIncluded

A character vector of GO terms that should not be parsed nor included in the output.

Value

"GO:XXXXXXX": A character vector containing proteins (not indexed by only eCode evidence codes) which make up protein complex "GO:XXXXXXXX"

Details

This function's generic operation is to parse the GO database and search for pre-determined or chosen terms. It returns a named list of chracter vectors where the names are GO id's from the CC ontoloy and the vectors consist of proteins corresponding to that particular GO id. Running this function has multiple combinations:

1. If the wantDefault parameter is TRUE, the function will agrep for "complex" and grep for "\Base\b" and "\Bsome\b".

2. If toGrep is not NULL, it will be a character vector with terms and perl regular expressions that are intended for searching in the GO database. NB - it toGrep is not NULL, then parseType should also not be NULL as the parseType indicates how each term should be searched.

3. parseType needs to be supplied if toGrep is not NULL. It is a character vector, either a single entry or of length equal to the length of toGrep, detailing how each term in toGrep will be parsed in the GO database. If only one term is supplied for parseType, then all the terms in toGrep will be parsed identically. Otherwise, the i-th term in parseType will reflect the parsing of the i-th term in toGrep.

4. The eCode argument is a user determined refining mechanism. It takes in a vector of evidence codes (as detailed by the GO website). The function will dis-allow proteins if and only if these proteins are only indexed by evidence codes found within eCodes.

5. If wantAllComplexes parameter is True, the function will also return the children of nodes found by parsing terms. In addition, the children of GO ID GD:0043234 (the protein complex ID) will be returned. The union of complexes is then returned.

References

www.geneontology.org

Examples

Run this code

#go = getGOInfo(wantAllComplexes = FALSE)
#goCoded = getGOInfo(code = c("IPI","ND","IDA"))
#goPhrase = getGOInfo(wantDefault = FALSE, toGrep = "\Bsomal\b",
#parseType = "grep", wantAllComplexes = FALSE)
#nam1 = names(go)
#nam2 = names(goCoded)
#if(length(nam1) == length(nam2) && nam1 == nam2){
#sapply(nam1, function(x) setdiff(go[[x]], goCoded[[x]]))
#}

Run the code above in your browser using DataLab