annotate (version 1.46.1)

readGEOAnn: Function to extract data from the GEO web site

Description

Data files available at the GEO web site are identified by GEO accession numbers. Given the URL of the CGI script at GEO and a GEO accession number, these functions extract the data from the web site; readGEOAnn and readIDNAcc return a matrix containing the data.
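As a rough illustration of what turning GEO text into "a matrix containing the data" involves, the sketch below parses tab-delimited text into a character matrix. The function name parseGEOText is hypothetical and not part of annotate. Treating lines that start with "^", "!" or "#" as metadata, and the first remaining line as the column header, are assumptions about the text format, not a description of readGEOAnn's internals.

```r
# Hypothetical helper, not part of annotate: a minimal sketch assuming
# GEO-style text where "^", "!" and "#" lines are metadata and the first
# remaining line holds tab-separated column names.
parseGEOText <- function(lines) {
  first <- substr(lines, 1, 1)
  # Keep non-empty lines that are not metadata
  body <- lines[nzchar(lines) & !(first %in% c("^", "!", "#"))]
  if (length(body) == 0) {
    return(matrix(character(0), nrow = 0, ncol = 0))
  }
  fields <- strsplit(body, "\t", fixed = TRUE)
  header <- fields[[1]]
  rows <- lapply(fields[-1], function(r) {
    length(r) <- length(header)  # pad short rows with NA, trim long ones
    r
  })
  mat <- matrix(as.character(unlist(rows)), ncol = length(header),
                byrow = TRUE)
  colnames(mat) <- header
  mat
}
```

Because it works on a plain character vector, the sketch can be tried offline, for example on lines previously obtained with readLines().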

Usage

readGEOAnn(GEOAccNum, url = "http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?")
readIDNAcc(GEOAccNum, url = "http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?")
getGPLNames(url = "http://www.ncbi.nlm.nih.gov/geo/query/browse.cgi?")
getSAGEFileInfo(url = "http://www.ncbi.nlm.nih.gov/geo/query/browse.cgi?view=platforms&prtype=SAGE&dtype=SAGE")
getSAGEGPL(organism = "Homo sapiens", enzyme = c("NlaIII", "Sau3A"))
readUrl(url)

Arguments

url
The URL of the CGI script at GEO.
GEOAccNum
A character string giving the GEO accession number of the desired file (e.g. GPL97).
organism
A character string giving the name of the organism of interest.
enzyme
A character string, either "NlaIII" or "Sau3A", naming the enzyme used to create the SAGE tags.

Value

Both readGEOAnn and readIDNAcc return a matrix. getGPLNames returns a named vector of the names of commercial arrays; the names of the vector are the corresponding GEO accession numbers.

Details

url is the address of the CGI script that processes the user's request. readGEOAnn invokes the CGI script, passing it a GEO accession number, and then processes the data file it returns.
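The sketch below shows one way a request URL could be composed from url and GEOAccNum. The helper name buildGEOQuery is hypothetical, and the extra query options (targ, view, form) are an assumption about what a text download might need; readGEOAnn may send different parameters.

```r
# Hypothetical helper, not part of annotate: a minimal sketch of building
# an acc.cgi query. "acc" carries the accession; "extra" is an assumed
# set of view/format options and can be set to "" to omit it.
buildGEOQuery <- function(GEOAccNum,
                          url = "http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?",
                          extra = "targ=self&view=data&form=text") {
  stopifnot(is.character(GEOAccNum), length(GEOAccNum) == 1)
  # Encode the accession so unusual characters cannot break the query
  query <- paste0("acc=", utils::URLencode(GEOAccNum, reserved = TRUE))
  if (nzchar(extra)) {
    query <- paste0(query, "&", extra)
  }
  paste0(url, query)
}
```

The returned string can be passed to readLines(), which also accepts URLs, to fetch the text for parsing.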

readIDNAcc calls readGEOAnn to read the data and then extracts the columns for probe IDs and accession numbers. GEOAccNum must be the GEO accession number of an Affymetrix chip.
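The column-extraction step can be sketched as follows, starting from a matrix of the kind readGEOAnn returns. The helper name extractIDNAcc is hypothetical, and the default column names "ID" and "GB_ACC" are an assumption about how GEO labels probe IDs and GenBank accessions in Affymetrix platform tables.

```r
# Hypothetical helper, not part of annotate: keep only the probe-ID and
# accession columns. Default column names are assumed, not documented.
extractIDNAcc <- function(ann, idCol = "ID", accCol = "GB_ACC") {
  wanted <- c(idCol, accCol)
  absent <- setdiff(wanted, colnames(ann))
  if (length(absent) > 0) {
    stop("column(s) not found: ", paste(absent, collapse = ", "))
  }
  # drop = FALSE keeps a two-column matrix even for a single row
  ann[, wanted, drop = FALSE]
}
```

Failing loudly on missing columns mirrors the requirement that GEOAccNum refer to an Affymetrix chip: other platforms may not carry these columns.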

getGPLNames parses the HTML page that lists GEO accession numbers together with descriptions of the arrays they represent.
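To show how such a page can be reduced to a named vector keyed by accession number, here is a sketch that reads simple table rows. The helper name parseGPLRows is hypothetical, and the assumed markup (accession in the first plain <td> cell, description in the second) is an illustration only; the live GEO page may be structured differently.

```r
# Hypothetical helper, not part of annotate: turn rows such as
# <tr><td>GPL96</td><td>HG-U133A</td></tr> into c(GPL96 = "HG-U133A").
# The row/cell layout is an assumption.
parseGPLRows <- function(html) {
  rows <- unlist(strsplit(html, "</tr>", fixed = TRUE))
  # Collect the text of every plain <td>...</td> cell in each row
  cells <- regmatches(rows, gregexpr("<td>[^<]*</td>", rows))
  cells <- lapply(cells, function(x) gsub("</?td>", "", x))
  # Keep rows whose first cell looks like a GPL accession
  keep <- vapply(cells, function(x) {
    length(x) >= 2 && grepl("^GPL[0-9]+$", x[1])
  }, logical(1))
  cells <- cells[keep]
  desc <- vapply(cells, function(x) x[2], character(1))
  names(desc) <- vapply(cells, function(x) x[1], character(1))
  desc
}
```

Header rows built from <th> cells contribute no <td> matches and are skipped automatically.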

References

www.ncbi.nlm.nih.gov/geo

Examples

# These calls query the GEO web site and need network access.
#library(annotate)
# Get array names and GEO accession numbers
#geoAccNums <- getGPLNames()
# Read the annotation data file for HG-U133A, which is GPL96 according to
# geoAccNums
#temp <- readGEOAnn(GEOAccNum = "GPL96")
#temp2 <- readIDNAcc(GEOAccNum = "GPL96")
