createExonByTranscriptCdf.AffymetrixCdfFile: Creates an exon-by-transcript CDF

Description

Creates an exon-by-transcript CDF based on the probesets defined in an "exon-only" CDF and the transcript-exon mapping in a NetAffx probeset annotation data file.

Usage

## S3 method for class 'AffymetrixCdfFile':
createExonByTranscriptCdf(cdf, csv, tags=c("*"), path=getPath(cdf),
  type=c("all", "core", "extended", "full", "main", "control", "cds"), subsetBy=NULL,
  within=NULL, ..., overwrite=FALSE, verbose=FALSE)

Arguments

cdf

An AffymetrixCdfFile specifying an "exon-only" CDF, which defines the exon-specific probesets that will go into the new CDF. For more details, see below.

csv

An AffymetrixNetAffxCsvFile specifying the Affymetrix NetAffx CSV probeset annotation file that contains the transcript-exon mapping.

Value

Returns an AffymetrixCdfFile.

Requirements for the "exon-only" CDF

The template CDF - argument cdf - should be an "exon-only" CDF: each unit has one group/probeset, which is the exon. An example of this is the "unsupported" HuEx-1_0-st-v2.cdf from Affymetrix, which has 1,432,154 units. Such "exon-only" CDFs do not contain information about clustering exons/probesets into gene transcripts. The CDF may also contain a number of non-exon probesets corresponding to control probes, which can contain very large numbers of probes per probeset. Such units are dropped/ignored by this method.
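
As a rough illustration of the "one group per unit" requirement, the sketch below checks that property on a toy structure in base R. The nested list is only a stand-in for a CDF (it is not how an AffymetrixCdfFile is stored), and all names and IDs are made up.

```r
# Toy stand-in for a CDF: unit name -> names of the groups in that unit.
# (Hypothetical data; a real CDF is accessed through the package's own API.)
units <- list(
  "2315101" = "2315101",
  "2315107" = "2315107",
  "2315200" = c("2315201", "2315202")  # violates the requirement: two groups
)

# Number of groups in each unit
nbrOfGroups <- lengths(units)

# An "exon-only" CDF has exactly one group (the exon) in every unit
isExonOnly <- all(nbrOfGroups == 1L)

# Units that break the requirement
offending <- names(units)[nbrOfGroups != 1L]

print(isExonOnly)
print(offending)
```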

Ordering of transcripts and exons within transcripts

The transcripts (=units) will be ordered as they appear in the NetAffx annotation file. Within each transcript (=unit), the exons (=groups) are ordered lexicographically by their names.
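
The ordering rule can be illustrated with a small base-R sketch on toy data. It is not the method's actual implementation: the data frame, the probesetId column name, and the use of method="radix" (C-locale string order) as the reading of "lexicographically" are all assumptions made for illustration.

```r
# Toy transcript-exon mapping (made-up IDs, not real NetAffx data)
map <- data.frame(
  transcriptClusterId = c("2315100", "2315100", "2315000", "2315100", "2315000"),
  probesetId          = c("2315107", "231511",  "2315002", "2315101", "2315001"),
  stringsAsFactors = FALSE
)

# Units: transcript clusters in order of first appearance in the file
unitNames <- unique(map$transcriptClusterId)

# Groups: exon names within each unit, sorted as strings, not as numbers
groupNames <- lapply(unitNames, function(id) {
  sort(map$probesetId[map$transcriptClusterId == id], method = "radix")
})
names(groupNames) <- unitNames

str(groupNames)
```

Because the names are compared as strings, "231511" sorts after "2315107" in this toy example even though it is the smaller number.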

Naming of transcripts and exons

In the created CDF, each unit corresponds to one transcript cluster, and each group within a unit corresponds to one exon within that transcript cluster. Thus, the unit names correspond to the transcript cluster names and the group names correspond to the exon names.

The exon names are defined by the unit names of the exon-only CDF, whereas the transcript names are defined by the transcriptClusterId column in the NetAffx annotation data file. These transcript and exon names are often "nonsense" integers. To map them to more genome-friendly names, use the various annotations available in the NetAffx annotation data file.
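
One way to attach friendlier labels to the integer-like unit names is a plain lookup against an annotation table. The sketch below uses base R only. The table, the geneSymbol column, and all values are hypothetical, and reading the real annotation file is left out.

```r
# Toy annotation table; 'geneSymbol' and its values are made up for illustration
annot <- data.frame(
  transcriptClusterId = c("2315000", "2315100", "2315200"),
  geneSymbol          = c("GENE_A",  "GENE_B",  NA),
  stringsAsFactors = FALSE
)

# Unit names as they might appear in the created CDF
unitNames <- c("2315100", "2315000", "2315200")

# Look up a label for each unit name
idx <- match(unitNames, annot$transcriptClusterId)
labels <- annot$geneSymbol[idx]

# Fall back to the original ID where no label is available
missing <- is.na(labels)
labels[missing] <- unitNames[missing]

print(data.frame(unitName = unitNames, label = labels))
```

Keeping the original ID as a fallback means every unit keeps a label, even when the annotation has no entry for it.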

Examples

# The exon-only CDF
cdf <- AffymetrixCdfFile$byChipType("HuEx-1_0-st-v2")

# The NetAffx probeset annotation data file
csv <- AffymetrixNetAffxCsvFile("HuEx-1_0-st-v2.na24.hg18.probeset.csv", path=getPath(cdf))

# Create a CDF containing all core probesets
# (type="core" is passed explicitly; the first type listed under Usage is "all"):
cdfT <- createExonByTranscriptCdf(cdf, csv=csv, type="core",
            tags=c("*,HB20110911"))
print(cdfT)

# Create a CDF containing the core probesets with 3 or 4 probes:
cdfT2 <- createExonByTranscriptCdf(cdf, csv=csv, type="core",
            tags=c("*,bySize=3-4,HB20110911"),
            subsetBy="probeCount", within=c("3", "4"))
print(cdfT2)
