createExonByTranscriptCdf.AffymetrixCdfFile: Creates an exon-by-transcript CDF
Description
Creates an exon-by-transcript CDF based on the probesets defined in an "exon-only" CDF
and the transcript-exon mapping of a NetAffx probeset annotation data file.
Usage
## S3 method for class 'AffymetrixCdfFile':
createExonByTranscriptCdf(cdf, csv, tags=c("*"), path=getPath(cdf),
type=c("all", "core", "extended", "full", "main", "control", "cds"), subsetBy=NULL,
within=NULL, ..., overwrite=FALSE, verbose=FALSE)
Arguments
cdf
An AffymetrixCdfFile specifying an "exon-only" CDF, which defines the exon-specific probesets that will go into the new CDF. For more details, see below.
csv
An AffymetrixNetAffxCsvFile specifying the Affymetrix NetAffx CSV probeset annotation file that contains the transcript-exon mapping.
tags
Additional tags added to the filename of the created CDF, i.e. <chipType>,<tags>.cdf.
path
The output path where the custom CDF will be written.
type
A character string specifying the type of CDF to be written.
subsetBy
An optional character string specifying the name of a column in the annotation file to subset against. The column will be parsed as the data type of argument within.
within
A vector of values accepted for the subsetBy column.
overwrite
If TRUE, an existing CDF is overwritten.
Requirements for the "exon-only" CDF
The template CDF (argument cdf) should be an "exon-only" CDF:
each unit has exactly one group/probeset, which is the exon.
An example of this is the "unsupported" HuEx-1_0-st-v2.cdf
from Affymetrix, which has 1,432,154 units.
Such "exon-only" CDFs do not contain information about clustering
exons/probesets into gene transcripts.
The CDF may also contain a number of non-exon probesets corresponding
to control probes, which can contain very large numbers of
probes per probeset. Such units are dropped (ignored) by this method.
Ordering of transcripts and exons within transcripts
The transcripts (=units) will be ordered as they appear in the
NetAffx annotation file.
Within each transcript (=unit), the exons (=groups) are ordered
lexicographically by their names.
Naming of transcripts and exons
In the created CDF, each unit corresponds to one transcript cluster,
and each group within a unit corresponds to one exon within
that transcript cluster. Thus, the unit names correspond to the
transcript cluster names and the group names correspond to the
exon names. The exon names are defined by the unit names of the exon-only CDF,
whereas the transcript names are defined by the
transcriptClusterId
column in the NetAffx annotation data file.
These transcript and exon names are often uninformative integer IDs.
To map them to more genome-friendly names, use the other
annotations available in the NetAffx annotation data file.
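The ordering and naming rules above can be illustrated with a small base-R sketch. It does not call aroma.affymetrix and is not the method's internal code. It assumes a hypothetical toy data frame `ann` whose `probesetId` column plays the role of the exon-only CDF unit names (the future group names) and whose `transcriptClusterId` column plays the role of the NetAffx column of the same name (the future unit names). It also assumes that "lexicographic" means plain byte (C-locale) string ordering, which is what `sort(..., method = "radix")` gives for character vectors.

```r
# Hypothetical toy annotation: one row per exon probeset, mapped to its
# transcript cluster. Real NetAffx files have many more columns and rows.
ann <- data.frame(
  probesetId          = c("2315102", "2315101", "2315125", "999999", "2315200"),
  transcriptClusterId = c("2315100", "2315100", "2315100", "2315199", "2315199"),
  stringsAsFactors = FALSE
)

# Units (transcripts): kept in order of first appearance in the annotation.
unitNames <- unique(ann$transcriptClusterId)

# Groups (exons) within each unit: sorted as strings, not as numbers,
# so "2315200" comes before "999999" because '2' < '9'.
units <- lapply(unitNames, function(id) {
  exons <- ann$probesetId[ann$transcriptClusterId == id]
  sort(exons, method = "radix")
})
names(units) <- unitNames

str(units)
```

Because the exon names are integer-like strings, sorting them as text can differ from numeric order; the second unit shows this.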
Examples
# The exon-only CDF
cdf <- AffymetrixCdfFile$byChipType("HuEx-1_0-st-v2");
# The NetAffx probeset annotation data file
csv <- AffymetrixNetAffxCsvFile("HuEx-1_0-st-v2.na24.hg18.probeset.csv", path=getPath(cdf));
# Create a CDF containing all core probesets:
cdfT <- createExonByTranscriptCdf(cdf, csv=csv, type="core", tags=c("*,HB20110911"));
print(cdfT);
# Create a CDF containing the core probesets with 3 or 4 probes:
cdfT2 <- createExonByTranscriptCdf(cdf, csv=csv, type="core",
tags=c("*,bySize=3-4,HB20110911"),
subsetBy="probeCount", within=c("3", "4"));
print(cdfT2);
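The subsetBy/within filter used in the second example can be pictured with the following base-R sketch. It is an illustration under stated assumptions, not the method's actual implementation. It assumes the annotation is already loaded into a data frame (the toy `ann` below and its values are hypothetical). As the argument description states, it parses the chosen column as the data type of `within` and keeps only the rows whose value is among the accepted ones.

```r
# Hypothetical toy annotation with a numeric probeCount column.
ann <- data.frame(
  probesetId = c("2315101", "2315102", "2315125", "2315200"),
  probeCount = c(4, 3, 2, 4),
  stringsAsFactors = FALSE
)

subsetBy <- "probeCount"
within   <- c("3", "4")

# Parse the column as the data type of 'within' (here: character) ...
values <- as.vector(ann[[subsetBy]], mode = typeof(within))

# ... and keep only the rows whose value is accepted.
annSub <- ann[values %in% within, , drop = FALSE]

annSub$probesetId
```

Coercing the column to the type of `within`, rather than the reverse, is why the example passes `within=c("3", "4")` as character strings even though probe counts are numbers.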