To get started
To get started, see:
- readCelUnits() - reads one or several Affymetrix CEL files probeset by probeset.
- readCel() - reads an Affymetrix CEL file probe by probe.
- readCdf() - reads an Affymetrix CDF file by probe.
- readCdfUnits() - reads an Affymetrix CDF file unit by unit.
- readCdfCellIndices() - like readCdfUnits(), but returns cell indices only, which is often enough to read CEL files unit by unit.
- applyCdfGroups() - re-arranges a CDF structure.
- findCdf() - locates an Affymetrix CDF file by chip type. That page also describes how to set up the default search path for CDF files.
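The functions listed above can be combined along the following lines. This is only a sketch: the helper name readFirstUnits() and its arguments celFile and nbrOfUnits are hypothetical, and it assumes that the CEL header exposes the chip type as the element chiptype, that findCdf() returns NULL when no matching CDF file is found, and that the CDF file contains at least nbrOfUnits units. Like the package examples, it does nothing when the data file is missing.

```r
# Hypothetical helper: read the first few units of a CEL file.
# Returns NULL if the CEL file, the affxparser package, or a
# matching CDF file cannot be found.
readFirstUnits <- function(celFile, nbrOfUnits = 5L) {
  if (!file.exists(celFile) ||
      !requireNamespace("affxparser", quietly = TRUE)) {
    return(NULL)
  }

  # The chip type stored in the CEL header identifies the CDF file
  chipType <- affxparser::readCelHeader(celFile)$chiptype

  # Search the default CDF search path for that chip type
  cdfFile <- affxparser::findCdf(chipType)
  if (is.null(cdfFile)) {
    return(NULL)
  }

  # Read the probe signals unit by unit (probeset by probeset)
  affxparser::readCelUnits(celFile, units = seq_len(nbrOfUnits),
                           cdf = cdfFile)
}
```

A call such as readFirstUnits("example.CEL"), where example.CEL stands for any CEL file in the current directory, would then return a list with one element per unit.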
Setting up the CDF search path
Some of the functions in this package search for CDF files automatically by scanning certain directories. To add directories to the default search path, see the instructions in findCdf().
Future Work
Other Affymetrix files, e.g. DAT files (image files), can be parsed using the Fusion SDK. Given
sufficient interest, we will implement support for them.
Running examples
In order to run the examples, data files must exist in the current
directory. Otherwise, the example scripts will do nothing. Most of
the examples require a CDF file or a CEL file, or both. Make sure
the CDF file is of the same chip type as the CEL file. Affymetrix provides data sets of different types at
http://www.affymetrix.com/support/datasets.affx that can be
used. Both small and very large data sets are available.
Technical details
This package implements an interface to the Fusion SDK from
Affymetrix.com. This SDK (software development kit) is an open source
library used for parsing the various file formats used by the
Affymetrix platform.
The intention is to provide interfaces to most, if not all, file formats
which may be parsed using Fusion.
The SDK supports parsing of all the different versions of a specific
file format. This means that the ASCII and binary formats, as well as the new binary
format (codename Calvin) used by Affymetrix, are supported through a
single API. We also expect any future changes to the file formats to
be reflected in the SDK, and subsequently in this package.
However, as the current Fusion SDK does not support compressed files,
neither does affxparser. This is in contrast to some of the
existing code in affy and relatives (see below for links).
In general we aim to provide functions returning all information in
the respective files. Currently it seems that future Affymetrix chip
designs may consist of so many features that returning all
information will lead to unnecessary overhead when a user
only wants access to a subset. We have tried to make it possible to read only such a subset.
For older files, certain entries have been removed from
newer specifications, and the SDK does not provide utilities for
reading these entries. This includes e.g. the FEAT column of CDF files.
Currently the package as well as the Fusion SDK are in beta stage. Bugs
may be related to either codebase. We are very interested in hearing from users
who are unable to compile the package or parse files using this library; this includes
users with custom chip designs.
In addition, since we aim to return all information stored in the
file (and accessible using the Fusion SDK), we would like reports from
users who are unable to retrieve it.
The efficiency of the underlying code may vary with the version of the
file being parsed. For example, we currently report the number of
outliers present in a CEL file when reading the header of the file
using readCelHeader(). In order to obtain this information
from text based CEL files (version 2), the entire file needs to be
read into memory. With version 3 of the file format, this information
is stored in the header.
With the introduction of the Fusion SDK (and the next version of their
file formats) Affymetrix has made it possible to use multibyte
character sets. This implies that character information may be
inaccessible if the compiler used to compile the C++ code does not
support multibyte character sets (specifically we require that the R
installation has defined the macro SUPPORT_MBCS in the Rconfig.h
header file). For example, GCC needs to be version 3.4
or greater on Solaris.
In the info subdirectory of the package installation,
information regarding changes to the Fusion SDK is stored, e.g.
pathname <- system.file("info", "changes2fusion.txt", package="affxparser")
file.show(pathname)
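The header reading and subset reading described in this section can be sketched as follows. The helper name summarizeCel() and its arguments are hypothetical, and the sketch assumes that the header list returned by readCelHeader() contains the elements chiptype and noutliers and that readCel() accepts an indices argument selecting the cells to read. It returns NULL when the file or the package is unavailable.

```r
# Hypothetical helper: report header information and read the
# intensities of a subset of cells only.
summarizeCel <- function(celFile, indices = 1:10) {
  if (!file.exists(celFile) ||
      !requireNamespace("affxparser", quietly = TRUE)) {
    return(NULL)
  }

  # For text based CEL files, obtaining the outlier count may
  # require the whole file to be read
  hdr <- affxparser::readCelHeader(celFile)

  # Read only the requested cells, and only their intensities
  cel <- affxparser::readCel(celFile, indices = indices,
                             readHeader = FALSE,
                             readIntensities = TRUE,
                             readOutliers = FALSE,
                             readMasked = FALSE)

  list(chipType = hdr$chiptype,
       nbrOfOutliers = hdr$noutliers,
       intensities = cel$intensities)
}
```

Restricting indices and switching off the fields that are not needed is one way to avoid the overhead of returning all information for large chip designs.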
Acknowledgments
We would like to thank Ken Simpson (WEHI, Melbourne) and
Seth Falcon (FHCRC, Seattle) for feedback and code contributions.
License
Releases of this package are licensed under LGPL version 2.1 or
newer. This also applies to the Fusion SDK.