This package provides methods to create read and write (cell-index) maps from Affymetrix CDF files. These can be used to store the cell data in an optimal order so that when data is read it is read in contiguous blocks, which is faster.
In addition, read maps may be used to read CEL files that
have been "reshuffled" by other software, for instance the dChip
software.
For more details on how cell indices are defined, see
2. Cell coordinates and cell indices.
In Affymetrix CEL files, cell data is stored in order of cell indices.
Moreover, (except for a few early chip types) Affymetrix randomizes
the locations of the cells such that cells in the same unit (probeset)
are scattered across the array.
Thus, when reading CEL data arranged by units, using for instance
readCelUnits(), the order of the cells requested is both random
and scattered.
Since CEL data is often queried unit by unit (except for some
probe-level normalization methods), one can improve the speed of
reading data by saving data such that cells in the same unit are
stored together. A write map is used to remap cell indices
to file indices. When later reading that data back, a
read map is used to remap file indices to cell indices.
Read and write maps are described next.
Read and write maps are represented as integer vectors of length $N*K$ with unique elements in
$\{1,2,\ldots,N*K\}$.
Consider cell and file indices as in the previous section. For example, the "reversing" read map in the previous section can be
represented as
readMap <- (N*K):1
Given a vector j of file indices, the cell indices are
then obtained as i = readMap[j].
The corresponding write map is
writeMap <- (N*K):1
and given a vector i of cell indices, the file indices are
then obtained as j = writeMap[i].
Note also that the bijective property holds for this mapping, that is,
i == readMap[writeMap[i]] and i == writeMap[readMap[i]]
are both TRUE.
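The reversing maps and the round-trip property can be checked on a small case. The following is a minimal sketch in base R; the dimensions N = 2 and K = 3 are assumed purely for illustration and do not correspond to any real chip type.

```r
# Toy chip dimensions (assumed for illustration only)
N <- 2L
K <- 3L

readMap  <- (N*K):1   # file index j -> cell index i
writeMap <- (N*K):1   # cell index i -> file index j

i <- 1:(N*K)          # all cell indices
j <- writeMap[i]      # the corresponding file indices

# Round trips in both directions return the original indices
stopifnot(all(i == readMap[writeMap[i]]))
stopifnot(all(i == writeMap[readMap[i]]))
```

The reversing map happens to be its own inverse, which is why readMap and writeMap are identical here. For a general map the two differ.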
Because the mapping is bijective, the write map can be calculated from
the read map by:
writeMap <- order(readMap)
and vice versa:
readMap <- order(writeMap)
Note that the invertMap() method is much faster than order().
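One reason a dedicated inversion can beat order() is that a permutation can be inverted in a single pass, whereas order() has to sort. The sketch below shows that idea in base R. The helper name invertPermutation is hypothetical and is not claimed to be how invertMap() is implemented. It only demonstrates that both approaches give the same map.

```r
# Hypothetical helper (not part of the package): invert a permutation
# of 1..n in a single pass, without sorting.
invertPermutation <- function(map) {
  inverse <- integer(length(map))
  inverse[map] <- seq_along(map)   # if map[j] == i then inverse[i] == j
  inverse
}

set.seed(1)
readMap <- sample(20L)             # a random read map of length 20
writeMap <- invertPermutation(readMap)

# Same result as the order()-based inversion, in both directions
stopifnot(identical(writeMap, order(readMap)))
stopifnot(identical(readMap, invertPermutation(writeMap)))
```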
Since most algorithms for Affymetrix data are based on probeset (unit)
models, it is natural to read data unit by unit. Thus, to optimize the
speed, cells should be stored in contiguous blocks of units.
The method readCdfUnitsWriteMap() can be used to generate a
write map from a CDF file such that if the units are read in
order, readCelUnits() will read the cell data in order.
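The idea of a unit-ordered write map can be illustrated without any CDF file. The sketch below uses a made-up list of units and plain vectors in place of CEL data. All names and values are assumptions for illustration, and this is not a description of how readCdfUnitsWriteMap() works internally.

```r
# Toy "CDF": cell indices of each unit (made-up values); cell 4 is in no unit
units <- list(unitA = c(5L, 2L), unitB = c(6L, 1L, 3L))
nbrOfCells <- 6L

# Desired file order: unit cells in unit order, then the remaining cells
cellsInUnits <- unlist(units, use.names = FALSE)
readMap <- c(cellsInUnits, setdiff(seq_len(nbrOfCells), cellsInUnits))
writeMap <- order(readMap)          # cell index -> file index

# Rearrange cell-ordered data into file order
x <- c(10, 20, 30, 40, 50, 60)      # x[i] is the value of cell i
y <- x[readMap]                     # y[j] is the value at file position j

# Each unit now occupies a contiguous block of file indices
fileIdxs <- lapply(units, function(cells) writeMap[cells])
```

In this toy layout, the cells of unitA land at file positions 1 and 2 and those of unitB at positions 3 to 5, so reading the units in order walks through the rearranged data sequentially.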
Example:
# Find any CDF file
cdfFile <- findCdf()

# Get the order of cell indices
indices <- readCdfCellIndices(cdfFile)
indices <- unlist(indices, use.names=FALSE)

# Get an optimal write map for the CDF file
writeMap <- readCdfUnitsWriteMap(cdfFile)

# Get the read map
readMap <- invertMap(writeMap)

# Validate correctness
indices2 <- readMap[indices]  # == 1, 2, 3, ..., N*K
Warning: do not misunderstand this example. It cannot be used to improve the reading speed of default CEL files. For that, the data in the CEL files first has to be rearranged (by the corresponding write map).
Thus, to read this data "unrotated", use the following read map:
readMap <- invertMap(writeMap)
data <- readCel(celFile, indices=1:10, readMap=readMap)