prepareData: Combining Two Studies into an Expression Set

Description

The function prepares two expression sets (ExpressionSet) and/or Affy batches (AffyBatch) to be passed on to the main function OrderedList. For each data set, one has to specify the phenodata variable that defines the grouping into two distinct classes. The data sets are then merged into one ExpressionSet together with the rearranged phenodata. If the studies were done on different platforms but a subset of genes can be mapped from one chip to the other, this information can be provided via the mapping argument.

Please note that both data sets have to be pre-processed beforehand, either together or independently of each other. In addition, the gene expression values have to be on an additive scale, that is, a logarithmic or log-like scale.
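As an illustration of the additive-scale requirement, the following minimal sketch log-transforms a small matrix of raw intensities. The matrix, its dimension names and the offset of 1 are made up for this example; they are not prescribed by the package.

```r
# Hypothetical raw intensities: 3 probes x 4 samples (values are made up).
raw <- matrix(c(120, 950, 15,
                130, 870, 0,
                110, 1020, 22,
                125, 990, 18),
              nrow = 3,
              dimnames = list(c("probe_a", "probe_b", "probe_c"),
                              paste0("sample", 1:4)))

# log2 with an offset of 1 (an assumption of this sketch) so that
# zero intensities stay finite.
log_expr <- log2(raw + 1)

# Sanity check: all values are finite and the dimensions are unchanged.
stopifnot(all(is.finite(log_expr)), identical(dim(log_expr), dim(raw)))
```

Data that have already been log-transformed during pre-processing should not be transformed a second time.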

Usage

prepareData(eset1, eset2, mapping = NULL)

Arguments

eset1

The first study, stored in a named list with five elements: data, name, var, out and paired; see Details below.

eset2

Same as eset1 for the second data set.

mapping

Data frame containing one named vector for each study. Each vector consists of probe IDs that match the row names of the corresponding expression set. The IDs are ordered identically across studies, so the $k$th row of mapping provides the label of the $k$th gene in each single study. If all studies were done on the same chip, no mapping is needed (default).
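A minimal sketch of what such a mapping could look like is given below. All probe IDs are invented, and naming the columns after the studies is an assumption of this sketch. The character vectors stand in for the row names of the two expression sets.

```r
# Hypothetical probe IDs for the same three genes on two different chips.
# Row k refers to the same gene in both columns.
mapping <- data.frame(
  prostate = c("p_101", "p_205", "p_333"),
  breast   = c("b_17",  "b_42",  "b_08"),
  stringsAsFactors = FALSE
)

# Stand-ins for the row names of the two expression sets.
prostate_ids <- c("p_333", "p_101", "p_205", "p_999")
breast_ids   <- c("b_42", "b_08", "b_17")

# Every mapped ID should occur among the row names of its own study.
ok <- c(prostate = all(mapping$prostate %in% prostate_ids),
        breast   = all(mapping$breast %in% breast_ids))
stopifnot(all(ok))
```

A check of this kind can catch typos in the ID columns before the mapping is handed to prepareData.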

Value

An object of class ExpressionSet containing the joint data sets with appropriate phenodata.

Details

Each study has to be stored in a list with five elements:

data

Object of class ExpressionSet or AffyBatch.

name

Character string with comparison label.

var

Character string with phenodata variable. Based on this variable, the samples for the two-sample testing will be extracted.

out

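Vector of two character strings with the levels of var that define the two clinical classes. The order of the two levels must be identical for all studies. Ideally, the first entry corresponds to the bad and the second one to the good outcome level.

paired

Logical. TRUE if the samples are paired (for example, two measurements per patient), FALSE if all samples are independent of each other.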
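The structure of a study list can be checked before calling prepareData. The helper below, check_study, is hypothetical and not part of the package; it only verifies the five elements named above, and the data element is a placeholder matrix rather than a real ExpressionSet or AffyBatch.

```r
# Hypothetical helper (not part of OrderedList): check the shape of a study list.
check_study <- function(study) {
  required <- c("data", "name", "var", "out", "paired")
  missing_elems <- setdiff(required, names(study))
  if (length(missing_elems) > 0) {
    stop("missing elements: ", paste(missing_elems, collapse = ", "))
  }
  stopifnot(
    is.character(study$name), length(study$name) == 1,
    is.character(study$var),  length(study$var) == 1,
    is.character(study$out),  length(study$out) == 2,
    is.logical(study$paired), length(study$paired) == 1
  )
  invisible(TRUE)
}

# 'data' would normally be an ExpressionSet or AffyBatch;
# a placeholder matrix is used here to keep the sketch self-contained.
study <- list(data   = matrix(0, 2, 2),
              name   = "prostate",
              var    = "outcome",
              out    = c("Rec", "NRec"),
              paired = FALSE)
check_study(study)
```

The helper does not check that var exists in the phenodata or that the levels in out occur in it; with real data those checks would be a natural extension.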

References

Yang X, Bentink S, Scheid S, and Spang R (2006): Similarities of ordered gene lists, to appear in Journal of Bioinformatics and Computational Biology.

Examples

data(OL.data)

### 'map' contains the appropriate mapping between 'breast' and 'prostate' IDs.
### Let's first concatenate two studies.
A <- prepareData(
  list(data = OL.data$prostate, name = "prostate", var = "outcome",
       out = c("Rec", "NRec"), paired = FALSE),
  list(data = OL.data$breast, name = "breast", var = "Risk",
       out = c("high", "low"), paired = FALSE),
  mapping = OL.data$map
)

### We might want to examine the first 100 probes only.
B <- prepareData(
  list(data = OL.data$prostate, name = "prostate", var = "outcome",
       out = c("Rec", "NRec"), paired = FALSE),
  list(data = OL.data$breast, name = "breast", var = "Risk",
       out = c("high", "low"), paired = FALSE),
  mapping = OL.data$map[1:100, ]
)
