compData
The compData class is used to store information about the experiment, such as the count matrix, sample and variable annotations, information regarding the generation of the data, and results from applying a differential expression analysis to the data. This constructor function creates a compData object.
compData(count.matrix, sample.annotations, info.parameters, variable.annotations = data.frame(), filtering = "no info", analysis.date = "", package.version = "", method.names = list(), code = "", result.table = data.frame())
count.matrix
info.parameters
A list whose only mandatory entries are dataset and uID, but it may contain entries such as the ones listed below (see generateSyntheticData for more detailed information about each of these entries).
dataset: an informative name or identifier of the data set (e.g., summarizing the simulation settings).
samples.per.cond
n.diffexp
repl.id
seqdepth
minfact
maxfact
fraction.upregulated
between.group.diffdisp
filter.threshold.total
filter.threshold.mediancpm
fraction.non.overdispersed
random.outlier.high.prob
random.outlier.low.prob
single.outlier.high.prob
single.outlier.low.prob
effect.size
uID: a unique ID for the data set. In contrast to dataset, the uID is unique, e.g., for each instance of replicated data sets generated with the same simulation settings.
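To make the distinction between dataset and uID concrete, the sketch below builds one info.parameters list per replicate. The make_uid helper and the random 10-character format are assumptions made purely for illustration; the documentation does not prescribe how uID values should be generated.

```r
# Hypothetical helper (not part of the documented API): draw a random
# 10-character alphanumeric string to serve as a uID.
make_uid <- function(n_chars = 10) {
  paste(sample(c(letters, LETTERS, 0:9), n_chars, replace = TRUE), collapse = "")
}

set.seed(1)
n_replicates <- 3

# All replicates share the same dataset name but each gets its own uID.
info_list <- lapply(seq_len(n_replicates), function(i) {
  list(dataset = "mydata", uID = make_uid())
})

datasets <- vapply(info_list, function(x) x$dataset, character(1))
uids <- vapply(info_list, function(x) x$uID, character(1))
```

Here datasets holds the same name three times, while uids holds three distinct identifiers, one per replicate.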
variable.annotations
Variable annotations (with the same number of rows as count.matrix, that is, the number of variables in the data set). Not mandatory, but may contain columns such as the ones listed below. If present, the row names should be the same as the row names of the count.matrix.
truedispersions.S1: the true dispersion for each gene in condition S1.
truedispersions.S2: the true dispersion for each gene in condition S2.
truemeans.S1: the true mean value for each gene in condition S1.
truemeans.S2: the true mean value for each gene in condition S2.
n.random.outliers.up.S1: the number of 'random' outliers with extremely high counts for each gene in condition S1.
n.random.outliers.up.S2: the number of 'random' outliers with extremely high counts for each gene in condition S2.
n.random.outliers.down.S1: the number of 'random' outliers with extremely low counts for each gene in condition S1.
n.random.outliers.down.S2: the number of 'random' outliers with extremely low counts for each gene in condition S2.
n.single.outliers.up.S1: the number of 'single' outliers with extremely high counts for each gene in condition S1.
n.single.outliers.up.S2: the number of 'single' outliers with extremely high counts for each gene in condition S2.
n.single.outliers.down.S1: the number of 'single' outliers with extremely low counts for each gene in condition S1.
n.single.outliers.down.S2: the number of 'single' outliers with extremely low counts for each gene in condition S2.
M.value: the M-value (observed log2 fold change between condition S1 and condition S2) for each gene.
A.value: the A-value (observed average expression level across condition S1 and condition S2) for each gene.
truelog2foldchanges: the true (simulated) log2 fold changes between condition S1 and condition S2.
upregulation: a binary vector indicating which genes are simulated to be upregulated in condition S2 compared to condition S1.
downregulation: a binary vector indicating which genes are simulated to be downregulated in condition S2 compared to condition S1.
differential.expression: a binary vector indicating which genes are simulated to be differentially expressed in condition S2 compared to condition S1.
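As a minimal sketch of how such a table could be assembled by hand, the code below builds a variable.annotations data frame whose row names match those of the count matrix. The gene names, the choice of which genes are up- or downregulated, and the assumption that differential.expression is the union of the two indicators are all illustrative and not taken from the package.

```r
set.seed(42)
n_genes <- 1000

# Count matrix for a two-group setup; the gene names are made up.
count.matrix <- round(matrix(1000 * runif(4 * n_genes), nrow = n_genes))
rownames(count.matrix) <- paste0("g", seq_len(n_genes))

# Illustrative assumption: the first 50 genes are upregulated and the
# next 50 are downregulated in condition S2.
upregulation <- as.integer(seq_len(n_genes) <= 50)
downregulation <- as.integer(seq_len(n_genes) > 50 & seq_len(n_genes) <= 100)

variable.annotations <- data.frame(
  upregulation = upregulation,
  downregulation = downregulation,
  # Assumed here: a gene counts as differentially expressed if it is
  # either up- or downregulated.
  differential.expression = as.integer(upregulation | downregulation),
  row.names = rownames(count.matrix)
)

# The row names must agree with those of the count matrix.
stopifnot(identical(rownames(variable.annotations), rownames(count.matrix)))
```

Setting row.names explicitly keeps the annotation table aligned with the count matrix, which is the one consistency requirement stated above.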
method.names
A list with entries full.name and short.name, giving the full name of the differential expression method (may include the version number and parameter settings) and a short name or abbreviation.
code
... generateCodeHTMLs function.
result.table
A data frame whose rows correspond to the rows of count.matrix; if row names are present, they should be identical. The only mandatory column is score, which gives a score for each gene, where a higher score suggests a "more highly differentially expressed" gene. Different comparison functions use different columns of this table, if available. The list below gives the columns that are used by the interfaced methods.
pvalue: nominal p-values
adjpvalue: p-values adjusted for multiple comparisons
logFC: estimated log-fold changes between the two conditions
score: the score that will be used to rank the genes in order of significance. Note that high scores always signify differential expression, that is, a strong association with the predictor. For example, for methods returning a nominal p-value the score can be defined as 1 - pvalue.
FDR: false discovery rate estimates
posterior.DE: posterior probabilities of differential expression
prob.DE: conditional probabilities of differential expression
lfdr: local false discovery rates
statistic: test statistics from the differential expression analysis
dispersion.S1: dispersion estimates in condition S1
dispersion.S2: dispersion estimates in condition S2
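The following sketch shows one way a result.table with the mandatory score column might be put together, using the 1 - pvalue definition mentioned above. The p-values are random placeholders rather than output from a real method, the gene names are made up, and the Benjamini-Hochberg adjustment is an assumption, since the text does not say which multiple-comparison correction is expected.

```r
set.seed(7)
n_genes <- 1000
gene_ids <- paste0("g", seq_len(n_genes))

# Placeholder p-values; in practice these would come from a differential
# expression method.
pvalue <- runif(n_genes)

result.table <- data.frame(
  pvalue = pvalue,
  # Assumed adjustment method (Benjamini-Hochberg), for illustration only.
  adjpvalue = p.adjust(pvalue, method = "BH"),
  # Higher score means stronger evidence of differential expression.
  score = 1 - pvalue,
  row.names = gene_ids
)

# Genes ranked from most to least significant according to the score.
ranked <- result.table[order(result.table$score, decreasing = TRUE), ]
```

Because the score is defined as 1 - pvalue, the gene with the smallest p-value ends up first in ranked.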
A compData object.
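# Simulate a 1000 x 4 matrix of counts (genes in rows, samples in columns)
count.matrix <- round(matrix(1000*runif(4000), 1000))
# Assign the four samples to two conditions
sample.annotations <- data.frame(condition = c(1, 1, 2, 2))
# Supply the two mandatory entries, dataset and uID
info.parameters <- list(dataset = "mydata", uID = "123456")
cpd <- compData(count.matrix, sample.annotations, info.parameters)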