
interventionalDBN (version 1.2.2)

formatData: Format a microarray spreadsheet ready for interventional network inference function

Description

This function formats a microarray timecourse dataset ready for the interventionalInference function.

Usage

formatData(d, cellLines = NULL, inhibitors = NULL, stimuli = NULL, times = NULL, nodes = NULL, intercept = TRUE, initialIntercept = TRUE, gradients = FALSE)

Arguments

d
A microarray spreadsheet, a $samples$ by (4 + $P$) matrix, where $P$ is the number of measurements for each sample. Column 1 gives the cell line in each sample. Column 2 gives the inhibitor used in each sample. Column 3 gives the stimulus used in each sample. Column 4 gives the time each sample was measured.
cellLines
A vector specifying a subset of cell lines to analyse (if absent, they are all used).
inhibitors
A vector specifying a subset of the inhibitors to analyse (if absent, they are all used).
stimuli
A vector specifying a subset of the stimuli to analyse (if absent, they are all used).
times
A vector specifying a subset of the times to analyse as the response (if absent, they are all used).
nodes
A vector specifying the indices of a subset of nodes to include in the analysis. Further nodes can be removed from the response in the interventionalInference function.
intercept
A logical value indicating whether an intercept parameter should be included in all models.
initialIntercept
A logical value indicating whether an intercept parameter should be used to estimate the level at the initial timepoint. Only used if the initial timepoint is in the response.
gradients
A logical value indicating whether the concentration gradient should be used as the response instead of the raw concentrations. This model has parallels with a dynamical systems viewpoint, and requires the covariance matrix to be adjusted. See Sigma under Value.
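
As a rough illustration of the layout expected for d, the sketch below builds a toy spreadsheet in base R. The object name toyData, the column names and all values are invented for this example. It is a data frame of the kind read.csv would return, and it only mirrors the column order described above (cell line, inhibitor, stimulus, time, then $P$ measurements).

```r
# Hypothetical toy spreadsheet: 2 inhibitors x 4 timepoints, P = 3 measurements.
# All names and values are invented for illustration.
set.seed(1)
design <- expand.grid(time = c(0, 5, 10, 20),
                      inhibitor = c("DMSO", "EGFRi"),
                      stringsAsFactors = FALSE)
P <- 3
measurements <- matrix(rnorm(nrow(design) * P), ncol = P,
                       dimnames = list(NULL, paste0("node", 1:P)))

# Columns 1-4 hold cell line, inhibitor, stimulus and time, in that order.
toyData <- data.frame(cellLine  = "lineA",
                      inhibitor = design$inhibitor,
                      stimulus  = "EGF",
                      time      = design$time,
                      measurements)

dim(toyData)  # samples by (4 + P)
```

With a spreadsheet of this shape, the values passed to cellLines, inhibitors, stimuli and times would be matched against the entries of columns 1 to 4 respectively.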

Value

y
The $n$ by $P$ response matrix, where $n$ is the number of observations in the response. Note that $n$ is not necessarily the same as the number of samples.
X0
The $n$ by $a$ design matrix of predictors to be included in all models. Usually the intercept and the time-zero intercept (if present).
X1
The $n$ by $P$ design matrix of predictors to undergo model selection.
Sigma
The $n$ by $n$ covariance matrix for a single column of y (proportional to $\sigma^2$). The identity matrix, unless gradients is TRUE.
sampleInfo
An $n$ by 4 matrix giving the cell line, inhibitor, stimulus and timepoint for each observation used in the response.
interpolated
A matrix similar to sampleInfo, giving the particulars of any observations for which the predictors were interpolated. Empty if no interpolation has been used.
cond
A vector indexing the experimental conditions, given by the cell line, inhibitor and stimulus used in each sample.
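
To see why Sigma is no longer the identity when gradients is TRUE, the following base R sketch computes finite-difference gradients for a single condition and the covariance they would have if the raw observations were independent with equal variance. The values and the names times, conc, D and SigmaSketch are invented, and whether formatData computes its gradients and Sigma in exactly this way is an assumption.

```r
# Illustrative values only: one condition, one measured node.
times <- c(0, 5, 10, 20, 30)
conc  <- c(1.0, 1.5, 2.5, 2.0, 3.0)

# Finite-difference gradient between consecutive observations.
grad <- diff(conc) / diff(times)

# The same operation written as a matrix D, so that grad equals D %*% conc.
n <- length(times)
D <- matrix(0, n - 1, n)
for (i in seq_len(n - 1)) {
  dt <- times[i + 1] - times[i]
  D[i, i]     <- -1 / dt
  D[i, i + 1] <-  1 / dt
}

# If the raw observations have covariance sigma^2 * I, the gradients have
# covariance sigma^2 * D %*% t(D): consecutive gradients share an observation,
# so the matrix is tridiagonal rather than the identity.
SigmaSketch <- D %*% t(D)
```

Neighbouring gradients are negatively correlated because they share one raw observation with opposite signs, while gradients two or more steps apart remain uncorrelated.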

Details

The entries of column 4 of d must be real numbers. Missing values are acceptable and are handled as follows:
  1. Missing values in the response are ignored.
  2. For the predictors, if a single timepoint is missing, the predictors are interpolated from the two immediate neighbours.
  3. If one of the two immediate neighbours is also missing, the corresponding response is ignored.
  4. The exception is the predictor for the initial observation, which is always missing. In that case 0 is returned, so that the level at time zero can be estimated by a second intercept parameter in the interventionalInference function.
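
The interpolation rule in item 2 can be sketched as a time-weighted linear interpolation between the two neighbouring observations. The helper interpolateMissing below is hypothetical and not part of interventionalDBN; it assumes the interpolation is linear in the values of column 4, which is one plausible reading of the rule rather than a statement of the package's exact method.

```r
# Hypothetical helper, not part of interventionalDBN: linear interpolation of a
# missing predictor from its two immediate neighbours, weighted by time.
interpolateMissing <- function(tPrev, xPrev, tNext, xNext, tMissing) {
  # Fraction of the way from the previous to the next observed timepoint.
  w <- (tMissing - tPrev) / (tNext - tPrev)
  (1 - w) * xPrev + w * xNext
}

# A value missing at time 10, with neighbours observed at times 5 and 20.
interpolateMissing(tPrev = 5, xPrev = 1.0, tNext = 20, xNext = 4.0, tMissing = 10)
# Returns 2: time 10 lies one third of the way from 5 to 20.
```

Because the weights depend on the entries of column 4, recoding that column from timepoint indices to actual times changes the interpolated values.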

See Also

interventionalInference, interventionalInferenceAdvanced, interventionalDBN-package, interventionEffects

Examples

data(interventionalData)
# Load your own data spreadsheet using myData<-read.csv("myDataFile.csv").

# Use everything
fullData <- formatData(interventionalData)

# Use only DMSO and EGFRi samples.
halfData <- formatData(interventionalData,inhibitors=c("DMSO","EGFRi"))

# Produce gradients as response
diffData <- formatData(interventionalData,gradients=TRUE,initialIntercept=FALSE)
# Different results if we use the time between observations, rather than the timepoint.
interventionalData[,4]<-rep(c(0,5,10,20,30,60,90,120),4)
diffData2 <- formatData(interventionalData,gradients=TRUE,initialIntercept=FALSE)

# When there is missing data, interpolation also uses the time differences. 
missingData <- interventionalData[-4,]
fullData2 <- formatData(missingData)
