In the simplest case, we have gene expression data on one "base"
sample and one "perturbed" sample, and the goal is to identify genes
whose expression changes between the two states. Our primary
assumption is that the standard deviation (SD) of gene expression
varies as a smooth function of the mean; fitting such a curve allows
us to detect individual genes whose difference is large compared to
the smoothed SD.
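
The idea can be sketched in base R for a single base/perturbed pair. This is a minimal illustration on simulated data, not the package's implementation: the simulated noise model, the use of loess as the smoother, and all variable names are assumptions made for the example.

```r
set.seed(1)
n     <- 2000
mu    <- runif(n, 4, 12)            # simulated log-scale mean expression
sigma <- 0.1 + 1 / mu               # assumed noise level that varies smoothly with the mean
base  <- rnorm(n, mu, sigma)        # one "base" sample
pert  <- rnorm(n, mu, sigma)        # one "perturbed" sample
pert[1:20] <- pert[1:20] + 3        # spike in 20 genes that truly change

avg   <- (base + pert) / 2          # per-gene mean of the two states
delta <- pert - base                # per-gene difference
sdEst <- abs(delta) / sqrt(2)       # sample SD of two values is |difference| / sqrt(2)

fit      <- loess(sdEst ~ avg)      # smooth the SD as a function of the mean
smoothSD <- predict(fit)            # fitted (smoothed) SD for each gene
score    <- delta / smoothSD        # difference relative to the smoothed SD

head(order(abs(score), decreasing = TRUE))   # genes with the largest scaled differences
```

Genes whose scaled difference is far from zero are the candidates for a real change in expression.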
Note that this assumption is most useful on the log-transformed
scale (https://pubmed.ncbi.nlm.nih.gov/25092958/).
If your data are on the raw scale, then we recommend log-transforming
them before computing the Newman paired statistic.
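
As a small sketch of such a transformation, the following applies a base-2 logarithm to a toy matrix of raw values. The choice of base 2 and the pseudocount of 1 (added so that zeros stay finite) are assumptions for illustration; the source does not prescribe a particular transform.

```r
# Toy raw-scale data: 4 genes (rows) by 2 samples (columns)
raw <- matrix(c(0, 15, 255, 1023,
                3, 31, 127, 4095),
              ncol = 2,
              dimnames = list(paste0("gene", 1:4), c("base", "perturbed")))

# log2 with a pseudocount of 1 (an assumption) so that raw zeros map to 0
logged <- log2(raw + 1)
logged
```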
The input arguments to the pairedStats function are moderately
complicated in order to allow users to choose a convenient method for
supplying data when they have multiple paired samples. The first
possibility is to have all the base samples in one matrix and all the
perturbed samples in a second matrix. In this case, we assume (without
checking) that the columns in the two matrices correspond to the
paired samples, and that the gene rows are in the same order.
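
Because the correspondence is assumed rather than checked, it can be worth verifying it yourself before calling the function. The sketch below uses simulated data and hypothetical row and column names; it only works when both matrices carry dimnames.

```r
set.seed(3)
genes <- c("geneA", "geneB", "geneC", "geneD")
pairs <- c("pair1", "pair2", "pair3")

# Base matrix: genes in the reference order
B <- matrix(rnorm(12, mean = 8), nrow = 4, dimnames = list(genes, pairs))
# Perturbed matrix: deliberately stored with the gene rows reversed
P <- matrix(rnorm(12, mean = 8), nrow = 4, dimnames = list(rev(genes), pairs))

sameOrder <- identical(rownames(B), rownames(P))   # FALSE for this toy data

# Reorder P by name so that rows and columns line up with B
P <- P[rownames(B), colnames(B), drop = FALSE]

# Stop with an error if the two matrices still disagree
stopifnot(identical(dimnames(B), dimnames(P)))
```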
The second possibility is to put the data for both the base samples
and the perturbed samples in the same matrix. In this case, the user
must supply a pairing vector to explain how the samples should
be matched. If the column order is ("base1", "perturbed1", "base2",
"perturbed2", ...), then the pairing vector should be written as
c(-1, 1, -2, 2, -3, 3, ...).
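
To make the convention concrete, the sketch below reads the pairing vector as the example above suggests: the sign marks the state (negative for base, positive for perturbed) and the absolute value identifies the pair. It then splits a combined matrix into separate base and perturbed matrices. The data are simulated, the names are hypothetical, and this is an illustration of the convention rather than the package's internal code.

```r
set.seed(2)
X <- matrix(rnorm(5 * 6), nrow = 5,
            dimnames = list(paste0("gene", 1:5),
                            c("base1", "perturbed1",
                              "base2", "perturbed2",
                              "base3", "perturbed3")))
pairing <- c(-1, 1, -2, 2, -3, 3)

nPairs   <- max(abs(pairing))
baseCols <- match(-seq_len(nPairs), pairing)   # column holding the base sample of pair k
pertCols <- match(seq_len(nPairs), pairing)    # column holding the perturbed sample of pair k

B <- X[, baseCols, drop = FALSE]               # all base samples, in pair order
P <- X[, pertCols, drop = FALSE]               # all perturbed samples, in pair order

colnames(B)
colnames(P)
```

Because the lookup is by pair number, the same code works when the columns of the combined matrix are not interleaved, as long as the pairing vector describes them correctly.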
The third possibility is to provide the paired samples in a list,
each of whose entries is a matrix with two columns, with the first
column being the base state and the second column being the
perturbed state.
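
If the data start out as separate base and perturbed matrices, a list in this form can be assembled with lapply. The following is a minimal sketch on simulated data; the object names and the column labels "base" and "perturbed" are assumptions chosen for readability.

```r
set.seed(4)
genes <- paste0("gene", 1:4)
B <- matrix(rnorm(12, mean = 8), nrow = 4,
            dimnames = list(genes, paste0("pair", 1:3)))
P <- matrix(rnorm(12, mean = 8), nrow = 4,
            dimnames = list(genes, paste0("pair", 1:3)))

# One two-column matrix per pair: base state first, perturbed state second
pairList <- lapply(seq_len(ncol(B)), function(j) {
  cbind(base = B[, j], perturbed = P[, j])
})
names(pairList) <- colnames(B)

dim(pairList[[1]])   # 4 genes by 2 states
```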
This flexibility means that there are three equivalent ways to input
the data even if you have only one base sample (with data in the
one-column matrix B) and one perturbed sample (with data in the
one-column matrix P). If we let BP <- cbind(B, P), then we can
choose (1) pairedStats(B, P), or (2)
pairedStats(list(BP)), or (3) pairedStats(BP,
pairing = c(-1,1)).