ppa: The Ping-Pong Algorithm

Description

Run the PPA with the default parameters

Usage

# S4 method for list
ppa(data, ...)

Value

A named list is returned with the following elements:

rows1

The first components of the co-modules, corresponding to the rows of the first input matrix. Every column corresponds to a co-module. If an element (the score of the row) is non-zero, then that row is included in the co-module; otherwise it is not. Scores are between -1 and 1. If two scores have the same non-zero sign, then the corresponding rows of the first matrix are correlated. If they have opposite signs, then they are anti-correlated.

If an input seed did not converge within the allowed number of iterations, then that column of rows1 contains NA values. The ppa function does not produce such columns, because it always drops the non-convergent seeds via a call to ppa.unique. The result of the ppa.iterate function might contain such columns, though.
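As an illustration of how such a score matrix can be read, here is a minimal sketch in base R. The matrix `scores` is an invented stand-in for rows1 (it is not output of ppa), and the names `converged` and `members` are hypothetical.

```r
## Toy stand-in for rows1: 5 rows of the first matrix, 3 columns.
## The numbers are invented for illustration, not produced by ppa().
scores <- cbind(c(0.9, 0.4, 0, 0, 0),
                c(0, 0, -1, 0.5, 0),
                rep(NA_real_, 5))   # a non-convergent seed, as ppa.iterate may return

## Drop the columns that contain NA values; they describe no co-module
converged <- scores[, colSums(is.na(scores)) == 0, drop = FALSE]

## Rows included in each co-module: those with a non-zero score
members <- lapply(seq_len(ncol(converged)),
                  function(j) which(converged[, j] != 0))

## In the second co-module, rows 3 and 4 have opposite signs,
## so they are anti-correlated
sign(converged[c(3, 4), 2])
```

The same reading applies to rows2 and columns.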

rows2

This is the same as rows1, but for the second input matrix.

columns

The same as rows1 and rows2, but for the columns of both input matrices.

seeddata

A data frame containing information about the co-modules. There is one row for each co-module. The data frame has the following columns:

iterations: The number of iterations needed for convergence.

oscillation

The oscillation cycle of this co-module, if it is an oscillating co-module. Zero otherwise.

thr.row1

The threshold used for the rows of the first matrix.

thr.row2

The threshold used for the rows of the second matrix.

thr.col

The threshold used for the common column dimension.

freq

Numeric scalar, the number of times the same (or a very similar) co-module was found.

rob

The robustness score of the module.

rob.limit

The robustness limit that was used to filter the module. See ppa.filter.robust for details.

rundata

A named list with information about the PPA run. It has the following entries:

direction: Character vector of length four, the direction argument of the ppa.iterate call.

convergence

Character scalar, the convergence criterion that was used, see the ppa.iterate function for details.

cor.limit

Numeric scalar, the correlation threshold that was used if the convergence criterion was ‘cor’.

maxiter

The maximum number of PPA iterations.

N

The total number of input seeds that were used to find the co-modules.

prenormalize

Logical scalar, whether the input matrices were pre-normalized, see ppa.normalize for details.

hasNA

Logical vector of length two. Whether the two input matrices contained any NA or NaN values.

unique

Logical scalar, whether the co-modules are unique, i.e. whether ppa.unique was called.

oscillation

Logical scalar, whether the ppa.iterate run looked for oscillating modules.

rob.perms

The number of data permutations that were performed during the robustness filtering, see ppa.filter.robust for details.

Arguments

data: The input, a list of two numeric matrices with the same number of columns. They may contain NA and/or NaN values, but then the algorithm might get slower, as R matrix multiplication is sometimes slower for such matrices, depending on your platform.
...: Additional arguments, see details below.
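A minimal sketch of an input that satisfies these requirements, using base R only. The matrix sizes and the names `m1`, `m2` and `has.na` are arbitrary choices for illustration.

```r
set.seed(42)

## Two hypothetical data sets measured over the same 8 samples (columns)
m1 <- matrix(rnorm(20 * 8), nrow = 20, ncol = 8)
m2 <- matrix(rnorm(12 * 8), nrow = 12, ncol = 8)
m2[3, 5] <- NA   # missing values are allowed

data <- list(m1, m2)

## Checks mirroring the requirements stated above
stopifnot(length(data) == 2,
          all(sapply(data, is.matrix)),
          all(sapply(data, is.numeric)),
          ncol(data[[1]]) == ncol(data[[2]]))

## Which inputs contain NA or NaN values (compare the hasNA entry of rundata)
has.na <- sapply(data, anyNA)
```

The two matrices may have different numbers of rows; only the column dimension is shared.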

Author

Gabor Csardi Gabor.Csardi@unil.ch

Details

Please read the isa2-package manual page for an introduction to ISA and PPA.

This function can be called as


    ppa(data, thr.row1 = seq(1, 3, by = 0.5),
        thr.row2 = seq(1, 3, by = 0.5),
        thr.col = seq(1, 3, by = 0.5),
        no.seeds = 100, direction = "updown")

where the arguments are:

data: The input, a list of two numeric matrices with the same number of columns. They may contain NA and/or NaN values, but then the algorithm might get slower, as R matrix multiplication is sometimes slower for such matrices, depending on your platform.
thr.row1: Numeric scalar or vector giving the threshold parameter for the rows of the first matrix. Higher values indicate a more stringent threshold, and the resulting co-modules will contain fewer rows of the first matrix on average. The threshold is measured by the number of standard deviations from the mean, over the values of the first row vector. If it is a vector then it must contain an entry for each seed.
thr.row2: Numeric scalar or vector, the threshold parameter(s) for the rows of the second matrix. See thr.row1 for details.
thr.col: Numeric scalar or vector giving the threshold parameter for the columns of both matrices. The analogue of thr.row1.
no.seeds: Integer scalar, the number of random seeds to use.
direction: Character vector of length four, one for each matrix multiplication performed during a PPA iteration. It specifies whether we are interested in rows/columns that are higher (‘up’) than average, lower than average (‘down’), or both (‘updown’). The first and the second entry both correspond to the common column dimension of the two matrices, so they should be equal, otherwise a warning is given.
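The interplay of a threshold and a direction can be sketched in base R as follows. The helper `threshold.scores` is hypothetical and not part of isa2; it only illustrates the idea of keeping the entries that lie more than `thr` standard deviations from the mean, and the actual implementation in ppa.iterate may differ in its details (for example, in how the kept scores are rescaled).

```r
## Hypothetical helper, not part of isa2: zero out the entries of a score
## vector that do not pass a threshold of `thr` standard deviations
threshold.scores <- function(x, thr, direction = c("updown", "up", "down")) {
  direction <- match.arg(direction)
  z <- (x - mean(x)) / sd(x)           # distance from the mean in SD units
  keep <- switch(direction,
                 up     = z >  thr,    # higher than average only
                 down   = z < -thr,    # lower than average only
                 updown = abs(z) > thr)
  x[!keep] <- 0                        # entries failing the threshold are dropped
  x
}

x <- c(-10, rep(0, 8), 10)

threshold.scores(x, thr = 1, direction = "up")      # keeps only the last entry
threshold.scores(x, thr = 1, direction = "down")    # keeps only the first entry
threshold.scores(x, thr = 1, direction = "updown")  # keeps both
threshold.scores(x, thr = 3, direction = "updown")  # more stringent: keeps nothing
```

With a higher threshold fewer entries survive, which is why more stringent thresholds give smaller co-modules on average.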

The ppa function provides an easy interface to the PPA. It runs all steps of a typical PPA workflow, with their default parameters.

This involves:

Normalizing the input matrices by calling ppa.normalize.
Generating random input seeds via generate.seeds.
Running the PPA with all combinations of the given row1, row2 and column thresholds (by default 1, 1.5, 2, 2.5, 3), by calling ppa.iterate.
Merging similar co-modules, separately for each threshold combination, by calling ppa.unique.
Filtering the co-modules separately for each threshold combination, by calling ppa.filter.robust.
Putting all co-modules from the runs with different thresholds into a single object.
Merging similar co-modules again, but now across all threshold combinations. If two co-modules are similar, then the larger one (the one with the milder thresholds) is kept.

Please see the manual pages of these functions for the details.
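To get a feeling for the size of a default run, the threshold combinations can be enumerated in base R. This is only a sketch: the use of `expand.grid` and the name `combos` are assumptions made for illustration, and ppa may enumerate the combinations differently internally.

```r
## Default threshold values for thr.row1, thr.row2 and thr.col
thr <- seq(1, 3, by = 0.5)

## All combinations of the three thresholds
combos <- expand.grid(thr.row1 = thr, thr.row2 = thr, thr.col = thr)

nrow(combos)      # 5 * 5 * 5 combinations
head(combos, 3)   # the first few combinations
```

Each combination is run with all input seeds, so passing shorter threshold vectors (for instance a single value for each of thr.row1, thr.row2 and thr.col) reduces the amount of work considerably.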

References

Kutalik Z, Bergmann S, Beckmann J: A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 2008 May; 26(5): 531-9.

Examples

## We do not run this, as it takes relatively long
if (FALSE) {
data <- ppa.in.silico(noise=0.1)
ppa.result <- ppa(data[1:2], direction="up")

## Find the best bicluster for each block in the input
## (based on the rows of the first input matrix)
best <- apply(cor(ppa.result$rows1, data[[3]]), 2, which.max)

## Check correlation
sapply(seq_along(best),
       function(x) cor(ppa.result$rows1[,best[x]], data[[3]][,x]))

## The same for the rows of the second matrix
sapply(seq_along(best),
       function(x) cor(ppa.result$rows2[,best[x]], data[[4]][,x]))

## The same for the columns
sapply(seq_along(best),
       function(x) cor(ppa.result$columns[,best[x]], data[[5]][,x]))

## Plot the data and the modules found
if (interactive()) {
  layout(rbind(1:2,c(3,6),c(4,7), c(5,8)))
  image(data[[1]], main="In-silico data, first matrix")
  image(data[[2]], main="In-silico data, second matrix")
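  ## First matrix: outer product of row and column scores for each module
  sapply(best[1:3], function(b) image(outer(ppa.result$rows1[,b],
                                       ppa.result$columns[,b]),
                                 main=paste("Module", b)))
  ## Second matrix: the same, using the rows of the second matrix
  sapply(best[1:3], function(b) image(outer(ppa.result$rows2[,b],
                                       ppa.result$columns[,b]),
                                 main=paste("Module", b)))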
}
}