choisir.seuil.equiv: Cut-off selection by simulations, in the context of equivalence tests

Description

Finds, by simulation, the optimal p-value cut-off for the individual tests so that the probability of obtaining connected nodes in the graph under the null hypothesis (the type I error) reaches a given level.

Usage

choisir.seuil.equiv( n.genes, taille.groupes,
                     mu = 10, sigma = 0.5, Delta = 0.5,
                     alpha.cible = 0.05,
                     seuil.p = (10:40)/100,
                     B = 3000, conf.level = 0.95,
                     f.p = equiv.fpc,
                     en.log = TRUE,
                     n.coeurs = 1,
                     ... )

Value

choisir.seuil.equiv returns a data.frame with four columns: the candidate cut-offs, the corresponding estimated type I error, and its lower and upper confidence bounds. Attributes give the estimated optimal cut-off, its confidence interval and details on the simulation conditions. This data.frame has the additional class SARPcompo.H0, allowing specific print and plot methods to be used.

Arguments

n.genes

Number of genes to be quantified simultaneously

taille.groupes

An integer vector containing the sample size for each group. The number of groups is determined by the length of this vector. Unused if masque is provided.

mu

A numeric vector giving the mean amount for each component in the first condition, in the log scale (\(\mu\)). If a single value is provided, it is used for each component. Otherwise, the length of the vector must be equal to the number of components.

It can also be a two-row matrix giving the mean amounts for each component (columns) in the first (first row) and second (second row) condition.

sigma

A numeric vector giving the standard deviation for the amount of each component in both conditions, in the log scale (\(\sigma\)). If a single value is provided, it is used for each component. Otherwise, the length of the vector must be equal to the number of components.

It can also be a two-row matrix giving the standard deviations for each component (columns) in the first (first row) and second (second row) condition.

Delta

The limit for the equivalence region, \(\Delta\), in the log scale.

alpha.cible

The target type I error level, that is, the probability of obtaining at least one edge in the graph under the null hypothesis (see Details). Should be between 0 and 1.

seuil.p

A numeric vector of candidate cut-offs. Values outside the [0,1] interval are automatically removed. The default (from 0.10 to 0.40) is suited for a target type I error of 0.05 and fewer than 30 genes, roughly.

B

The number of simulations to perform.

conf.level

The confidence level of the interval given as a result (see Details).

f.p

The function to use for individual tests of each ratio. See creer.Mp for details.

en.log

If TRUE, generated data are seen as log of quantities. The option is used in the call of creer.Mp.

n.coeurs

The number of CPU cores to use in computation, with parallelization using forks (does not work on Windows) with the help of the parallel package.

...

additional arguments, to be used by the analysis function f.p

Author

Emmanuel Curis (emmanuel.curis@parisdescartes.fr)

Details

The choisir.seuil.equiv function simulates B datasets of n.genes “quantities” measured several times, under the null hypothesis that variations between samples of two conditions are given by the difference between the two rows of the \(\mu\) matrix. If \(\mu\) was given as a single row (or a single value), the second row is defined as \((\mu, \mu + \Delta, \mu + 2\Delta, \dots)\), corresponding to the null hypothesis that all components have a different change between the two conditions, and that this change is equal to the equivalence region limit (\(\Delta\)).

For each of these B datasets, creer.Mp is called with the provided test function. The result is then converted to a graph using, in turn, each cut-off given in seuil.p, and the number of edges of the graph is determined. Having at least one edge is a type I error, since under the null hypothesis no pair of genes has the same change.
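The data-generation step described above can be sketched in base R. This is only an illustration, not the package's code: `simuler_h0` is a hypothetical helper, it handles two groups only, and it assumes that component j of the second condition is shifted by (j - 1) times Delta, with independent Gaussian draws for each component and condition.

```r
# Hypothetical helper (not part of SARP.compo): simulate one two-group
# dataset under the null hypothesis described above.
simuler_h0 <- function(n.genes, taille.groupes,
                       mu = 10, sigma = 0.5, Delta = 0.5) {
  stopifnot(length(taille.groupes) == 2)

  # Means in the first condition (a single value is recycled)
  mu1 <- rep_len(mu, n.genes)
  # Assumed second row: component j is shifted by (j - 1) * Delta
  mu2 <- mu1 + (seq_len(n.genes) - 1) * Delta
  sd.j <- rep_len(sigma, n.genes)

  # Independent Gaussian draws: one row per sample, one column per component
  tirer <- function(n, moyennes) {
    matrix(rnorm(n * n.genes,
                 mean = rep(moyennes, each = n),
                 sd   = rep(sd.j, each = n)),
           nrow = n)
  }
  valeurs <- rbind(tirer(taille.groupes[1], mu1),
                   tirer(taille.groupes[2], mu2))
  colnames(valeurs) <- paste0("gene", seq_len(n.genes))

  data.frame(groupe = factor(rep(c("A", "B"), times = taille.groupes)),
             valeurs)
}

set.seed(1)
d <- simuler_h0(n.genes = 5, taille.groupes = c(5, 5),
                mu = 1, sigma = 1, Delta = 1)
dim(d)   # 10 rows (samples), 6 columns (group + 5 components)
```

In such a dataset no two components share the same between-condition change, so any edge found by the equivalence tests counts as a false positive.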

For each cut-off in seuil.p, the proportion of false positives is then determined, along with its confidence interval (using the exact binomial formula). The optimal cut-off to achieve the target type I error is then found by linear interpolation.
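As a rough sketch of this step, the exact binomial interval and the interpolation can be reproduced with base R's binom.test and approx. The counts in `n.fp` are made up purely for illustration, and the variable names are hypothetical; the package's internal implementation may differ.

```r
# Candidate cut-offs and number of simulations
seuil.p <- c(0.10, 0.15, 0.20, 0.25, 0.30)
B <- 3000

# MADE-UP counts of simulated datasets with at least one edge,
# one per cut-off (illustration only, not real results)
n.fp <- c(30, 75, 135, 210, 330)

# Exact (Clopper-Pearson) binomial confidence interval for each proportion
ic <- t(sapply(n.fp, function(x) {
  binom.test(x, B, conf.level = 0.95)$conf.int
}))

res <- data.frame(seuil = seuil.p,
                  alpha = n.fp / B,
                  inf   = ic[, 1],
                  sup   = ic[, 2])

# Linear interpolation: cut-off whose estimated type I error
# equals the target level
alpha.cible <- 0.05
seuil.opt <- approx(x = res$alpha, y = res$seuil, xout = alpha.cible)$y
seuil.opt   # 0.21 with the made-up counts above
```

The interpolation only works when the target level lies within the range of estimated type I errors, which is why seuil.p should cover cut-offs on both sides of the expected optimum.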

Data are generated using a normal (Gaussian) distribution, independently for each component and each condition.

Examples


   # What would be the optimal cut-off for 5 genes quantified in two
   #  groups of 5 replicates?
   # Null hypothesis: mean = 1, sd = 1, Delta = 1
   # For speed, only 50 simulations are done here,
   #  but many more are needed to get a good estimate of the cut-off.

   seuil <- choisir.seuil.equiv( 5, c( 5, 5 ),
                                 mu = 1, sigma = 1, Delta = 1,
                                 B = 50 )
   seuil

   # Get the cut-off and its confidence interval
   attr( seuil, "seuil" )

   # Plot the results
   plot( seuil )
