Obtaining the optimal p-value cut-off for individual tests to achieve a given Type I error level of obtaining disjoint components of the graph
choisir.seuil( n.genes,
taille.groupes = c( 10, 10 ),
alpha.cible = 0.05,
seuil.p = (5:30)/100,
B = 3000, conf.level = 0.95,
f.p = student.fpc, frm = R ~ Groupe,
normaliser = FALSE, en.log = TRUE,
n.quantifies = n.genes, masque,
n.coeurs = 1,
... )
choisir.seuil returns a data.frame with four columns: the candidate cut-offs, the corresponding estimated type I error, and its lower and upper confidence bounds. Its attributes give the estimated optimal cut-off, its confidence interval and details on the simulation conditions. This data.frame has the additional class SARPcompo.H0, allowing specific print and plot methods to be used.
n.genes: Number of genes to be quantified simultaneously.
taille.groupes: An integer vector containing the sample size for each group. The number of groups is determined by the length of this vector. Unused if masque is provided.
alpha.cible: The target type I error level of obtaining disjoint subnetworks under the null hypothesis that gene expressions are the same in all groups. Must be between 0 and 1.
seuil.p: A numeric vector of candidate cut-offs. Values outside the [0, 1] interval are automatically removed. The default (from 0.05 to 0.30) is roughly suited for a target type I error of 0.05 and fewer than 30 genes.
B: The number of simulations to perform.
conf.level: The confidence level of the interval given in the result (see Details).
f.p: The function to use for the individual test of each ratio. See creer.Mp for details.
frm: The formula to use. The default is suited to the structure of the simulated data, with R the ratio and Groupe the variable giving group membership.
normaliser: Should the simulated data be normalised, that is, should their sum be equal to 1? Since ratios are insensitive to normalisation (unlike individual quantities), this step is useless for usual designs, hence the default.
en.log: If TRUE, the generated data are seen as logarithms of quantities; hence the normalisation step is performed after exponentiation of the data, and the data are then converted back to the log scale. The option is also used in the call to creer.Mp.
n.quantifies: The number of quantified genes amongst the n.genes simulated. Must be at most equal to n.genes, which is the default.
masque: A data.frame containing the values of the needed covariates for all replicates. If missing, a one-column data.frame is generated using taille.groupes, the column (named ‘Groupe’) containing the value ‘G1’ repeated taille.groupes[ 1 ] times, ‘G2’ repeated taille.groupes[ 2 ] times, and so on.
n.coeurs: The number of CPU cores to use in computation. Parallelization relies on forks through the parallel package, so it does not work on Windows.
...: Additional arguments, passed to the analysis function f.p.
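The default design described for masque can be rebuilt from taille.groupes in a couple of lines. The sketch below is illustrative only: masque_par_defaut is a hypothetical helper, not a SARP.compo function, and storing Groupe as a factor (rather than a character vector) is an assumption.

```r
# Hypothetical helper (not part of SARP.compo): rebuild the default design
# described for 'masque' from a vector of group sizes.
masque_par_defaut <- function(taille.groupes) {
  k <- length(taille.groupes)                    # number of groups
  niveaux <- paste0("G", seq_len(k))             # "G1", "G2", ...
  groupes <- rep(niveaux, times = taille.groupes)
  # Assumption: Groupe stored as a factor with levels in group order
  data.frame(Groupe = factor(groupes, levels = niveaux))
}

m <- masque_par_defaut(c(3, 2))
m$Groupe
# [1] G1 G1 G1 G2 G2
# Levels: G1 G2
```

The number of rows, here 5, is the total number of values simulated for each gene.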
The simulated ratios are stored in a column called R, appended to the simulated data.frame. For this reason, do not use any column of this name in the provided masque: it would be overwritten during the simulation process.
Emmanuel Curis (emmanuel.curis@parisdescartes.fr)
The choisir.seuil function simulates B datasets of n.genes “quantities” measured several times, under the null hypothesis that there are only random variations between samples. For each of these B datasets, creer.Mp is called with the provided test function; its result is then converted to a graph using in turn each cut-off given in seuil.p, and the number of connected components of the graph is determined. Having more than one component is a type I error.
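The component-counting step can be sketched in base R as follows. This is a hypothetical illustration, not the package's code: it assumes a symmetric matrix Mp of pairwise p-values (one per ratio) and that two genes are joined by an edge when their ratio test is not significant, that is, when its p-value is above the cut-off. The name nb.composantes is made up.

```r
# Hypothetical helper (not part of SARP.compo).
# Assumption: Mp is a symmetric matrix of p-values, one per pair of genes,
# and an edge joins two genes when their ratio test is NOT significant,
# i.e. when the p-value is above the cut-off 'seuil'.
nb.composantes <- function(Mp, seuil) {
  n <- nrow(Mp)
  adj <- Mp > seuil                     # TRUE = edge between genes i and j
  diag(adj) <- FALSE
  vu <- rep(FALSE, n)                   # genes already reached
  nb <- 0
  for (depart in seq_len(n)) {
    if (vu[depart]) next
    nb <- nb + 1                        # unvisited gene: a new component
    vu[depart] <- TRUE
    pile <- depart
    while (length(pile) > 0) {          # depth-first search from 'depart'
      v <- pile[length(pile)]
      pile <- pile[-length(pile)]
      voisins <- which(adj[v, ] & !vu)
      vu[voisins] <- TRUE
      pile <- c(pile, voisins)
    }
  }
  nb
}

# Toy p-value matrix for 4 genes: {1, 2} and {3, 4} are linked pairs
Mp <- matrix(0.001, 4, 4)
Mp[1, 2] <- Mp[2, 1] <- 0.6
Mp[3, 4] <- Mp[4, 3] <- 0.4

# Number of components for each candidate cut-off; more than one
# component is a type I error under the null hypothesis
sapply(c(0.05, 0.5), function(s) nb.composantes(Mp, s))   # 2 3
```

Raising the cut-off turns more p-values into significant results and therefore removes edges, which can never decrease the number of components; under this assumption the estimated type I error grows with the cut-off.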
For each cut-off in seuil.p, the proportion of false positives is then determined, along with its confidence interval (using the exact binomial formula). The optimal cut-off achieving the target type I error is then found by linear interpolation.
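The estimation step can be illustrated with base R. In this sketch the error counts are invented for illustration and are not output of choisir.seuil; the exact interval is computed with stats::binom.test (Clopper-Pearson), and the interpolation uses stats::approx. Whether the package uses these exact functions is an assumption.

```r
# Made-up counts for illustration only (NOT output of choisir.seuil):
# for each candidate cut-off, the number of the B simulated datasets whose
# graph had more than one component.
B           <- 200
conf.level  <- 0.95
alpha.cible <- 0.05
seuil.p     <- c(0.05, 0.10, 0.15, 0.20)
nb.erreurs  <- c(2, 6, 11, 17)

# Estimated type I error for each cut-off
erreur <- nb.erreurs / B

# Exact (Clopper-Pearson) binomial confidence interval, as given by binom.test
ic <- t(sapply(nb.erreurs, function(x)
  binom.test(x, B, conf.level = conf.level)$conf.int))

resultat <- data.frame(seuil = seuil.p, erreur = erreur,
                       inf = ic[, 1], sup = ic[, 2])

# Linear interpolation of the cut-off at which the estimated error equals
# the target (assumes the error increases with the cut-off)
seuil.optimal <- approx(x = erreur, y = seuil.p, xout = alpha.cible)$y
seuil.optimal   # 0.14
```

Here the target 0.05 falls between the estimated errors 0.03 (cut-off 0.10) and 0.055 (cut-off 0.15), so the interpolated cut-off is 0.10 + 0.05 × 0.02 / 0.025 = 0.14.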
Simulation is done assuming a log-normal distribution, that is, a standard Gaussian (zero mean, unit variance) on the log scale. Since nothing changes between the groups under the null hypothesis, the only information needed is the total number of values for a given gene, which is determined from the number of rows of masque.
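A minimal sketch of the simulation under the null hypothesis and of the optional normalisation follows, assuming en.log = TRUE and assuming the normalisation makes each replicate (row) sum to 1 on the original scale. simuler.H0 and normaliser.log are hypothetical helpers, not SARP.compo functions. The check at the end illustrates why normalisation is useless for ratios: it subtracts the same log-sum from every gene of a replicate, so log-ratios do not change.

```r
# Hypothetical helpers (not SARP.compo functions).
# Simulate one dataset under the complete null hypothesis: rows are the
# replicates (one per row of 'masque'), columns are the genes, and values
# are log-quantities drawn from a standard Gaussian, so the quantities
# themselves are log-normal.
simuler.H0 <- function(n.genes, masque) {
  n <- nrow(masque)                    # total number of values per gene
  matrix(rnorm(n * n.genes), nrow = n, ncol = n.genes)
}

# Normalise each replicate so that its quantities sum to 1, working on
# log-quantities: exponentiate, divide by the row sum, take the log back.
normaliser.log <- function(X) {
  Q <- exp(X)
  log(Q / rowSums(Q))
}

set.seed(1)
masque <- data.frame(Groupe = factor(rep(c("G1", "G2"), c(5, 5))))
X  <- simuler.H0(4, masque)
Xn <- normaliser.log(X)

# Log-ratios are unchanged by the normalisation...
all.equal(X[, 1] - X[, 2], Xn[, 1] - Xn[, 2])   # TRUE
# ...while each replicate now sums to 1 on the original scale
all.equal(rowSums(exp(Xn)), rep(1, nrow(Xn)))   # TRUE
```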
All columns of masque are passed to the analysis function, so simulation under virtually any experimental design should be possible, as long as a complete null hypothesis is wanted (no effect of any covariate).
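How a simulated ratio and the columns of masque might meet in the analysis formula can be sketched as below. This is an assumption-laden illustration: the ratio is taken on the log scale (as with en.log = TRUE), the extra covariate Lot is hypothetical, and lm followed by anova is only a stand-in for the analysis function; it is NOT student.fpc. Note that assigning donnees$R silently replaces any existing column named R, which is why masque must not contain one.

```r
# Hypothetical design with two covariates; 'Lot' is an invented example
masque <- data.frame(Groupe = factor(rep(c("G1", "G2"), each = 6)),
                     Lot    = factor(rep(c("A", "B"), times = 6)))

set.seed(2)
# Two genes simulated under the null hypothesis, on the log scale
X <- matrix(rnorm(nrow(masque) * 2), ncol = 2)

# Append the (log-)ratio as column R, as described for the simulated data;
# an existing column R in 'masque' would be overwritten here
donnees <- masque
donnees$R <- X[, 1] - X[, 2]

# Stand-in analysis (NOT student.fpc): linear model using all covariates
ajust <- lm(R ~ Lot + Groupe, data = donnees)
p <- anova(ajust)["Groupe", "Pr(>F)"]   # p-value of the Groupe effect
```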
See also: creer.Mp.
library( SARP.compo )
# What would be the optimal cut-off for 10 genes quantified in two
# groups of 5 replicates?
# For speed reasons, only 50 simulations are done here,
# but many more are needed to obtain a good estimate of the cut-off.
seuil <- choisir.seuil( 10, c( 5, 5 ), B = 50 )
seuil
# Get the cut-off and its confidence interval
attr( seuil, "seuil" )
# Plot the results
plot( seuil )