Obtaining the optimal p-value cut-off for individual tests to achieve a given Type I error level of obtaining connected nodes in the graph
choisir.seuil.equiv( n.genes, taille.groupes,
mu = 10, sigma = 0.5, Delta = 0.5,
alpha.cible = 0.05,
seuil.p = (10:40)/100,
B = 3000, conf.level = 0.95,
f.p = equiv.fpc,
en.log = TRUE,
n.coeurs = 1,
... )
choisir.seuil.equiv returns a data.frame with four columns, corresponding to the candidate cut-offs, the corresponding estimated type I error and its lower and upper confidence bounds, and attributes giving the estimated optimal cut-off, its confidence interval and details on the simulation conditions. This data.frame has the additional class SARPcompo.H0, allowing specific print and plot methods to be used.
Number of genes to be quantified simultaneously
An integer vector containing the sample size for each group. The number of groups is determined by the length of this vector. Unused if masque is provided.
A numeric vector giving the mean amount for each component in the first condition, in the log scale (\(\mu\)). If a single value is provided, it is used for each component. Otherwise, the length of the vector must be equal to the number of components.
It can also be a two-row matrix giving the mean amounts for each component (columns) in the first (first row) and second (second row) conditions.
A numeric vector giving the standard deviation for the amount of each component in both conditions, in the log scale (\(\sigma\)). If a single value is provided, it is used for each component. Otherwise, the length of the vector must be equal to the number of components.
It can also be a two-row matrix giving the standard deviations for each component (columns) in the first (first row) and second (second row) conditions.
The limit for the equivalence region, \(\Delta\), in the log scale.
The target type I error level of obtaining disjoint subnetworks under the null hypothesis that gene expressions are the same in all groups. Should be between 0 and 1.
A numeric vector of candidate cut-offs. Values outside the [0,1] interval are automatically removed. The default (from 0.10 to 0.40, as given in the usage above) is roughly suited to a target type I error of 0.05 and fewer than 30 genes.
How many simulations to do.
The confidence level of the interval given as a result (see Details).
The function to use for individual tests of each ratio. See creer.Mp for details.
If TRUE, generated data are seen as log of quantities. The option is used in the call of creer.Mp.
The number of CPU cores to use in computation, with parallelization using forks (does not work on Windows) with the help of the parallel package.
additional arguments, to be used by the analysis function f.p
Emmanuel Curis (emmanuel.curis@parisdescartes.fr)
The choisir.seuil.equiv function simulates B datasets of n.genes
“quantities” measured several times,
under the null hypothesis that variations between samples of two
conditions are given by the difference between the two rows of the
\(\mu\) matrix. If \(\mu\) was given as a single row (or a single
value), the second row is defined as
\((\mu, \mu + \Delta, \mu + 2\Delta, \dots)\), corresponding to the null
hypothesis that all components have a different change between the two
conditions, and that the changes of two consecutive components differ by
the equivalence region limit (\(\Delta\)). For each of these B datasets,
creer.Mp is called with the provided test function, the result is
converted to a graph using in turn all cut-offs given in seuil.p,
and the number of edges of the graph is
determined. Having at least one edge is a type I error, since under
the null hypothesis no pair of genes has the same
change.
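The data-generation step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the helper simuler.condition and the objects mu.mat, donnees, condition and changement are hypothetical names, and the call to creer.Mp and the conversion to a graph are deliberately left out.

```r
set.seed(42)                      # reproducibility of this illustration only

n.genes        <- 5
taille.groupes <- c(5, 5)         # two conditions, 5 replicates each
mu <- 10; sigma <- 0.5; Delta <- 0.5

# Row 1: first condition; row 2: second condition,
# shifted by 0, Delta, 2 * Delta, ... as described in the text
mu.mat <- rbind(rep(mu, n.genes),
                mu + (seq_len(n.genes) - 1) * Delta)

# Hypothetical helper: n replicates (rows) of every component (columns),
# drawn independently from a Gaussian distribution, in the log scale
simuler.condition <- function(n, moyennes, ecart.type) {
  matrix(rnorm(n * length(moyennes),
               mean = rep(moyennes, each = n), sd = ecart.type),
         nrow = n)
}

donnees <- rbind(simuler.condition(taille.groupes[1], mu.mat[1, ], sigma),
                 simuler.condition(taille.groupes[2], mu.mat[2, ], sigma))
condition <- factor(rep(1:2, times = taille.groupes))

# Change of each component between the two conditions under this null
changement <- mu.mat[2, ] - mu.mat[1, ]
```

With this construction, the changes of any two components differ by a non-zero multiple of \(\Delta\), so no pair of components is strictly inside the equivalence region.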
For each cut-off in seuil.p, the proportion of false positives
is then determined, along with its confidence interval (using the
exact binomial formula). The optimal cut-off to achieve the target
type I error is then found by linear interpolation.
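As a rough sketch of this last step, the exact binomial interval can be obtained with binom.test and the interpolation with approx, both from base R (stats). The counts in n.fp are invented for illustration only, and the way the package itself interpolates (and derives the confidence interval of the optimal cut-off) may differ from this assumption.

```r
B           <- 3000
alpha.cible <- 0.05
seuil.p     <- c(0.10, 0.20, 0.30, 0.40)

# Invented numbers of simulations (out of B) with at least one edge,
# for illustration only: these are not results from the package
n.fp <- c(60, 120, 180, 240)
prop <- n.fp / B

# Exact (Clopper-Pearson) confidence interval for each proportion
ic <- t(sapply(n.fp, function(x) {
  binom.test(x, B, conf.level = 0.95)$conf.int
}))

resultat <- data.frame(seuil.p = seuil.p, prop = prop,
                       ic.inf = ic[, 1], ic.sup = ic[, 2])

# Linear interpolation: cut-off at which the estimated
# type I error equals the target
seuil.opt <- approx(x = prop, y = seuil.p, xout = alpha.cible)$y
```

Here the target of 0.05 lies halfway between the estimated error rates at the cut-offs 0.20 and 0.30, so the interpolated cut-off falls halfway between them.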
Data are generated using a normal (Gaussian) distribution, independently for each component and each condition.
creer.Mp, equiv.fpc.
See choisir.seuil for the case of difference tests and disjoint subgraphs.
# What would be the optimal cut-off for 5 genes quantified in two
# groups of 5 replicates?
# Null hypothesis: mean = 1, sd = 1, Delta = 1
# For speed reasons, only 50 simulations are done here,
# but obviously many more are needed to get a good estimate of the cut-off.
seuil <- choisir.seuil.equiv( 5, c( 5, 5 ),
mu = 1, sigma = 1, Delta = 1,
B = 50 )
seuil
# Get the cut-off and its confidence interval
attr( seuil, "seuil" )
# Plot the results
plot( seuil )