findCopyNumber(x, minGenes = 15, B = 100, p.adjust.method = "BH",
pvalcutoff = 0.05, exprScorecutoff = NA, mc.cores = 1, useAllPerm = F,
genome = "hg19", chrLengths, sampleGenome = TRUE, useOneChr = FALSE,
useIntegrate = TRUE, plot = TRUE, minGenesPerChr = 100)
x is a data.frame with gene or probe identifiers as row names and the
following columns: es (the enrichment score), chr (the chromosome the
gene or probe belongs to) and pos (position on the chromosome in
megabases). It can be obtained from an epheno object with the function
getEsPositions.
If useAllPerm is FALSE, B has to be bigger than 100.
If useAllPerm is TRUE, the computations are much more expensive, so a B
bigger than 100 is not recommended.
If mc.cores is bigger than 1, the multicore library has to be loaded.
For useAllPerm we recommend the option FALSE, having observed that the
enrichment can depend on the number of genes in the area.
We recommend the option TRUE if the positions of the enrichment scores
are equidistant. Note that this option is much slower and needs fewer
permutations, so a smaller B is preferred.
See details for more info.
chrLengths is a numeric vector with chromosome names as names.
These names have to be the same as the ones used in x$chr.
If missing, the last position is used.
useIntegrate chooses between integrate and pnorm to compute p-values.
The first does not assume any distribution under the null hypothesis;
the second assumes it is normally distributed.
Chromosomes with fewer than minGenesPerChr genes will be removed
from the analysis.
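As a rough illustration of the inputs described above, the following sketch builds an x-like data.frame and a matching chrLengths vector by hand, using base R only. The probe names, scores, positions and chromosome lengths are invented for the example, and the assumption that chrLengths is expressed in the same units as x$pos is ours; in practice x would typically come from getEsPositions.

```r
# Hypothetical toy input: 6 probes on two chromosomes (all values invented).
set.seed(1)
x <- data.frame(
  es  = rnorm(6),                          # enrichment scores
  chr = c("1", "1", "1", "2", "2", "2"),   # chromosome of each probe
  pos = c(0.5, 1.2, 3.4, 0.8, 2.1, 5.0),   # positions in megabases
  row.names = paste0("probe", 1:6),
  stringsAsFactors = FALSE
)

# Named numeric vector; the names must match the values used in x$chr.
# Assumption: lengths are given in the same units as x$pos.
chrLengths <- c("1" = 4, "2" = 6)

# Sanity checks one might run before calling findCopyNumber.
stopifnot(all(c("es", "chr", "pos") %in% colnames(x)))
stopifnot(all(unique(x$chr) %in% names(chrLengths)))
```

A toy input this small would not pass the minGenesPerChr filter; it only shows the expected shape of the objects.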
A data.frame containing the positions of the enriched regions. This
output can be passed to the genesInArea function to obtain the names
of the genes that are in each region.
We assessed statistical significance by permuting the positions through
the whole genome.
If useAllPerm is FALSE, the p-value of each gene is computed from the
permutations of genes located in areas of similar density (distance to
the tenth gene). We observed that genes with similar densities tend to
have similar smoothed scores.
If we set 1000 permutations (B=1000), scores are permuted through the
whole genome 10 times (1000/100). For each smoothed score, the
permutations of the 100 smoothed scores with the most similar density
(distance to the tenth gene) are used. Each smoothed score is therefore
compared to 1000 smoothed scores obtained from permutations.
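The density matching and the 1000/100 arithmetic can be sketched as follows. This is an illustration only, not the package's internal code: the positions are simulated, and the exact definition of "distance to the tenth gene" used here (tenth gene downstream, falling back to the tenth gene upstream near the end) is an assumption.

```r
# Illustration only: simulated positions, assumed density definition.
set.seed(1)
n   <- 500
pos <- sort(runif(n, 0, 100))   # made-up positions in megabases

# Assumed density measure: distance from each gene to the 10th gene after
# it; genes near the end fall back to the 10th gene before them.
k      <- 10
idx    <- seq_len(n)
ahead  <- pmin(idx + k, n)
behind <- pmax(idx - k, 1)
dens   <- ifelse(idx + k <= n, pos[ahead] - pos[idx], pos[idx] - pos[behind])

# For one gene, pick the 100 genes whose density is most similar.
i <- 42
neighbours <- order(abs(dens - dens[i]))[1:100]

# With B = 1000 the scores are permuted B / 100 = 10 times, so each
# smoothed score is compared with 10 * 100 = 1000 permuted scores.
B         <- 1000
nPerm     <- B / 100
nCompared <- nPerm * length(neighbours)
```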
If scores are equally spaced along the genome (for instance, when there
is one score every fixed number of bases), the option useAllPerm=TRUE is
recommended. In this case every smoothed score is compared to all
smoothed scores obtained via permutations.
With useAllPerm=TRUE, 20,000 genes and the parameter B=10, the scores
are permuted 10 times through the whole genome, yielding 200,000
permuted smoothed scores. Each observed smoothed score is then tested
against the distribution of these 200,000 permuted smoothed scores.
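A scaled-down sketch of this pooled comparison is shown below. It is not the package's implementation: a circular moving average stands in for the smoother, the two-sided empirical p-value is an assumption, and 200 genes are used instead of 20,000 to keep it fast.

```r
# Illustration only: a simple moving average stands in for the smoother.
set.seed(1)
nGenes <- 200    # 20,000 in the text; smaller here to keep it fast
B      <- 10
es     <- rnorm(nGenes)   # made-up enrichment scores

smoothScores <- function(s, width = 5) {
  # Centered moving average; circular = TRUE avoids NA values at the edges.
  as.numeric(stats::filter(s, rep(1 / width, width), sides = 2,
                           circular = TRUE))
}

observed <- smoothScores(es)

# Permute the scores B times and pool every permuted smoothed score,
# giving nGenes * B values in total.
permuted <- unlist(lapply(seq_len(B), function(b) smoothScores(sample(es))))

# Assumed two-sided empirical p-value for each observed smoothed score.
pvals <- sapply(observed, function(s) mean(abs(permuted) >= abs(s)))
```

Because every observed score is compared against the full pool, the cost grows with both the number of genes and B, which is consistent with the recommendation to keep B small in this mode.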
Only regions with at least minGenes statistically significant genes
(p-value lower than pvalcutoff after adjusting the p-values with the
method specified in p.adjust.method) will be selected as enriched.
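The selection rule can be sketched with base R as below. The p-values are invented, minGenes is lowered to 3 for the toy data, and treating a region as a run of consecutive significant genes is an assumption made for the illustration.

```r
# Illustration only, with made-up p-values for 12 neighbouring genes.
pvals <- c(0.001, 0.002, 0.003, 0.004, 0.6, 0.001, 0.7, 0.8,
           0.002, 0.001, 0.003, 0.9)
minGenes   <- 3       # small value for the toy example (default is 15)
pvalcutoff <- 0.05

# Adjust the p-values and flag the genes that pass the cutoff.
padj <- p.adjust(pvals, method = "BH")
sig  <- padj < pvalcutoff

# Run-length encode the flags and keep TRUE runs of at least minGenes genes.
runs    <- rle(sig)
ends    <- cumsum(runs$lengths)
starts  <- ends - runs$lengths + 1
keep    <- runs$values & runs$lengths >= minGenes
regions <- data.frame(start = starts[keep], end = ends[keep])
```

Here genes 1 to 4 and 9 to 11 form regions, while the isolated significant gene 6 is dropped because its run is shorter than minGenes.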
If exprScorecutoff is different from NA, a gene must, in addition to
passing the p-value cutoff, have a smoothed score bigger (lower if
exprScorecutoff is negative) than the specified value to be
statistically significant.
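The combined rule can be written as a small helper. The function isSelected is hypothetical (it is not part of the package) and only mirrors the description above, with invented adjusted p-values and scores.

```r
# Hypothetical helper (not part of the package) mirroring the rule above.
isSelected <- function(padj, score, pvalcutoff = 0.05, exprScorecutoff = NA) {
  sig <- padj < pvalcutoff
  # Without a score cutoff, only the adjusted p-value matters.
  if (is.na(exprScorecutoff)) return(sig)
  if (exprScorecutoff < 0) {
    sig & score < exprScorecutoff   # negative cutoff: require lower scores
  } else {
    sig & score > exprScorecutoff   # otherwise: require bigger scores
  }
}

padj  <- c(0.01, 0.01, 0.20, 0.01)   # made-up adjusted p-values
score <- c(1.5, 0.2, 2.0, -1.2)      # made-up smoothed scores

noCutoff  <- isSelected(padj, score)
posCutoff <- isSelected(padj, score, exprScorecutoff = 1)
negCutoff <- isSelected(padj, score, exprScorecutoff = -1)
```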
library(phenoTest)
data(epheno)
phenoNames(epheno)
mypos <- getEsPositions(epheno,'Relapse')
# We set all probes to chromosome 1 for illustration purposes
# (we want a minimum number of probes per chromosome).
mypos$chr <- '1'
head(mypos)
set.seed(1)
regions <- findCopyNumber(mypos,B=10,plot=FALSE)
head(regions)