seedsFinder: Evaluate some statistics on all genes in order to select those that can be used as seeds for searching the signatures.

Description

This function works on each column (the expression levels of one gene) of geData and returns, for each gene, the test value and p-value of the log-rank test, together with the Bayesian information criterion computed under the hypothesis that the data are drawn from a single Gaussian (bic1) and from a mixture of two Gaussians (bic2). The clustering of the samples is appended at the end.

Usage

seedsFinder(cutoff = 1.95, evaluateBICs = TRUE, cpuCluster = NULL)

Arguments

cutoff

argument passed to the BICs() function.

evaluateBICs

flag to force the computation of the Bayesian information criteria.

cpuCluster

If a parallel search is needed, set this argument to the output of the NCPUS() function.

Value

The result is a matrix with as many rows as ncol(geData) and 4+nrow(geData) columns:
column no.1: tValue, the test value of the log-rank test statistic under the null hypothesis that the two survival curves are equal (see Details)
column no.2: pValue, the p-value corresponding to the test value in column no.1, computed as 1-pchisq(tValue, df = 1)
column no.3: bic1, the value of the Bayesian information criterion computed under the hypothesis that the data are drawn from a single Gaussian
column no.4: bic2, the value of the Bayesian information criterion computed under the hypothesis that the data are drawn from a mixture of two Gaussians
columns from no.5 to no.4+nrow(geData): the result of the unbiased classification (see Details)

Details

For each gene, an unbiased classification of the expression levels is performed, resulting in two clusters coded by the values 0 and 1. The samples coded 0 form the cluster with the lower mean; the samples coded 1 form the cluster with the higher mean. The classification method is the partitioning around medoids algorithm combined with a leave-one-out re-classification strategy (see the pamUmbiased() function for further details). From the two clusters, two survival curves are estimated with the stData data and then tested against the null hypothesis of no difference between them (see the survdiff() function for further details), which provides the tValue. The corresponding pValue is given by 1-pchisq(tValue, df = 1). Two more indexes are computed: the Bayesian information criteria under the hypotheses that 1) the gene levels are drawn from a univariate Gaussian (bic1) and 2) the gene levels are drawn from a mixture of two Gaussians (bic2) (see the BICs() function for further details). The mixing coefficient is estimated from the classification as the fraction of samples classified as 1. The parameters of the Gaussians are robustly estimated.
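The steps described above can be sketched for a single gene on toy data. This is a rough illustration under stated assumptions, not the package's internal code: the data are simulated, the plain pam() call omits the leave-one-out re-classification, the median and MAD stand in for whatever robust estimators BICs() uses, and the BIC convention (-2 log-likelihood plus a penalty of k*log(n), with 2 and 5 parameters) is assumed rather than taken from the package.

```r
# Sketch only: simulated data and assumed conventions, not the package's code.
library(survival)  # Surv(), survdiff()
library(cluster)   # pam()

set.seed(1)
n <- 40
# Hypothetical expression levels of one gene: two groups with different means.
expr <- c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 3))
# Hypothetical survival times and event indicators (1 = event observed).
time   <- rexp(n, rate = ifelse(expr > 1.5, 0.2, 0.05))
status <- rbinom(n, size = 1, prob = 0.8)

# Two clusters by partitioning around medoids (no leave-one-out step here).
cl <- pam(matrix(expr, ncol = 1), k = 2)$clustering
# Recode so that 0 marks the cluster with the lower mean and 1 the higher.
means <- tapply(expr, cl, mean)
clusters <- as.integer(cl == as.integer(names(means)[which.max(means)]))

# Log-rank test of the null hypothesis that the two survival curves are equal.
fit <- survdiff(Surv(time, status) ~ clusters)
tValue <- fit$chisq
pValue <- 1 - pchisq(tValue, df = 1)  # two groups, hence 1 degree of freedom

# bic1: single Gaussian with robust location and scale (assumed: median, MAD).
logLik1 <- sum(dnorm(expr, median(expr), mad(expr), log = TRUE))
bic1 <- -2 * logLik1 + 2 * log(n)

# bic2: mixture of two Gaussians; the mixing coefficient is the fraction of
# samples classified as 1, and each component is estimated within its cluster.
p1 <- mean(clusters == 1)
x0 <- expr[clusters == 0]
x1 <- expr[clusters == 1]
dens2 <- (1 - p1) * dnorm(expr, median(x0), mad(x0)) +
  p1 * dnorm(expr, median(x1), mad(x1))
bic2 <- -2 * sum(log(dens2)) + 5 * log(n)

c(tValue = tValue, pValue = pValue, bic1 = bic1, bic2 = bic2)
```

Under the convention assumed here, a lower bic2 than bic1 would favour the two-Gaussian description of the gene; the sign convention used by BICs() may differ, so check its help page before comparing values.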

Examples

library(geneSignatureFinder)

data(geNSCLC)
geData <- geNSCLC

data(stNSCLC)
stData <- stNSCLC

# Only a few genes and samples are used here to keep the example fast.
# For a fuller run, try
# genesToUse <- which(apply(!is.na(geData), 2, sum)/nrow(geData) > 0.75)
# geData <- geData[, genesToUse]
# and comment out the geData[1:50, ] and stData[1:50, ] subsetting below.
genesToUse <- which(apply(!is.na(geData), 2, sum) == nrow(geData))
geData <- geData[, genesToUse]
geData <- geData[1:50, ]
stData <- stData[1:50, ]
dim(geData)

aMakeCluster <- makeCluster(2)
aSeedsFinder <- seedsFinder(cpuCluster = aMakeCluster)
head(aSeedsFinder)

# release the worker processes once the search is done
stopCluster(aMakeCluster)
