snowball: main function for Snowball analysis

Description

This is the main function for performing Snowball analysis. It requires only minimal input; most operating parameters have default values.

Usage

snowball(y, X, ncore = 1, d = 300, B = 10000, B.i = 2000,
  sample.n = 100, resample.method = c("sample", "none", "combn"),
  mode.resample = c("count.class", "flat", "percent.class"), k.resample = 1)

Arguments

y

a factor variable for mutation status

X

a data.frame containing gene expression data. The columns of X should be aligned with y on samples

ncore

number of processors to use for parallel computation. Set ncore = 1 or NULL for non-parallel computation mode

d

the size of the gene subset used for gene-level resampling. See the references for $d$ in $X_d^x$

B

bootstrap size, which is $B$ in $J_n(x)$, defining the total number of gene subsets used to estimate $J_n$, $$J_n(x)=\frac{1}{B}\sum_{i=1}^{B}\left(\frac{1}{K}\sum_{j=1}^{K}\phi_n(g(X_{i,j}),\kappa)\right)$$
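
As a minimal sketch of the double average in this formula, assume the values $\phi_n(g(X_{i,j}),\kappa)$ have already been computed and stored in a B-by-K matrix. The matrix `phi` below is simulated and purely hypothetical; it is not output from the package and does not show how the package computes $\phi_n$.

```r
# Hypothetical illustration: phi[i, j] stands in for phi_n(g(X_{i,j}), kappa).
set.seed(1)
B <- 5   # number of gene subsets (bootstrap size)
K <- 4   # number of subject-level resamples per gene subset
phi <- matrix(runif(B * K), nrow = B, ncol = K)

# Inner average over the K resamples, then outer average over the B subsets.
inner <- rowMeans(phi)
Jn <- mean(inner)
```

Because every gene subset contributes the same number of resamples here, the result equals the plain mean of all B * K values.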

B.i

bootstrap size deployed on each child job in parallel mode

sample.n

number of samples drawn in the subject-level resampling, denoted as $K$ in $J_n(x)$. It is ignored if resample.method is "none" or "combn"

resample.method

this defines how the subject level resampling is performed. The possible values are "sample", "none" and "combn". Let resample.method = "sample" for random sampling with replacement, "none"

mode.resample

this specifies how the subjects are counted for subject-level leave-k-out random sampling, and whether stratification by group is applied. The possible values are "count.class", "flat" and "percent.class"

k.resample

a numerical value specifying the number of subjects left out during the subject-level resampling. It is an integer if mode.resample = "count.class" and a number between 0 and 1 if mode.resample = "percent.class"
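
To illustrate the two interpretations of k.resample, here is a rough base R sketch of stratified leave-k-out sampling. The helper `leave_k_out` is hypothetical and not part of DESnowball. It assumes that k is applied separately within each class of y, and that a fractional k is rounded down; the package's actual behaviour may differ.

```r
# Hypothetical helper, not part of DESnowball: return the indices of the
# subjects kept after leaving k out of each class of the factor y.
leave_k_out <- function(y, k, mode = c("count.class", "percent.class")) {
  mode <- match.arg(mode)
  keep <- unlist(lapply(split(seq_along(y), y), function(idx) {
    # Number of subjects to drop in this class (assumption: round down).
    n_out <- if (mode == "count.class") k else floor(k * length(idx))
    n_out <- min(n_out, length(idx))
    drop <- idx[sample.int(length(idx), n_out)]
    setdiff(idx, drop)
  }), use.names = FALSE)
  sort(keep)
}

set.seed(1)
y <- factor(rep(c("mut", "wt"), times = c(6, 10)))
kept_count <- leave_k_out(y, k = 1, mode = "count.class")      # drop 1 per class
kept_pct <- leave_k_out(y, k = 0.5, mode = "percent.class")    # drop 50% per class
```

With 6 mutant and 10 wild-type subjects, the first call keeps 14 subjects and the second keeps 8.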

Value

A data.frame containing two variables: weights and positives. weights are the $J_n(x)$ values for all genes, and positives are indicators of whether a specific $J_n(x)$ is above or below the median of all $J_n(x)$ values.
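
The relationship between the two columns can be sketched in base R on simulated data. The data.frame `res` below is a hypothetical stand-in, not package output. The logical coding and the direction (TRUE when a weight exceeds the median) are assumptions made for illustration.

```r
# Simulated stand-in for the returned data.frame; all values are hypothetical.
set.seed(1)
res <- data.frame(weights = runif(7), row.names = paste0("gene", 1:7))

# Assumption: a gene is flagged when its weight exceeds the median weight.
res$positives <- res$weights > median(res$weights)
```

With an odd number of distinct weights, the gene sitting exactly at the median is not flagged under this convention.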

References

Xu, Y., Guo, X., Sun, J. and Zhao, Z. Snowball: resampling combined with distance-based regression to discover transcriptional consequences of driver mutation, manuscript.

Examples


library(DESnowball)
data(snowball.demoData)
# check the demo dataset
print(sb.mutation)
head(sb.expression)
## A test run
Bn <- 10000
ncore <- 4
# call Snowball
sb <- snowball(y = sb.mutation, X = sb.expression,
               ncore = ncore, d = 100, B = Bn,
               sample.n = 1)
# process the gene ranking and selection
sb.sel <- select.features(sb)
# plot the Jn values
plotJn(sb, sb.sel)
# get the significant gene list
top.genes <- toplist(sb.sel)
