JRF (version 0.1-2)

JRF: Joint Random Forest for the simultaneous estimation of multiple related networks

Description

Algorithm for the simultaneous estimation of multiple related networks. This file is a modified version of function RF contained in the R package randomForest.

Usage

JRF(x, y = NULL, xtest = NULL, ytest = NULL, ntree, sampsize,
  totsize = if (replace) ncol(x) else ceiling(0.632 * ncol(x)), mtry = if
  (!is.null(y) && !is.factor(y)) max(floor(nrow(x)/3), 1) else
  floor(sqrt(nrow(x))), replace = TRUE, classwt = NULL, cutoff, strata,
  nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1, maxnodes = NULL,
  importance = FALSE, localImp = FALSE, nPerm = 1, proximity,
  oob.prox = proximity, norm.votes = TRUE, do.trace = FALSE,
  keep.forest = !is.null(y) && is.null(xtest), corr.bias = FALSE,
  keep.inbag = FALSE, nclasses, ...)

Arguments

x
numeric matrix with C * r rows and n columns, where C is the number of networks to estimate, r the number of predictors and n the maximum sample size across classes. Therefore, rows correspond to predict
y
numeric matrix C by n, where C is the number of networks to estimate and n the maximum sample size across classes. Therefore, rows correspond to response variables for each class and columns correspond t
nclasses
numeric value: the total number of classes C.
ntree
numeric value: number of trees.
sampsize
numeric vector C by 1: number of samples for each class of data.
importance
Should importance of predictors be assessed?
totsize
Maximum number of samples across the classes.
mtry
numeric value: number of predictors to be sampled at each node.
xtest
a data frame or matrix (like x) containing predictors for the test set.
ytest
response for the test set.
replace
Should sampling of cases be done with or without replacement?
classwt
Priors of the classes. Need not add up to one. Ignored for regression.
cutoff
(Classification only) A vector of length equal to the number of classes. The winning class for an observation is the one with the maximum ratio of proportion of votes to cutoff. Default is 1/k, where k is the number of classes.
strata
A (factor) variable that is used for stratified sampling.
nodesize
Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time). Note that the default values are different for classification (1) and regression (5).
maxnodes
Maximum number of terminal nodes that trees in the forest can have. If not given, trees are grown to the maximum possible (subject to limits by nodesize). If set larger than the maximum possible, a warning is issued.
localImp
Should casewise importance measure be computed?
nPerm
Number of times the OOB data are permuted per tree for assessing variable importance. A number larger than 1 gives a slightly more stable estimate, but is not very effective. Currently only implemented for regression.
proximity
Should proximity measure among the rows be calculated?
oob.prox
Should proximity be calculated only on out-of-bag data?
norm.votes
If TRUE (default), the final result of votes are expressed as fractions. If FALSE, raw vote counts are returned (useful for combining results from different runs). Ignored for regression.
do.trace
If set to TRUE, give more verbose output as JRF is run. If set to some integer, then running output is printed for every do.trace trees.
keep.forest
If set to FALSE, the forest will not be retained in the output object. If xtest is given, defaults to FALSE.
corr.bias
Perform bias correction for regression? Note: Experimental. Use at your own risk.
keep.inbag
Should an n by ntree matrix be returned that keeps track of which samples are in-bag in which trees (but not how many times, if sampling with replacement)?
...
optional parameters to be passed to the low level function
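
The layout of x, y and sampsize described above can be sketched for classes with different sample sizes. This is a minimal base-R sketch, not part of the JRF package: the helper name build_jrf_inputs is hypothetical, and it assumes, as the package example suggests, that columns beyond a class's sample size are simply left at zero.

```r
# Hypothetical helper (not part of the JRF package): stack per-class data
# into the x and y layout described above for one target variable j.
# Each element of data_list is a variables-by-samples matrix for one class.
build_jrf_inputs <- function(data_list, j) {
  nclasses <- length(data_list)
  p <- nrow(data_list[[1]])                        # variables per class
  sampsize <- vapply(data_list, ncol, integer(1))  # samples per class
  n_max <- max(sampsize)

  # Zero-filled containers: columns beyond a class's sample size stay 0
  # (an assumption based on the package example)
  x <- matrix(0, nclasses * (p - 1), n_max)
  y <- matrix(0, nclasses, n_max)

  for (k in seq_len(nclasses)) {
    rows <- seq((k - 1) * (p - 1) + 1, k * (p - 1))  # predictor block of class k
    cols <- seq_len(sampsize[k])
    x[rows, cols] <- data_list[[k]][-j, , drop = FALSE]
    y[k, cols] <- data_list[[k]][j, ]
  }
  list(x = x, y = y, sampsize = sampsize, nclasses = nclasses)
}

set.seed(1)
inputs <- build_jrf_inputs(list(matrix(rnorm(5 * 8), 5, 8),
                                matrix(rnorm(5 * 6), 5, 6)), j = 1)
dim(inputs$x)   # 2 classes * 4 predictors rows, 8 columns
dim(inputs$y)   # 2 rows, 8 columns
```

The returned list elements map onto the x, y, sampsize and nclasses arguments of JRF.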

Value

  • out object of class JRF

References

Petralia, F., Song, WM., Tu, Z. and Wang, P., A New Method for Joint Network Analysis Reveals Common and Different Co-Expression Patterns Among Genes and Proteins in Breast Cancer, submitted

A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2, 18--22.

Examples

# --- Derive weighted networks via JRF

nclasses=2               # number of data sets / classes
n1<-n2<-50               # sample size for each data set
p<-100                   # number of variables (genes)

  # --- Generate data sets

data1<-matrix(rnorm(p*n1),p,n1)       # generate data1
data2<-matrix(rnorm(p*n2),p,n2)       # generate data2

  # --- Standardize variables to mean 0 and variance 1

 data1 <- t(apply(data1, 1, function(x) { (x - mean(x)) / sd(x) } ))
 data2 <- t(apply(data2, 1, function(x) { (x - mean(x)) / sd(x) } ))

  # --- Initialize variables

 imp1<-imp2<-matrix(0,p,p)   # matrix to store importance scores
 ntree=1000;                 # number of trees
 nsample<-c(n1,n2)           # vector containing sample size for each class

# --- run JRF for each target gene
for (j in 1:2){   # loop over target genes (only the first two here; use 1:p for all genes)

  #--- create matrix (classes by max(n1,n2)) of response variable
  y<-matrix(0,2,max(n1,n2));  
  y[1,seq(1,n1)]<-as.matrix(data1[j,])
  y[2,seq(1,n2)]<-as.matrix(data2[j,])

  x<-matrix(0,p*2-2,max(n1,n2)) #--- matrix of covariates 
  x[seq(1,p-1),seq(1,n1)]<-as.matrix(data1[-j,])
  x[seq(p,2*p-2),seq(1,n2)]<-as.matrix(data2[-j,])

   jrf.out<-JRF(x=x,y=y,mtry=round(sqrt(p-1)),importance=TRUE,
   sampsize=nsample,nclasses=nclasses,ntree=ntree)

   imp1[-j,j]<-importance(jrf.out,scale=FALSE)[seq(1,p-1)]      #- importance for net1
   imp2[-j,j]<-importance(jrf.out,scale=FALSE)[seq(p,(p-1)*2)]  #- importance for net2

}
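
Once the importance matrices are filled, one possible next step is to list the strongest edges of each network. The sketch below is base R only and is not part of the JRF package: the helper name top_edges is hypothetical, and imp_demo is a random stand-in used purely for illustration. It assumes the convention used in the loop above, where entry [i, j] holds the importance of predictor i for target j.

```r
# Hypothetical helper: list the k highest-scoring edges of an importance
# matrix in which entry [i, j] scores predictor i for target j.
top_edges <- function(imp, k = 10) {
  idx <- order(imp, decreasing = TRUE)[seq_len(k)]  # linear indices of top scores
  pos <- arrayInd(idx, dim(imp))                    # convert to (row, column) pairs
  data.frame(predictor = pos[, 1], target = pos[, 2], importance = imp[idx])
}

# Random stand-in matrix for illustration only; in practice pass imp1 or imp2
set.seed(1)
imp_demo <- matrix(runif(25), 5, 5)
diag(imp_demo) <- 0      # no self-edges, as in imp1 and imp2
edges <- top_edges(imp_demo, k = 3)
print(edges)
```

Applying the same helper to imp1 and imp2 gives two edge lists that can be compared between the networks.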
