
JRF (version 0.1-1)

JRF: Joint Random Forest for the simultaneous estimation of multiple related networks (This file is a modified version of file RF.R contained in the R package randomForest.)

Description

Joint Random Forest (JRF) for the simultaneous estimation of multiple related networks. This function is a modified version of the file RF.R contained in the R package randomForest.

Usage

JRF(x, y = NULL, xtest = NULL, ytest = NULL, ntree, sampsize,
  totsize = if (replace) ncol(x) else ceiling(0.632 * ncol(x)),
  mtry = if (!is.null(y) && !is.factor(y)) max(floor(nrow(x)/3), 1) else floor(sqrt(nrow(x))),
  replace = TRUE, classwt = NULL, cutoff, strata,
  nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1, maxnodes = NULL,
  importance = FALSE, localImp = FALSE, nPerm = 1, proximity,
  oob.prox = proximity, norm.votes = TRUE, do.trace = FALSE,
  keep.forest = !is.null(y) && is.null(xtest), corr.bias = FALSE,
  keep.inbag = FALSE, nclasses, ...)
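The data-dependent defaults in the usage above are ordinary R expressions evaluated on the stacked matrix x. The base-R sketch below evaluates them for the dimensions used in the example (C = 2 classes, p - 1 = 99 predictors per class, n = 50 columns) without calling JRF itself. The helper name jrf_defaults is hypothetical, and the sketch assumes a numeric (regression) y, as in the example.

```r
# Hypothetical helper (not part of JRF): evaluates the default expressions
# from the JRF usage for a stacked predictor matrix x and response matrix y.
jrf_defaults <- function(x, y = NULL, replace = TRUE) {
  regression <- !is.null(y) && !is.factor(y)
  list(
    # totsize: all columns when bootstrapping, else about 63.2% of them
    totsize  = if (replace) ncol(x) else ceiling(0.632 * ncol(x)),
    # mtry: nrow(x)/3 for regression, sqrt(nrow(x)) otherwise
    mtry     = if (regression) max(floor(nrow(x) / 3), 1) else floor(sqrt(nrow(x))),
    # nodesize: 5 for regression, 1 otherwise
    nodesize = if (regression) 5 else 1
  )
}

C <- 2; r <- 99; n <- 50
x <- matrix(0, C * r, n)   # stacked predictors: (C * r) rows, n columns
y <- matrix(0, C, n)       # one response row per class

d_boot <- jrf_defaults(x, y, replace = TRUE)    # totsize 50, mtry 66, nodesize 5
d_sub  <- jrf_defaults(x, y, replace = FALSE)   # totsize 32
```

As written, nrow(x) counts the predictor rows of all classes together, so the regression default for mtry (66 here) is much larger than the round(sqrt(p - 1)) = 10 that the example passes explicitly.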

Arguments

x
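numeric matrix ((C*r) by n), where C is the number of classes and r the number of predictor variables per class: the r predictor rows of each class are stacked on top of each other (class 1 first, then class 2, and so on), and columns correspond to samples. n is the maximum number of samples over classes; the example pads the unused columns of smaller classes with zeros. Missing values are not allowed.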
y
numeric matrix (C by n): row k holds the response variable for class k, and columns correspond to samples, with n the maximum number of samples over classes (padded in the same way as x). Missing values are not allowed.
xtest
a data frame or matrix (like x) containing predictors for the test set.
ytest
response for the test set.
ntree
numeric value: number of trees.
sampsize
numeric vector (C by 1): number of samples for each class of data.
totsize
Maximum number of samples across classes. By default this is ncol(x) when sampling with replacement and ceiling(0.632 * ncol(x)) otherwise.
mtry
numeric value: number of predictors to be sampled at each node.
replace
Should sampling of cases be done with or without replacement?
classwt
Priors of the classes. Need not add up to one. Ignored for regression.
cutoff
(Classification only) A vector of length equal to the number of classes. The winning class for an observation is the one with the maximum ratio of proportion of votes to cutoff. Default is 1/k, where k is the number of classes.
strata
A (factor) variable that is used for stratified sampling.
nodesize
Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time). Note that the default values are different for classification (1) and regression (5).
maxnodes
Maximum number of terminal nodes that trees in the forest can have. If not given, trees are grown to the maximum possible size (subject to limits by nodesize). If set larger than the maximum possible, a warning is issued.
importance
Should importance of predictors be assessed?
localImp
Should casewise importance measure be computed?
nPerm
Number of times the OOB data are permuted per tree for assessing variable importance. Values larger than 1 give a slightly more stable estimate, but are not very effective. Currently only implemented for regression.
proximity
Should proximity measure among the rows be calculated?
oob.prox
Should proximity be calculated only on out-of-bag data?
norm.votes
If TRUE (default), the final votes are expressed as fractions. If FALSE, raw vote counts are returned (useful for combining results from different runs). Ignored for regression.
do.trace
If set to TRUE, give more verbose output as JRF is run. If set to an integer, running output is printed every do.trace trees.
keep.forest
If set to FALSE, the forest will not be retained in the output object. If xtest is given, defaults to FALSE.
corr.bias
Perform bias correction for regression? Note: experimental; use at your own risk.
keep.inbag
Should an n by ntree matrix be returned that keeps track of which samples are in-bag in which trees (but not how many times, if sampling with replacement)?
nclasses
numeric value: the total number of classes C.
...
optional parameters to be passed to the low-level function.
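The stacked layout of x and y described above can be built from per-class data sets. The base-R sketch below generalizes the loop body of the example to C classes. The helper stack_jrf_inputs and its input (a list of matrices with variables in rows and samples in columns, as in the example) are assumptions for illustration, not part of the JRF package; its sampsize element corresponds to the sampsize argument, and length(datasets) to nclasses.

```r
# Hypothetical helper (not part of JRF): builds the stacked x ((C*r) by n)
# and y (C by n) matrices for one target variable, with r = p - 1 and
# n = maximum sample size over classes. Unused columns are left as 0.
stack_jrf_inputs <- function(datasets, target) {
  C  <- length(datasets)
  p  <- nrow(datasets[[1]])
  r  <- p - 1
  ns <- vapply(datasets, ncol, integer(1))   # sample size of each class
  n  <- max(ns)
  x <- matrix(0, C * r, n)
  y <- matrix(0, C, n)
  for (k in seq_len(C)) {
    rows <- ((k - 1) * r + 1):(k * r)       # row block belonging to class k
    cols <- seq_len(ns[k])
    x[rows, cols] <- datasets[[k]][-target, ]   # all variables except target
    y[k, cols]    <- datasets[[k]][target, ]    # target variable as response
  }
  list(x = x, y = y, sampsize = ns)
}

set.seed(1)
p  <- 5
d1 <- matrix(rnorm(p * 4), p, 4)   # class 1: 5 variables, 4 samples
d2 <- matrix(rnorm(p * 6), p, 6)   # class 2: 5 variables, 6 samples
inp <- stack_jrf_inputs(list(d1, d2), target = 1)
dim(inp$x)   # 8 6
dim(inp$y)   # 2 6
```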

Value

  • out: an object of class JRF.

Examples

# --- Derive weighted networks via JRF

library(JRF)

nclasses <- 2            # number of data sets / classes
n1 <- n2 <- 50           # sample size for each data set
p <- 100                 # number of variables (genes)

# --- Generate data sets (variables in rows, samples in columns)

set.seed(123)                            # for reproducibility
data1 <- matrix(rnorm(p * n1), p, n1)    # generate data1
data2 <- matrix(rnorm(p * n2), p, n2)    # generate data2

# --- Standardize each variable (row) to mean 0 and variance 1

data1 <- t(apply(data1, 1, function(v) (v - mean(v)) / sd(v)))
data2 <- t(apply(data2, 1, function(v) (v - mean(v)) / sd(v)))

# --- Initialize variables

imp1 <- imp2 <- matrix(0, p, p)   # matrices to store importance scores
ntree <- 1000                     # number of trees
nsample <- c(n1, n2)              # vector containing sample size for each class

# --- Run JRF for each target gene
# (only the first two genes, to keep the run short; use 1:p for full networks)
for (j in 1:2) {

  # --- Response matrix (classes by max(n1, n2)): expression of target gene j
  y <- matrix(0, 2, max(n1, n2))
  y[1, seq_len(n1)] <- data1[j, ]
  y[2, seq_len(n2)] <- data2[j, ]

  # --- Covariate matrix: remaining p - 1 genes of class 1 stacked over class 2
  x <- matrix(0, 2 * (p - 1), max(n1, n2))
  x[seq(1, p - 1), seq_len(n1)] <- data1[-j, ]
  x[seq(p, 2 * p - 2), seq_len(n2)] <- data2[-j, ]

  jrf.out <- JRF(x = x, y = y, mtry = round(sqrt(p - 1)), importance = TRUE,
                 sampsize = nsample, nclasses = nclasses, ntree = ntree)

  imp <- importance(jrf.out, scale = FALSE)
  imp1[-j, j] <- imp[seq(1, p - 1)]          # importance for network 1
  imp2[-j, j] <- imp[seq(p, 2 * (p - 1))]    # importance for network 2
}
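In the example, the importance scores for a target gene are split by position: the first p - 1 entries belong to network 1 and the next p - 1 to network 2, matching the row blocks of x. The base-R sketch below performs the same split for any number of classes C, assuming the scores come back in stacked-row order as the example's indexing implies. The helper split_importance is hypothetical, and a stand-in vector replaces a real importance(jrf.out, scale = FALSE) call so the sketch runs without the JRF package.

```r
# Hypothetical helper (not part of JRF): reshapes the scores of the C * r
# stacked predictors into an r by C matrix whose column k holds class k,
# i.e. positions ((k - 1) * r + 1):(k * r) of the stacked vector.
split_importance <- function(imp, nclasses) {
  imp <- as.vector(imp)                  # drop any matrix dims or names
  stopifnot(length(imp) %% nclasses == 0)
  matrix(imp, ncol = nclasses)           # column-major fill: one block per column
}

p <- 100
imp_vec <- seq_len(2 * (p - 1))          # stand-in for importance(jrf.out, scale = FALSE)
blocks <- split_importance(imp_vec, nclasses = 2)
identical(blocks[, 1], imp_vec[seq(1, p - 1)])         # TRUE: network 1
identical(blocks[, 2], imp_vec[seq(p, 2 * (p - 1))])   # TRUE: network 2
```

With C = 2 this reproduces the imp1/imp2 indexing of the example; for more classes, column k would fill the k-th network's importance matrix.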
