extraTrees (version 1.0.5)

extraTrees: Function for training an ExtraTrees classifier or regression model.

Description

This function runs the ExtraTrees tree-building method (implemented in Java).

Usage

extraTrees(x, y, ntree = 500,
           mtry = if (!is.null(y) && !is.factor(y)) max(floor(ncol(x)/3), 1)
                  else floor(sqrt(ncol(x))),
           nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1,
           numRandomCuts = 1, evenCuts = FALSE, numThreads = 1,
           quantile = FALSE, weights = NULL, subsetSizes = NULL,
           subsetGroups = NULL, tasks = NULL,
           probOfTaskCuts = mtry / ncol(x), numRandomTaskCuts = 1,
           na.action = "stop", ...)

Arguments

x
a numeric input data matrix; each row is one sample.
y
a vector of output values: a numeric vector for regression, a factor for classification.
ntree
the number of trees (default 500).
mtry
the number of features tried at each node (default is ncol(x)/3 for regression and sqrt(ncol(x)) for classification).
nodesize
the size of the leaves of the tree (default is 5 for regression and 1 for classification).
numRandomCuts
the number of random cuts for each (randomly chosen) feature (default 1, which corresponds to the official ExtraTrees method). The higher the number of cuts, the higher the chance of a good cut.
evenCuts
if FALSE then cutting thresholds are uniformly sampled (default). If TRUE then the range is split into even intervals (the number of intervals is numRandomCuts) and a cut is uniformly sampled from each interval.
numThreads
the number of CPU threads to use (default is 1).
quantile
if TRUE then quantile regression is performed (default is FALSE); only for regression data. Then use predict(et, newdata, quantile=k) to make predictions for quantile k.
weights
a vector of sample weights, one positive real value for each sample. NULL means standard learning, i.e. equal weights.
subsetSizes
subset size (a single integer) or subset sizes (a vector of integers; requires subsetGroups). If supplied, every tree is built from a random subset of size subsetSizes. NULL means no subsetting, i.e. all samples are used.
subsetGroups
list specifying the subset group for each sample: from the samples in group g, each tree randomly selects subsetSizes[g] samples.
tasks
a vector of tasks, integers from 1 upward, one per sample. NULL means no multi-task learning.
probOfTaskCuts
probability of performing a task cut at a node (default mtry / ncol(x)). Used only if tasks is specified.
numRandomTaskCuts
the number of times a task cut is performed at a node (default 1). Used only if tasks is specified.
na.action
specifies how to handle NA values in x: "stop" (default) raises an error if any NA is present, "zero" sets all NAs to zero, and "fuse" builds trees by skipping a sample whenever the chosen feature is NA for it (see the sketch after this list).
...
not used currently.
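
Below is a hedged sketch of the na.action modes described above; the data is synthetic and serves only as an illustration:

  ## synthetic regression data with missing values
  x <- matrix(runif(200), 100, 2)
  y <- rnorm(100)
  x[1:5, 2] <- NA
  ## the default na.action="stop" would raise an error here;
  ## "zero" replaces all NAs with 0 before building the trees:
  et.zero <- extraTrees(x, y, na.action = "zero")
  ## "fuse" instead skips a sample at a cut whenever the chosen feature is NA:
  et.fuse <- extraTrees(x, y, na.action = "fuse")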

Value

The trained model built from the inputs x and output values y, stored in an extraTrees object.

Details

For classification, ExtraTrees chooses the cut at each node by minimizing the Gini impurity index; for regression, by minimizing the variance.
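
As an illustration of these two criteria only (the helper functions below are hypothetical and not part of the extraTrees API), a minimal R sketch:

  ## Gini impurity of a node: 1 minus the sum of squared class proportions
  giniImpurity <- function(y) {
    p <- table(y) / length(y)  ## class proportions at the node
    1 - sum(p^2)
  }
  ## variance of a node (the regression criterion)
  nodeVariance <- function(y) {
    mean((y - mean(y))^2)
  }
  giniImpurity(factor(c("a", "a", "b", "b")))  ## 0.5, maximally impure for 2 classes
  nodeVariance(c(1, 2, 3, 4))                  ## 1.25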

For more details see the package vignette, i.e. vignette("extraTrees"). If Java runs out of memory (java.lang.OutOfMemoryError: Java heap space) and you have free memory available, you can increase the heap size by setting options(java.parameters = "-Xmx2g") before calling library("extraTrees"), where 2g requests 2GB of heap. Change the value as necessary.
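
A minimal sketch of the heap-size fix described above (2g is only an example; set it to suit your machine):

  ## must run BEFORE the package, and thus the JVM, is loaded
  options(java.parameters = "-Xmx2g")  ## request a 2GB Java heap
  library("extraTrees")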

See Also

predict.extraTrees for predicting and prepareForSave for saving ExtraTrees models to disk.
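
A hedged sketch of the save/load workflow via prepareForSave: because the trees live inside the JVM, the model must be prepared before R's save() can serialize it (the exact call pattern is assumed from the function's documented purpose):

  ## assumes `et` is a fitted extraTrees model
  et <- prepareForSave(et)      ## assumed usage: make the Java-side trees serializable
  save(et, file = "et.RData")   ## standard R serialization
  ## in a fresh session:
  ## library("extraTrees"); load("et.RData"); predict(et, newdata)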

Examples

  ## Regression with ExtraTrees:
  n <- 1000  ## number of samples
  p <- 5     ## number of dimensions
  x <- matrix(runif(n*p), n, p)
  y <- (x[,1]>0.5) + 0.8*(x[,2]>0.6) + 0.5*(x[,3]>0.4) +
       0.1*runif(nrow(x))
  et <- extraTrees(x, y, nodesize=3, mtry=p, numRandomCuts=2)
  yhat <- predict(et, x)
  
  #######################################
  ## Multi-task regression with ExtraTrees:
  n <- 1000  ## number of samples
  p <- 5     ## number of dimensions
  x <- matrix(runif(n*p), n, p)
  task <- sample(1:10, size=n, replace=TRUE)
  ## y depends on the task: 
  y <- 0.5*(x[,1]>0.5) + 0.6*(x[,2]>0.6) + 0.8*(x[cbind(1:n,(task %% 2) + 3)]>0.4)
  et <- extraTrees(x, y, nodesize=3, mtry=p-1, numRandomCuts=2, tasks=task)
  yhat <- predict(et, x, newtasks=task)
  
  #######################################
  ## Classification with ExtraTrees (with test data)
  make.data <- function(n) {
    p <- 4
    f <- function(x) (x[,1]>0.5) + (x[,2]>0.6) + (x[,3]>0.4)
    x <- matrix(runif(n*p), n, p)
    y <- as.factor(f(x))
    return(list(x=x, y=y))
  }
  train <- make.data(800)
  test  <- make.data(500)
  et    <- extraTrees(train$x, train$y)
  yhat  <- predict(et, test$x)
  ## accuracy
  mean(test$y == yhat)
  ## class probabilities
  yprob <- predict(et, test$x, probability=TRUE)
  head(yprob)
  
  #######################################
  ## Quantile regression with ExtraTrees (with test data)
  make.qdata <- function(n) {
    p <- 4
    f <- function(x) (x[,1]>0.5) + 0.8*(x[,2]>0.6) + 0.5*(x[,3]>0.4)
    x <- matrix(runif(n*p), n, p)
    y <- as.numeric(f(x))
    return(list(x=x, y=y))
  }
  train <- make.qdata(400)
  test  <- make.qdata(200)
  
  ## learning extra trees:
  et <- extraTrees(train$x, train$y, quantile=TRUE)
  ## estimate median (0.5 quantile)
  yhat0.5 <- predict(et, test$x, quantile = 0.5)
  ## estimate 0.8 quantile (80%)
  yhat0.8 <- predict(et, test$x, quantile = 0.8)


  #######################################
  ## Weighted regression with ExtraTrees 
  make.wdata <- function(n) {
    p <- 4
    f <- function(x) (x[,1]>0.5) + 0.8*(x[,2]>0.6) + 0.5*(x[,3]>0.4)
    x <- matrix(runif(n*p), n, p)
    y <- as.numeric(f(x))
    return(list(x=x, y=y))
  }
  train <- make.wdata(400)
  test  <- make.wdata(200)
  
  ## first half of the samples have weight 1, rest 0.3
  weights <- rep(c(1, 0.3), each = nrow(train$x) / 2)
  et <- extraTrees(train$x, train$y, weights = weights, numRandomCuts = 2)
  ## estimates of the weighted model
  yhat <- predict(et, test$x)
