randomForestSRC (version 3.4.2)

tune.rfsrc: Tune Random Forest for optimal mtry and nodesize

Description

Finds the optimal mtry and nodesize for a random forest using out-of-bag (OOB) error. Two search strategies are supported: a grid-based search and a golden-section search with noise control. Works for all response families supported by rfsrc.fast.

Usage

# S3 method for rfsrc
tune(formula, data,
  mtry.start = ncol(data) / 2,
  nodesize.try = c(1:9, seq(10, 100, by = 5)), ntree.try = 100,
  sampsize = function(x) { min(x * .632, max(150, x^(3/4))) },
  nsplit = 1, step.factor = 1.25, improve = 1e-3, strikeout = 3, max.iter = 25,
  method = c("grid", "golden"),
  final.window = 5, reps.initial = 2, reps.final = 3,
  trace = FALSE, do.best = TRUE, seed = NULL, ...)

# S3 method for rfsrc
tune.nodesize(formula, data,
  nodesize.try = c(1:9, seq(10, 150, by = 5)), ntree.try = 100,
  sampsize = function(x) { min(x * .632, max(150, x^(4/5))) },
  nsplit = 1, method = c("grid", "golden"),
  final.window = 5, reps.initial = 2, reps.final = 3,
  max.iter = 50, trace = TRUE, seed = NULL, ...)

Value

For tune:

  • results: matrix with columns nodesize, mtry, err.

  • optimal: named numeric vector c(nodesize = ..., mtry = ...).

  • rf: fitted forest at the optimum if do.best = TRUE.

For tune.nodesize:

  • nsize.opt: optimal nodesize.

  • err: data frame with columns nodesize and err.
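
As a quick illustration (assuming o is the object returned by tune and on the object returned by tune.nodesize), the components are ordinary list elements:

## illustrative accessors; o from tune(), on from tune.nodesize()
o$optimal["nodesize"]    ## tuned nodesize
o$optimal["mtry"]        ## tuned mtry
head(o$results)          ## evaluated (nodesize, mtry, err) triples
o$rf                     ## refit forest when do.best = TRUE
on$nsize.opt             ## tuned nodesize
head(on$err)             ## (nodesize, err) profile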

Arguments

formula

A model formula.

data

A data frame with response and predictors.

mtry.start

Initial mtry for tune.

nodesize.try

Candidate nodesize values. Only values less than or equal to floor(sampsize(n)/2) are used.

ntree.try

Number of trees grown at each tuning evaluation.

sampsize

Function or numeric value giving the per-tree subsample size. During tuning, a single numeric size ssize is computed and passed to rfsrc.fast. If a vector is supplied (e.g., class-specific sizes), its total is used for ssize.

nsplit

Number of random split points to consider at each node.

step.factor

Multiplicative step-out factor over mtry for grid search in tune.

improve

Minimum relative improvement required to continue a search step in tune.

strikeout

Maximum number of consecutive non-improving steps allowed in tune.

max.iter

Maximum number of iterations for the step-out search in tune or the coordinate loop when method = "golden".

method

Search strategy: "grid" (default) or "golden".

final.window

For golden search, the terminal bracket width for the one-dimensional line search.

reps.initial

Replicates averaged at interior evaluations during golden iterations.

reps.final

Replicates averaged for each candidate during the final local sweep in golden search.

trace

If TRUE, prints progress.

do.best

If TRUE, tune fits and returns a forest at the optimal pair.

seed

Optional integer for reproducible tuning. The holdout split (when used) and all tuning fits become deterministic for a given seed.

...

Additional arguments passed to rfsrc.fast. Arguments that control tuning itself (perf.type, forest, save.memory, ntree, mtry, nodesize, sampsize, nsplit) are managed internally.
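
For example, an rfsrc argument such as na.action can be forwarded through the dots (a sketch using the wine data from the Examples; the reserved arguments listed above are set internally and should not be supplied here):

o <- tune(quality ~ ., wine, na.action = "na.impute")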

Author

Hemant Ishwaran and Udaya B. Kogalur

Details

Error estimate. If 2 * ssize < n, a disjoint holdout of size ssize is used for evaluation; otherwise OOB error is used.

Subsample used during tuning. Both functions derive a single integer ssize from sampsize and pass it to rfsrc.fast for all tuning fits. This improves stability and comparability across candidates. When do.best = TRUE in tune, the final forest is fit with the user-supplied sampsize exactly as provided.
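
The following sketch illustrates (it is not the package internals) how a single ssize and the evaluation rule above combine for a hypothetical sample size n:

## illustrative only: derive ssize and the evaluation rule for n observations
n <- 3000
sampsize <- function(x) { min(x * .632, max(150, x^(3/4))) }
ssize <- if (is.function(sampsize)) round(sampsize(n)) else round(sum(sampsize))
use.holdout <- (2 * ssize) < n  ## TRUE: disjoint holdout of size ssize; FALSE: OOB error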

Grid search. tune performs a step-out search over mtry for each nodesize in nodesize.try, using step.factor, improve, strikeout, and max.iter. tune.nodesize evaluates the supplied nodesize.try grid directly.
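
The sketch below conveys the flavor of one step-out pass over mtry for a fixed nodesize; eval.err and step.out are hypothetical names and the code is illustrative only, not the package implementation:

## illustrative step-out over mtry (one direction shown); eval.err(nodesize, mtry)
## is a hypothetical function returning the tuning error
step.out <- function(eval.err, nodesize, mtry.start, p,
                     step.factor = 1.25, improve = 1e-3,
                     strikeout = 3, max.iter = 25) {
  mtry <- max(1, min(p, round(mtry.start)))
  best <- eval.err(nodesize, mtry)
  strikes <- 0
  for (i in seq_len(max.iter)) {
    mtry.new <- max(1, min(p, round(mtry * step.factor)))
    if (mtry.new == mtry) break
    err.new <- eval.err(nodesize, mtry.new)
    if ((best - err.new) / max(best, 1e-10) > improve) {
      best <- err.new; mtry <- mtry.new; strikes <- 0   ## accept and keep stepping
    } else {
      strikes <- strikes + 1                            ## non-improving step
      if (strikes >= strikeout) break
    }
  }
  c(mtry = mtry, err = best)
}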

Golden search. Uses a guarded golden-section line search with noise control. For each one-dimensional search (over nodesize or mtry), the routine probes a small left-anchor grid 1:9, iterates golden shrinkage until the bracket width is at most final.window, then runs a short local sweep with reps.final replicates. In tune the searches over nodesize and mtry alternate in a simple coordinate loop, with improve and strikeout as stopping controls.
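
A minimal sketch of the one-dimensional search (golden.search and f are hypothetical names; the code is illustrative, not the package implementation):

## illustrative golden-section line search over an integer parameter with
## replicate averaging to control noise; f(x) is a hypothetical error function
golden.search <- function(f, lower, upper, final.window = 5,
                          reps.initial = 2, reps.final = 3) {
  phi <- (sqrt(5) - 1) / 2
  avg <- function(x, reps) mean(replicate(reps, f(x)))
  a <- lower; b <- upper
  x1 <- round(b - phi * (b - a)); x2 <- round(a + phi * (b - a))
  f1 <- avg(x1, reps.initial); f2 <- avg(x2, reps.initial)
  while ((b - a) > final.window) {
    if (f1 <= f2) {          ## minimum bracketed in [a, x2]
      b <- x2; x2 <- x1; f2 <- f1
      x1 <- round(b - phi * (b - a)); f1 <- avg(x1, reps.initial)
    } else {                 ## minimum bracketed in [x1, b]
      a <- x1; x1 <- x2; f1 <- f2
      x2 <- round(a + phi * (b - a)); f2 <- avg(x2, reps.initial)
    }
  }
  cand <- a:b                ## final local sweep over the remaining bracket
  cand[which.min(sapply(cand, avg, reps = reps.final))]
}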

See Also

rfsrc.fast

Examples

# \donttest{
## ------------------------------------------------------------
## White wine classification example
## ------------------------------------------------------------
data(wine, package = "randomForestSRC")
wine$quality <- factor(wine$quality)

## Fixed seed makes tuning reproducible
set.seed(1)

## Full tuner over nodesize and mtry (grid)
o1 <- tune(quality ~ ., wine, sampsize = 100, method = "grid")
print(o1$optimal)

## Golden search alternative
o2 <- tune(quality ~ ., wine, sampsize = 100, method = "golden",
           reps.initial = 2, reps.final = 3, seed = 1)
print(o2$optimal)

## visualize the nodesize/mtry surface
if (library("interp", logical.return = TRUE)) {

  plot.tune <- function(o, linear = TRUE) {
    x <- o$results[, 1]
    y <- o$results[, 2]
    z <- o$results[, 3]
    so <- interp(x = x, y = y, z = z, linear = linear)
    idx <- which.min(z)
    x0 <- x[idx]; y0 <- y[idx]
    filled.contour(x = so$x, y = so$y, z = so$z,
                   xlim = range(so$x, finite = TRUE) + c(-2, 2),
                   ylim = range(so$y, finite = TRUE) + c(-2, 2),
                   color.palette = colorRampPalette(c("yellow", "red")),
                   xlab = "nodesize", ylab = "mtry",
                   main = "error rate for nodesize and mtry",
                   key.title = title(main = "OOB error", cex.main = 1),
                   plot.axes = {
                     axis(1); axis(2)
                     points(x0, y0, pch = "x", cex = 1, font = 2)
                     points(x, y, pch = 16, cex = .25)
                   })
  }

  plot.tune(o1)
  plot.tune(o2)
}

## ------------------------------------------------------------
## nodesize only: grid vs golden
## ------------------------------------------------------------
o3 <- tune.nodesize(quality ~ ., wine, sampsize = 100, method = "grid",
                    trace = TRUE, seed = 1)
o4 <- tune.nodesize(quality ~ ., wine, sampsize = 100, method = "golden",
                    reps.initial = 2, reps.final = 3, trace = TRUE, seed = 1)
plot(o3$err, type = "s", xlab = "nodesize", ylab = "error")

## ------------------------------------------------------------
## Tuning for class imbalance (rfq with geometric mean performance)
## ------------------------------------------------------------
data(breast, package = "randomForestSRC")
breast <- na.omit(breast)
o5 <- tune(status ~ ., data = breast, rfq = TRUE, perf.type = "gmean",
           method = "golden", seed = 1)
print(o5$optimal)

## ------------------------------------------------------------
## Competing risks example (nodesize only)
## ------------------------------------------------------------
data(wihs, package = "randomForestSRC")
plot(tune.nodesize(Surv(time, status) ~ ., wihs, trace = TRUE)$err, type = "s")
# }