randomForestSRC (version 3.4.1)

max.subtree.rfsrc: Acquire Maximal Subtree Information

Description

Extract maximal subtree information from an RF-SRC object. Used for variable selection and for identifying interactions between variables.

Usage

# S3 method for rfsrc
max.subtree(object,
  max.order = 2, sub.order = FALSE, conservative = FALSE, ...)

Value

Invisibly returns a list with the following components:

order

Matrix of order depths for each variable up to max.order, averaged over trees. The matrix has p rows and max.order columns, where p is the number of variables. If max.order = 0, returns a matrix of dimension p x ntree containing first-order depths for each variable by tree.

count

Average number of maximal subtrees per variable, normalized by tree size.

nodes.at.depth

List of vectors recording the number of non-terminal nodes at each depth level for each tree.

sub.order

Matrix of average minimal depths of each variable relative to others (i.e., conditional minimal depth matrix). NULL if sub.order = FALSE.

threshold

Threshold value for selecting strong variables based on the mean of the minimal depth distribution.

threshold.1se

Conservative threshold equal to the mean minimal depth plus one standard error.

topvars

Character vector of selected variable names using the threshold criterion.

topvars.1se

Character vector of selected variable names using the threshold.1se criterion.

percentile

Percentile value of minimal depth for each variable.

density

Estimated density of the minimal depth distribution.

second.order.threshold

Threshold used for selecting strong second-order depth variables.
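As a rough illustration of how the returned components fit together, the sketch below grows a small regression forest on mtcars and inspects a few of the list elements described above. The object names (obj, ms) and the choice of ntree = 100 are arbitrary; because the result is returned invisibly, it has to be assigned before it can be inspected.

```r
library(randomForestSRC)

# Small forest for illustration only; ntree = 100 is an arbitrary choice.
obj <- rfsrc(mpg ~ ., data = mtcars, ntree = 100)

# The return value is invisible, so assign it to inspect it.
ms <- max.subtree(obj)

# p rows (one per variable) and max.order columns (default 2).
dim(ms$order)

# The two thresholds side by side.
c(threshold = ms$threshold, threshold.1se = ms$threshold.1se)

# Variables selected only under the threshold.1se criterion, if any.
setdiff(ms$topvars.1se, ms$topvars)

# sub.order is NULL unless sub.order = TRUE was requested.
is.null(ms$sub.order)
```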

Arguments

object

An object of class (rfsrc, grow) or (rfsrc, forest).

max.order

Non-negative integer specifying the maximum interaction order for which minimal depth is calculated. Defaults to 2. Set max.order = 0 to return first-order depths for each variable in each tree rather than tree-averaged depths. When max.order = 0, conservative is automatically set to FALSE.

sub.order

Logical. If TRUE, returns the minimal depth of each variable conditional on every other variable. Useful for investigating variable interdependence. See Details.

conservative

Logical. If TRUE, uses a conservative threshold for selecting variables based on the marginal minimal depth distribution (Ishwaran et al., 2010). If FALSE, uses the tree-averaged distribution, which is less conservative and typically identifies more variables in high-dimensional settings.

...

Additional arguments passed to or from other methods.
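The effect of max.order = 0 may be easier to see in code. The sketch below, which assumes a small mtcars forest with an arbitrary ntree = 100, requests the per-tree first-order depths and summarizes their spread across trees; the object names are illustrative.

```r
library(randomForestSRC)

# Small forest for illustration only.
obj <- rfsrc(mpg ~ ., data = mtcars, ntree = 100)

# With max.order = 0, order holds one first-order depth per variable per tree.
by.tree <- max.subtree(obj, max.order = 0)$order

# Expected shape per the Value section: p x ntree.
dim(by.tree)

# Quartiles of first-order depth across trees, one row per variable.
t(apply(by.tree, 1, quantile, probs = c(0.25, 0.5, 0.75), na.rm = TRUE))
```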

Author

Hemant Ishwaran and Udaya B. Kogalur

Details

The maximal subtree for a variable x is the largest subtree in which the root node splits on x. The largest possible maximal subtree is the full tree (root node), though multiple maximal subtrees may exist for a variable. A variable may also have no maximal subtree if it is never used for splitting. See Ishwaran et al. (2010, 2011) for further discussion.
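The definition can be traced on a hand-built tree. The sketch below is base R only and does not use the package's internal tree representation: the data frame tree and the helpers ancestors and maximal_roots are hypothetical names introduced for illustration. A node that splits on a variable roots a maximal subtree exactly when none of its ancestors splits on that same variable.

```r
# Hypothetical toy tree: one row per splitting node, root is node 1.
tree <- data.frame(
  node   = 1:7,
  parent = c(NA, 1, 1, 2, 2, 3, 3),
  var    = c("a", "x", "b", "x", "c", "x", "a"),
  stringsAsFactors = FALSE
)

# Walk up from a node and collect the ids of all its ancestors.
ancestors <- function(tree, id) {
  out <- integer(0)
  p <- tree$parent[tree$node == id]
  while (!is.na(p)) {
    out <- c(out, p)
    p <- tree$parent[tree$node == p]
  }
  out
}

# Nodes that split on v and have no ancestor splitting on v
# are the roots of the maximal subtrees for v.
maximal_roots <- function(tree, v) {
  cand <- tree$node[tree$var == v]
  keep <- vapply(cand, function(id) {
    anc <- ancestors(tree, id)
    !any(tree$var[tree$node %in% anc] == v)
  }, logical(1))
  cand[keep]
}

maximal_roots(tree, "x")  # nodes 2 and 6; node 4 lies inside node 2's subtree
maximal_roots(tree, "a")  # node 1: the full tree
maximal_roots(tree, "z")  # empty: z is never used for splitting
```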

The minimal depth of a maximal subtree (called the first-order depth) quantifies the predictive strength of a variable. It is defined as the distance from the root node to the parent of the closest maximal subtree for x. Smaller values indicate stronger predictive impact. A variable is flagged as strong if its minimal depth is below the mean of the minimal depth distribution.

The second-order depth is the distance from the root to the second-closest maximal subtree of x. To request depths beyond first order, use the max.order option (e.g., max.order = 2 returns both first and second-order depths). Set max.order = 0 to retrieve first-order depths for each variable in each tree.
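To make first- and second-order depth concrete, the sketch below describes a single hypothetical tree only by the depths of its maximal subtree roots, with the root node at depth 0. The data frame subtrees, the helper nth_depth, and the use of NA for a variable with fewer than two maximal subtrees are illustrative conventions, not the package's internals.

```r
# Hypothetical toy data: one row per maximal subtree, depth of its root (root node = 0).
subtrees <- data.frame(
  var   = c("a", "x", "x", "b"),
  depth = c(0, 1, 2, 1),
  stringsAsFactors = FALSE
)

# n-th smallest depth, or NA when the variable has fewer than n maximal subtrees.
nth_depth <- function(d, n) if (length(d) >= n) sort(d)[n] else NA_real_

# First-order (minimal) depth: the closest maximal subtree per variable.
first_order <- tapply(subtrees$depth, subtrees$var, nth_depth, n = 1)

# Second-order depth: the second-closest maximal subtree per variable.
second_order <- tapply(subtrees$depth, subtrees$var, nth_depth, n = 2)

cbind(first_order, second_order)
```

In a forest, max.subtree reports such depths averaged over trees (or per tree when max.order = 0), so this single-tree view only illustrates the definition.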

Set sub.order = TRUE to obtain the relative minimal depth of each variable j within the maximal subtree of another variable i. This returns a p x p matrix (with p the number of variables) whose entry (i,j) is the normalized relative depth of j in i's subtree. Entry (i,i) gives the depth of i relative to the root. Read the matrix across rows to assess inter-variable relationships: small (i,j) entries suggest interactions between variables i and j. See find.interaction for further details.
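One possible way to read the resulting matrix is sketched below, assuming a small mtcars forest with an arbitrary ntree = 100. The diagonal is masked because entry (i,i) has a different meaning from the off-diagonal entries, and the smallest remaining entry is then located by position. The object names are illustrative, and a small entry is only a hint worth following up, not evidence of an interaction on its own.

```r
library(randomForestSRC)

# Small forest for illustration only.
obj <- rfsrc(mpg ~ ., data = mtcars, ntree = 100)

# p x p matrix of relative minimal depths.
so <- max.subtree(obj, sub.order = TRUE)$sub.order
dim(so)

# Mask the diagonal, which holds depth relative to the root.
off <- so
diag(off) <- NA

# Row i and column j of the smallest off-diagonal entry:
# variable j sits closest to the root of variable i's maximal subtree.
which(off == min(off, na.rm = TRUE), arr.ind = TRUE)
```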

For competing risks, all analyses are unconditional (non-event specific).

References

Ishwaran H., Kogalur U.B., Gorodeski E.Z., Minn A.J. and Lauer M.S. (2010). High-dimensional variable selection for survival data. J. Amer. Statist. Assoc., 105:205-217.

Ishwaran H., Kogalur U.B., Chen X. and Minn A.J. (2011). Random survival forests for high-dimensional data. Statist. Anal. Data Mining, 4:115-132.

See Also

holdout.vimp.rfsrc, vimp.rfsrc

Examples

## ------------------------------------------------------------
## survival analysis
## first and second order depths for all variables
## ------------------------------------------------------------

data(veteran, package = "randomForestSRC")
v.obj <- rfsrc(Surv(time, status) ~ . , data = veteran)
v.max <- max.subtree(v.obj)

# first and second order depths
print(round(v.max$order, 3))

# the minimal depth is the first order depth
print(round(v.max$order[, 1], 3))

# strong variables have minimal depth less than or equal
# to the following threshold
print(v.max$threshold)

# this corresponds to the set of variables
print(v.max$topvars)

## ------------------------------------------------------------
## regression analysis
## try different levels of conservativeness
## ------------------------------------------------------------

mtcars.obj <- rfsrc(mpg ~ ., data = mtcars)
max.subtree(mtcars.obj)$topvars
max.subtree(mtcars.obj, conservative = TRUE)$topvars