- formula
Object of class formula or character describing the model to fit. Interaction terms supported only for numerical variables.
- data
Training data of class data.frame, matrix, dgCMatrix (Matrix) or gwaa.data (GenABEL).
- num.trees
Number of trees. Default is 500.
- mtry
Artefact from 'ranger'. NOT needed for diversity forests.
- importance
Variable importance mode, one of 'none', 'impurity', 'impurity_corrected', 'permutation'. The 'impurity' measure is the Gini index for classification, the variance of the responses for regression and the sum of test statistics (see splitrule) for survival. NOTE: Currently, only "permutation" (and "none") work for diversity forests.
- write.forest
Save divfor.forest object, required for prediction. Set to FALSE to reduce memory usage if no prediction is intended.
- probability
Grow a probability forest as in Malley et al. (2012). NOTE: Not yet implemented for diversity forests!
- min.node.size
Minimal node size. Default 1 for classification, 5 for regression, 3 for survival, and 5 for probability.
- max.depth
Maximal tree depth. A value of NULL or 0 (the default) corresponds to unlimited depth, 1 to tree stumps (1 split per tree).
- replace
Sample with replacement.
- sample.fraction
Fraction of observations to sample. Default is 1 for sampling with replacement and 0.632 for sampling without replacement. For classification, this can be a vector of class-specific values.
- case.weights
Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees.
- class.weights
Weights for the outcome classes (in order of the factor levels) in the splitting rule (cost sensitive learning). Classification and probability prediction only. For classification the weights are also applied in the majority vote in terminal nodes.
- splitrule
Splitting rule. For classification and probability estimation "gini" or "extratrees" with default "gini". For regression "variance", "extratrees" or "maxstat" with default "variance". For survival "logrank", "extratrees", "C" or "maxstat" with default "logrank". NOTE: For diversity forests currently only the default splitting rules are supported.
- num.random.splits
Artefact from 'ranger'. NOT needed for diversity forests.
- alpha
For "maxstat" splitrule: Significance threshold to allow splitting. NOT needed for diversity forests.
- minprop
For "maxstat" splitrule: Lower quantile of covariate distribution to be considered for splitting. NOT needed for diversity forests.
- split.select.weights
Numeric vector with weights between 0 and 1, representing the probability to select variables for splitting. Alternatively, a list of size num.trees, containing split select weight vectors for each tree can be used.
- always.split.variables
Currently not useable. Character vector with variable names to be always selected.
- respect.unordered.factors
Handling of unordered factor covariates. One of 'ignore' and 'order'; the option 'partition' available in 'ranger' is not (yet) available for diversity forests. Default is 'ignore'. Alternatively, TRUE (='order') or FALSE (='ignore') can be used.
- scale.permutation.importance
Scale permutation importance by standard error as in Breiman (2001). Only applicable if the permutation variable importance mode is selected.
- keep.inbag
Save how often observations are in-bag in each tree.
- inbag
Manually set observations per tree. List of size num.trees, containing inbag counts for each observation. Can be used for stratified sampling.
- holdout
Hold-out mode. Hold out all samples with case weight 0 and use these for variable importance and prediction error.
- quantreg
Prepare quantile prediction as in quantile regression forests (Meinshausen 2006). Regression only. Set keep.inbag = TRUE to prepare out-of-bag quantile prediction.
- oob.error
Compute OOB prediction error. Set to FALSE to save computation time, e.g. for large survival forests.
- num.threads
Number of threads. Default is number of CPUs available.
- save.memory
Use memory saving (but slower) splitting mode. No effect for survival and GWAS data. Warning: This option slows down the tree growing; use it only if you encounter memory problems. NOT needed for diversity forests.
- verbose
Show computation status and estimated runtime.
- seed
Random seed. Default is NULL, which generates the seed from R. Set to 0 to ignore the R seed.
- dependent.variable.name
Name of outcome variable, needed if no formula given. For survival forests this is the time variable.
- status.variable.name
Name of status variable, only applicable to survival data and needed if no formula given. Use 1 for event and 0 for censoring.
- classification
Only needed if data is a matrix. Set to TRUE to grow a classification forest.
- nsplits
Number of candidate splits to sample for each split. Default is 30.
- proptry
Parameter that restricts the number of candidate splits considered for small nodes. If nsplits is larger than proptry times the number of all possible splits, the number of candidate splits to draw is reduced to the largest integer smaller than proptry times the number of all possible splits. Default is 1, which corresponds to always using nsplits candidate splits.
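
The interplay of nsplits and proptry described above can be illustrated with a little arithmetic. The following base R sketch is not the package's implementation: the helper n_candidate_splits and its argument n_possible (the number of all possible splits in a node) are hypothetical names introduced here, and "largest integer smaller than" is read strictly, which is an assumption about the wording.

```r
# Hypothetical helper, NOT part of the package: a literal reading of the
# proptry rule from the argument description.
n_candidate_splits <- function(nsplits, proptry, n_possible) {
  limit <- proptry * n_possible
  if (nsplits > limit) {
    # Assumption: "largest integer smaller than limit" is taken strictly,
    # so an integer-valued limit such as 12 would give 11.
    return(ceiling(limit) - 1)
  }
  nsplits
}

# Large node: 30 does not exceed 0.25 * 1000 = 250, so nsplits is kept.
large_node <- n_candidate_splits(nsplits = 30, proptry = 0.25, n_possible = 1000)

# Small node: 30 exceeds 0.25 * 50 = 12.5, so the count is reduced.
small_node <- n_candidate_splits(nsplits = 30, proptry = 0.25, n_possible = 50)

print(c(large_node = large_node, small_node = small_node))
```

Under these assumptions the large node keeps all 30 candidate splits, while the small node draws only 12. Values of proptry below 1 therefore only matter in nodes with few possible splits.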