ranger(formula = NULL, data = NULL, num.trees = 500, mtry = NULL,
importance = "none", write.forest = FALSE, probability = FALSE,
min.node.size = NULL, replace = TRUE, sample.fraction = ifelse(replace,
1, 0.632), splitrule = NULL, case.weights = NULL,
split.select.weights = NULL, always.split.variables = NULL,
respect.unordered.factors = FALSE, scale.permutation.importance = FALSE,
keep.inbag = FALSE, num.threads = NULL, save.memory = FALSE,
verbose = TRUE, seed = NULL, dependent.variable.name = NULL,
status.variable.name = NULL, classification = NULL)

Arguments:
formula: Object of class formula or character describing the model to fit.
data: Training data of class data.frame, matrix or gwaa.data (GenABEL).
write.forest: Save ranger.forest object, needed for prediction.
respect.unordered.factors: If FALSE, all factors are regarded ordered.
seed: Default is NULL, which generates the seed from R.
classification: Set to TRUE to grow a classification forest.

Value:
Object of class ranger with elements
forest: Note that the variable IDs in the split.varIDs object do not necessarily represent the column number in R.
predictions
variable.importance
prediction.error
r.squared
confusion.matrix
unique.death.times
chf
survival
call
num.trees
num.independent.variables
mtry
min.node.size
treetype
importance.mode
num.samples
inbag.counts

Details:
With the probability option and a factor dependent variable, a probability forest is grown.
Here, the estimated response variances are used for splitting, as in regression forests.
Predictions are class probabilities for each sample.
For details see Malley et al. (2012).
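As a minimal sketch of the probability option described above, the following grows a probability forest on the built-in iris data. It assumes that, as for other forest types, the fitted object's predictions element holds one row per sample, here with one column of estimated probability per class; the object name rg.prob is chosen only for illustration.

```r
library(ranger)
set.seed(42)

# Species is a factor, so probability = TRUE grows a probability forest
rg.prob <- ranger(Species ~ ., data = iris, probability = TRUE)

# One row per sample, one column per class (assumed layout of $predictions)
head(rg.prob$predictions)
dim(rg.prob$predictions)
```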
Note that for classification and regression, nodes smaller than min.node.size can occur, as in the original Random Forest.
For survival, all nodes contain at least min.node.size samples.
Variables selected with always.split.variables are tried in addition to the mtry randomly selected variables.
In split.select.weights, variables weighted with 0 are never selected and variables weighted with 1 are always selected.
Weights do not need to sum to 1; they will be normalized later.
Using split.select.weights can increase computation time for large forests.
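The two options above might be used as in the following sketch on the iris data. The weight values are arbitrary illustrations, and the sketch assumes that split.select.weights takes one weight per independent variable in column order and that always.split.variables takes a character vector of variable names; check both against the installed version of the package.

```r
library(ranger)
set.seed(1)

# One weight per independent variable, in column order:
# Sepal.Length, Sepal.Width, Petal.Length, Petal.Width.
# The weight 0 keeps Sepal.Width out of every split; the rest need not sum to 1.
w <- c(0.5, 0, 0.8, 0.8)
rg.w <- ranger(Species ~ ., data = iris, importance = "impurity",
               split.select.weights = w)

# A variable that is never split on accumulates no impurity importance
rg.w$variable.importance

# Try Petal.Width at every split, in addition to the mtry random candidates
rg.a <- ranger(Species ~ ., data = iris,
               always.split.variables = "Petal.Width")
```

Weights of exactly 1 are avoided here because their meaning ("always selected") is specific to the behaviour documented above.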
With a large number of variables and a data frame as input, the formula interface can be slow or impossible to use.
Alternatively, dependent.variable.name (and status.variable.name for survival) can be used.
Consider setting save.memory = TRUE if you encounter memory problems for very large datasets.
For GWAS data, consider combining ranger with the GenABEL package.
See the Examples section below for a demonstration using Plink data.
All SNPs in the GenABEL object will be used for splitting.
To use only the SNPs, without sex or other covariates from the phenotype file, use 0 on the right-hand side of the formula.
Note that missing values are treated as an extra category while splitting.
See
Notes:
References:
Breiman, L. (2001). Random forests. Mach Learn, 45(1), 5-32.
Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forests. Ann Appl Stat, 841-860.
Malley, J. D., Kruppa, J., Dasgupta, A., Malley, K. G., & Ziegler, A. (2012). Probability machines: consistent probability estimation using nonparametric learning machines. Methods Inf Med, 51(1), 74.
See also: predict.ranger

Examples:
require(ranger)
## Classification forest with default settings
ranger(Species ~ ., data = iris)
## Prediction
train.idx <- sample(nrow(iris), 2/3 * nrow(iris))
iris.train <- iris[train.idx, ]
iris.test <- iris[-train.idx, ]
rg.iris <- ranger(Species ~ ., data = iris.train, write.forest = TRUE)
pred.iris <- predict(rg.iris, data = iris.test)
table(iris.test$Species, pred.iris$predictions)
## Variable importance
rg.iris <- ranger(Species ~ ., data = iris, importance = "impurity")
rg.iris$variable.importance
## Survival forest
require(survival)
rg.veteran <- ranger(Surv(time, status) ~ ., data = veteran)
plot(rg.veteran$unique.death.times, rg.veteran$survival[1,])
## Alternative interface
ranger(dependent.variable.name = "Species", data = iris)
## Use GenABEL interface to read Plink data into R and grow a classification forest
## The ped and map files are not included
library(GenABEL)
convert.snp.ped("data.ped", "data.map", "data.raw")
dat.gwaa <- load.gwaa.data("data.pheno", "data.raw")
phdata(dat.gwaa)$trait <- factor(phdata(dat.gwaa)$trait)
ranger(trait ~ ., data = dat.gwaa)