ml_random_forest
From sparklyr v0.2.29
by Javier Luraschi
Spark ML -- Random Forests
Perform regression or classification using random forests with a spark_tbl.
Usage
ml_random_forest(x, response, features, max.bins = 32L, max.depth = 5L, num.trees = 20L, type = c("auto", "regression", "classification"), ...)
Arguments
- x
- An object coercible to a Spark DataFrame (typically, a tbl_spark).
- response
- The name of the response vector (as a length-one character vector), or a formula giving a symbolic description of the model to be fitted. When response is a formula, it is used in preference to other parameters to set the response, features, and intercept parameters (if available). Currently, only simple linear combinations of existing parameters (e.g. response ~ feature1 + feature2 + ...) are supported. The intercept term can be omitted by using - 1 in the model fit.
- features
- The names of the features (terms) to use for the model fit.
- max.bins
- The maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity.
- max.depth
- Maximum depth of the tree (>= 0); that is, the maximum number of nodes separating any leaves from the root of the tree.
- num.trees
- Number of trees to train (>= 1).
- type
- The type of model to fit. "regression" treats the response as a continuous variable, while "classification" treats the response as a categorical variable. When "auto" is used, the model type is inferred based on the response variable type: if it is a numeric type, then regression is used; classification otherwise.
- ...
- Optional arguments; currently unused.
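The formula handling and the "auto" rule described above can be illustrated locally, without Spark. The following is a minimal base-R sketch, not sparklyr's own implementation: split_formula and infer_model_type are hypothetical helpers, and the real function inspects the Spark column type rather than a local R vector.

```r
# Hypothetical helper (not part of sparklyr): split a simple additive
# formula into a response name, feature names, and an intercept flag.
split_formula <- function(formula) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3L)
  tt <- terms(formula)
  list(
    response  = all.vars(formula[[2L]]),      # left-hand side
    features  = attr(tt, "term.labels"),      # right-hand-side terms
    intercept = attr(tt, "intercept") == 1L   # FALSE when "- 1" is used
  )
}

# Hypothetical helper: mimic the documented "auto" rule on a local vector.
# Numeric responses map to regression, everything else to classification.
infer_model_type <- function(response_values) {
  if (is.numeric(response_values)) "regression" else "classification"
}

parsed <- split_formula(Species ~ Petal_Length + Petal_Width - 1)
```

Here parsed holds the response name, the two feature names, and an intercept flag that is FALSE because of the - 1 term.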
See Also
Other Spark ML routines: ml_decision_tree, ml_generalized_linear_regression, ml_gradient_boosted_trees, ml_kmeans, ml_lda, ml_linear_regression, ml_logistic_regression, ml_multilayer_perceptron, ml_naive_bayes, ml_one_vs_rest, ml_pca, ml_survival_regression
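Examples
A hedged sketch of a call that matches the signature shown under Usage. It assumes a working Spark installation, that dplyr is installed, and that copy_to() replaces the dots in the iris column names with underscores. fit_iris_forest is a hypothetical wrapper name, and later sparklyr releases may use different argument names.

```r
# Sketch only: the call is wrapped in a function so that nothing touches
# Spark until the caller supplies an open connection `sc`.
fit_iris_forest <- function(sc) {
  # Assumption: after copying, Petal.Length becomes Petal_Length, etc.
  iris_tbl <- dplyr::copy_to(sc, iris, "iris", overwrite = TRUE)

  sparklyr::ml_random_forest(
    iris_tbl,
    response  = "Species",
    features  = c("Petal_Length", "Petal_Width"),
    max.bins  = 32L,   # documented default
    max.depth = 5L,    # documented default
    num.trees = 20L,   # documented default
    type      = "classification"
  )
}

# Usage (requires a local Spark installation):
# sc  <- sparklyr::spark_connect(master = "local")
# fit <- fit_iris_forest(sc)
```

Passing the formula Species ~ Petal_Length + Petal_Width as response, and omitting features, should be equivalent according to the argument description above.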