ml_lda(x, features = dplyr::tbl_vars(x), k = length(features), alpha = (50/k) + 1, beta = 0.1 + 1, ml.options = ml_options(), ...)
tbl_spark).
k
in fitting (the EM optimizer currently supports only symmetric distributions, so all values in the vector should be the same). For the Expectation-Maximization optimizer, values should be > 1.0.
By default alpha = (50 / k) + 1, where 50/k is common in LDA libraries and +1 follows from Asuncion et al. (2009), who recommend a +1 adjustment for EM.
By default beta = 0.1 + 1, where 0.1 gives a small amount of smoothing and +1 follows from Asuncion et al. (2009), who recommend a +1 adjustment for EM.
See ml_options for more details.
The data argument can be used to specify the data to be used when x is a formula; this allows calls of the form ml_linear_regression(y ~ x, data = tbl), and is especially useful in conjunction with do.
Asuncion et al. (2009)
ml_als_factorization, ml_decision_tree, ml_generalized_linear_regression, ml_gradient_boosted_trees, ml_kmeans, ml_linear_regression, ml_logistic_regression, ml_multilayer_perceptron, ml_naive_bayes, ml_one_vs_rest, ml_pca, ml_random_forest, ml_survival_regression
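
The default hyperparameters in the usage line can be worked through by hand. The following is a minimal base R sketch, not part of the package: the helper name lda_default_priors is hypothetical, and it only mirrors the expressions alpha = (50/k) + 1 and beta = 0.1 + 1 shown above. It also builds a symmetric alpha vector, assuming one entry per topic, to illustrate the requirement that all values in the vector be the same for the EM optimizer.

```r
# Hypothetical helper (not a sparklyr function): reproduces the default
# alpha and beta expressions from the usage line for a given topic count k.
lda_default_priors <- function(k) {
  stopifnot(is.numeric(k), length(k) == 1, k >= 1)
  list(
    alpha = (50 / k) + 1,  # 50/k heuristic plus the +1 adjustment for EM
    beta  = 0.1 + 1        # 0.1 smoothing plus the +1 adjustment for EM
  )
}

k <- 10
priors <- lda_default_priors(k)
priors$alpha  # (50 / 10) + 1 = 6
priors$beta   # 0.1 + 1 = 1.1

# A symmetric alpha vector: the same value repeated once per topic
# (one entry per topic is an assumption made for this illustration).
alpha_vec <- rep(priors$alpha, k)

# Both defaults exceed 1.0, as the EM optimizer note requires,
# and the alpha vector contains a single distinct value.
stopifnot(priors$alpha > 1, priors$beta > 1)
stopifnot(length(unique(alpha_vec)) == 1)
```

Because 50/k is always positive, the default alpha stays above 1.0 for any k >= 1, so the defaults remain valid for the EM optimizer regardless of the number of topics.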