Prepare a Spark DataFrame for Spark ML Routines

This routine prepares a Spark DataFrame for use by Spark ML routines.

ml_prepare_dataframe(x, features, response = NULL, ..., ml.options = ml_options(), envir = new.env(parent = emptyenv()))
x: An object coercible to a Spark DataFrame (typically, a tbl_spark).
features: The names of the features (terms) to use for the model fit.
response: The name of the response vector (as a length-one character vector), or a formula giving a symbolic description of the model to be fitted. When response is a formula, it is used in preference to other parameters to set the response, features, and intercept parameters (if available). Currently, only simple linear combinations of existing parameters are supported; e.g. response ~ feature1 + feature2 + .... The intercept term can be omitted by including - 1 in the formula.
...: Optional arguments; currently unused.
ml.options: Optional arguments, used to affect the model generated. See ml_options for more details.
envir: An R environment. When supplied, it will be filled with metadata describing the transformations that have taken place.
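
To illustrate the formula form of response, the following base R sketch shows how a formula such as mpg ~ wt + cyl - 1 breaks down into a response name, feature names, and an intercept flag. It relies only on terms() from the stats package; the formula and column names are hypothetical, and this is not the parsing code sparklyr itself uses.

```r
# Hypothetical formula; the column names are illustrative only.
f <- mpg ~ wt + cyl - 1

tt <- terms(f)

# Left-hand side of the formula: the response column name.
response <- all.vars(f[[2]])

# Right-hand side terms: the feature column names.
features <- attr(tt, "term.labels")

# attr(tt, "intercept") is 0 when the formula contains - 1, otherwise 1.
intercept <- attr(tt, "intercept") == 1
```

Here response holds the single name on the left of ~, features holds the terms on the right, and intercept is FALSE because of the - 1.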

Spark DataFrames are prepared through the following transformations:

  1. All specified columns are transformed into a numeric data type (using a simple cast for integer / logical columns, and ft_string_indexer for strings),
  2. The ft_vector_assembler is used to combine the specified features into a single 'feature' vector, suitable for use with Spark ML routines.
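
The two steps above can be sketched on a local data.frame as a rough analogy. This is base R only and does not call Spark; the data, the helper to_numeric, and the column names are all assumptions made for illustration. The string indexing mimics the default behaviour of Spark's StringIndexer, which assigns index 0 to the most frequent label.

```r
# Hypothetical local data standing in for a Spark DataFrame.
df <- data.frame(
  flag   = c(TRUE, FALSE, TRUE),
  count  = c(1L, 2L, 3L),
  colour = c("red", "blue", "red"),
  stringsAsFactors = FALSE
)

# Step 1: convert every column to numeric.
to_numeric <- function(col) {
  if (is.character(col)) {
    # Index labels by descending frequency, starting at 0.
    labels <- names(sort(table(col), decreasing = TRUE))
    as.numeric(match(col, labels) - 1)
  } else {
    # Integer and logical columns only need a simple cast.
    as.numeric(col)
  }
}
prepared <- as.data.frame(lapply(df, to_numeric))

# Step 2: combine the chosen feature columns into one numeric structure,
# one row per observation (a local stand-in for the assembled vector column).
assembled <- as.matrix(prepared[, c("flag", "count", "colour")])
```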

After calling this function, the envir environment (when supplied) will be populated with a set of variables:

features: The name of the generated features vector.
response: The name of the generated response vector.
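
The envir argument works as an output parameter because R environments are passed by reference. The sketch below shows that pattern with a hypothetical stand-in function, prepare_sketch; the column names it stores are placeholders, and the names sparklyr actually generates may differ.

```r
# Hypothetical stand-in for a preparation routine; not sparklyr code.
prepare_sketch <- function(envir = new.env(parent = emptyenv())) {
  # Placeholder names; the names sparklyr actually generates may differ.
  assign("features", "features_col", envir = envir)
  assign("response", "response_col", envir = envir)
  invisible(NULL)
}

envir <- new.env(parent = emptyenv())
prepare_sketch(envir = envir)

# Environments are passed by reference, so the caller sees the new bindings.
generated <- mget(c("features", "response"), envir = envir)
```

After the call, envir$features and envir$response can be read by the caller, which is how the generated names are recovered in the example further down.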

## Not run: 
# # example of how 'ml_prepare_dataframe' might be used to invoke
# # Spark's LinearRegression routine from the 'ml' package
# envir <- new.env(parent = emptyenv())
# tdf <- ml_prepare_dataframe(df, features, response, envir = envir)
# lr <- invoke_new(
#   sc,
#   ""
# )
# # use generated 'features', 'response' vector names in model fit
# model <- lr %>%
#   invoke("setFeaturesCol", envir$features) %>%
#   invoke("setLabelCol", envir$response)
# ## End(Not run)
Documentation reproduced from package sparklyr, version 0.3.7, License: file LICENSE