Function that trains a tidymodels model via workflows based on the provided input parameters.
This function allows for cross-validating the hyperparameters of the model.
regression.train_model(
  x,
  seed = 1,
  verbose = NULL,
  regression.model = parsnip::linear_reg(),
  regression.tune = FALSE,
  regression.tune_values = NULL,
  regression.vfold_cv_para = NULL,
  regression.recipe_func = NULL,
  regression.response_var = "y_hat",
  regression.surrogate_n_comb = NULL,
  current_comb = NULL
)
A trained tidymodels model based on the provided input parameters.
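As a rough illustration of what training a tidymodels model via workflows can look like, the sketch below fits a linear regression on a small made-up data set. The data, the object names (`train`, `rec`, `wf`, `fitted_wf`), and the exact sequence of calls are assumptions for illustration; this is not the function's internal code. The block only runs the fit when parsnip, recipes, and workflows are installed.

```r
# Hypothetical toy training data; "y_hat" mirrors the default response name.
train <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 1, 4, 3, 6, 5),
  y_hat = c(1.1, 1.9, 3.2, 3.8, 5.1, 6.0)
)

pkgs <- c("parsnip", "recipes", "workflows")
if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  # A recipe using every other column as a predictor of the response.
  rec <- recipes::recipe(y_hat ~ ., data = train)

  # Bundle the model specification and the recipe into one workflow.
  wf <- workflows::workflow()
  wf <- workflows::add_model(wf, parsnip::linear_reg())
  wf <- workflows::add_recipe(wf, rec)

  # Fit the workflow and predict on the training data.
  fitted_wf <- parsnip::fit(wf, data = train)
  preds <- predict(fitted_wf, new_data = train)
}
```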
Data.table containing the training data.
Positive integer.
Specifies the seed set before any code involving randomness is run.
If NULL, no seed is set in the calling environment.
String vector or NULL.
Specifies the verbosity (printout detail level) through one or more of the strings "basic", "progress", "convergence", "shapley", and "vS_details".
"basic" displays basic information about the computation being performed, in addition to some messages about parameters being set or checks being unavailable due to specific input.
"progress" displays information about where in the calculation process the function currently is.
"convergence" displays information on how close to convergence the Shapley value estimates are (only when iterative = TRUE).
"shapley" displays intermediate Shapley value estimates and standard deviations (only when iterative = TRUE) and the final estimates.
"vS_details" displays information about the v_S estimates. This is most relevant for approach %in% c("regression_separate", "regression_surrogate", "vaeac").
NULL means no printout.
Note that any combination of the five strings can be used.
E.g. verbose = c("basic", "vS_details") will display basic information and details about the v(S)-estimation process.
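The accepted values can be checked with a few lines of base R. This is a minimal sketch; the names `valid_verbose` and `is_valid_verbose` are hypothetical helpers, not part of the package.

```r
# The five verbosity strings listed above.
valid_verbose <- c("basic", "progress", "convergence", "shapley", "vS_details")

# Hypothetical helper: NULL (no printout) or any subset of the valid strings.
is_valid_verbose <- function(verbose) {
  is.null(verbose) || (is.character(verbose) && all(verbose %in% valid_verbose))
}

ok_combo <- is_valid_verbose(c("basic", "vS_details"))
ok_null <- is_valid_verbose(NULL)
bad_value <- is_valid_verbose("everything")
```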
A tidymodels object of class model_spec. Default is a linear regression model, i.e., parsnip::linear_reg(). See tidymodels for all possible models, and see the vignette for how to add new/own models.
Note, to make it easier to call explain() from Python, the regression.model parameter can also be a string specifying the model, which will be parsed and evaluated. For example, "parsnip::rand_forest(mtry = hardhat::tune(), trees = 100, engine = 'ranger', mode = 'regression')" is also a valid input. It is essential to include the package prefix if the package is not loaded.
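The two input forms can be compared directly. The sketch below assumes that a model string is turned into an object with `eval(parse(text = ...))`; that is one plausible way to parse and evaluate a string, not a confirmed description of the package internals. The variable names are hypothetical.

```r
# A model given as a string, with the package prefix included.
model_string <- "parsnip::linear_reg()"

if (requireNamespace("parsnip", quietly = TRUE)) {
  # The same model given directly as a model specification object.
  model_direct <- parsnip::linear_reg()

  # Assumed parsing step: evaluate the string as R code.
  model_parsed <- eval(parse(text = model_string))

  # Both should carry the same classes, including "model_spec".
  same_class <- identical(class(model_direct), class(model_parsed))
}
```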
Logical (default is FALSE). If TRUE, the hyperparameters are tuned based on the values provided in regression.tune_values. Note that no checks are conducted here, as the inputs are checked earlier in setup_approach.regression_separate and setup_approach.regression_surrogate.
Either NULL (default), a data.frame/data.table/tibble, or a function.
The data.frame must contain the possible hyperparameter value combinations to try.
The column names must match the names of the tunable parameters specified in regression.model.
If regression.tune_values is a function, then it should take one argument x, which is the training data for the current coalition, and return a data.frame/data.table/tibble with the properties described above.
Using a function allows the hyperparameter values to change based on the size of the coalition. See the regression vignette for several examples.
Note, to make it easier to call explain() from Python, the regression.tune_values can also be a string containing an R function. For example,
"function(x) return(dials::grid_regular(dials::mtry(c(1, ncol(x))), levels = 3))"
is also a valid input.
It is essential to include the package prefix if the package is not loaded.
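The three accepted forms of regression.tune_values can be sketched in base R. The parameter names `mtry` and `min_n`, the toy data `x_demo`, and the use of `eval(parse(text = ...))` for the string form are assumptions for illustration; the sketch also assumes that `x` contains only feature columns.

```r
# Form 1: a fixed grid; column names must match the tunable parameters
# of the model (here the hypothetical parameters `mtry` and `min_n`).
tune_grid <- expand.grid(mtry = 1:3, min_n = c(5, 10))

# Form 2: a function of the training data, so that the grid can grow
# with the number of features in the current coalition.
tune_fun <- function(x) data.frame(mtry = seq_len(ncol(x)))

# Form 3: the same function given as a string (assumed to be parsed
# and evaluated before use).
tune_fun_from_string <- eval(parse(
  text = "function(x) data.frame(mtry = seq_len(ncol(x)))"
))

# Hypothetical training data for a coalition with two features.
x_demo <- data.frame(x1 = c(0.1, 0.4, 0.9), x2 = c(1.2, 0.3, 0.8))
grid_demo <- tune_fun(x_demo)
```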
Either NULL (default) or a named list containing the parameters to be sent to rsample::vfold_cv(). See the regression vignette for several examples.
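A named list of this kind might look as follows. The sketch assumes the list entries are forwarded as arguments to `rsample::vfold_cv()`, here emulated with `do.call()`; `mtcars` stands in for the training data, and the variable names are hypothetical.

```r
# Named list of arguments for rsample::vfold_cv(): 5 folds, repeated twice.
vfold_cv_para <- list(v = 5, repeats = 2)

if (requireNamespace("rsample", quietly = TRUE)) {
  # Assumed forwarding: the list entries become arguments of vfold_cv().
  folds <- do.call(rsample::vfold_cv, c(list(data = mtcars), vfold_cv_para))

  # One row per resample: 5 folds x 2 repeats.
  n_resamples <- nrow(folds)
}
```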
Either NULL (default) or a function that takes in a recipes::recipe() object and returns a modified recipes::recipe() with potentially additional recipe steps. See the regression vignette for several examples.
Note, to make it easier to call explain() from Python, the regression.recipe_func can also be a string containing an R function. For example,
"function(recipe) return(recipes::step_ns(recipe, recipes::all_numeric_predictors(), deg_free = 2))"
is also a valid input. It is essential to include the package prefix if the package is not loaded.
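A recipe function of this kind can be tried on its own before being passed on. In the sketch below, the recipe on `mtcars` and the names `recipe_func`, `base_recipe`, and `modified_recipe` are hypothetical stand-ins; only the step itself is taken from the example above.

```r
# A function that takes a recipe and returns it with one extra step
# (natural splines on all numeric predictors).
recipe_func <- function(recipe) {
  recipes::step_ns(recipe, recipes::all_numeric_predictors(), deg_free = 2)
}

if (requireNamespace("recipes", quietly = TRUE)) {
  # Hypothetical base recipe on a built-in data set.
  base_recipe <- recipes::recipe(mpg ~ wt + hp, data = mtcars)

  # Apply the function and count the steps that were added.
  modified_recipe <- recipe_func(base_recipe)
  n_steps <- length(modified_recipe$steps)
}
```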
String (default is "y_hat") containing the name of the response variable.
Integer (default is NULL). The number of times each training observation has been augmented. If NULL, then we assume that we are doing separate regression.
Integer vector. The current combination of features, passed to the verbosity printing function.
Lars Henry Berge Olsen