logistic_reg() is a way to generate a specification of a model before fitting and allows the model to be created using different packages in R, Stan, keras, or via Spark. The main arguments for the model are:
penalty: The total amount of regularization in the model. Note that this must be zero for some engines.
mixture: The mixture amounts of different types of regularization (see below). Note that this will be ignored for some engines.
These arguments are converted to their specific names at the time that the model is fit. Other options and arguments can be set using set_engine(). If left to their defaults here (NULL), the values are taken from the underlying model functions. If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
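As a minimal sketch of the update() workflow described above (the object names spec and spec_new are illustrative and do not come from the original text):

```r
library(parsnip)

# Start from a specification with both main arguments set.
spec <- logistic_reg(penalty = 0.1, mixture = 0.5)

# Change only the penalty; mixture keeps its previous value.
spec_new <- update(spec, penalty = 0.01)
spec_new
```

Because update() returns a modified copy, the original spec object is left unchanged.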
logistic_reg(mode = "classification", penalty = NULL, mixture = NULL)
mode: A single character string for the type of model. The only possible value for this model is "classification".
penalty: A non-negative number representing the total amount of regularization (glmnet, LiblineaR, keras, and spark only). For keras models, this corresponds to purely L2 regularization (aka weight decay), while the other models can use L1, L2, or a combination of the two (depending on the value of mixture).
mixture: A number between zero and one (inclusive) that is the proportion of L1 regularization (i.e. lasso) in the model. When mixture = 1, it is a pure lasso model, while mixture = 0 indicates that ridge regression is being used (glmnet, LiblineaR, and spark only). For LiblineaR models, mixture must be exactly 0 or 1.
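For the glmnet engine, the roles of the two arguments can be written out explicitly. The sketch below uses glmnet's own notation, with $\lambda$ for penalty and $\alpha$ for mixture, which the text above does not spell out; other engines may scale the terms differently.

```latex
\lambda \left[ \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right]
```

With $\alpha = 1$ only the L1 (lasso) term remains, and with $\alpha = 0$ only the L2 (ridge) term remains.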
Engines may have pre-set default arguments when executing the model fit call. For this type of model, the templates of the fit calls are shown below.
logistic_reg() %>% set_engine("glm") %>% translate()
## Logistic Regression Model Specification (classification)
##
## Computational engine: glm
##
## Model fit template:
## stats::glm(formula = missing_arg(), data = missing_arg(), weights = missing_arg(),
##     family = stats::binomial)
logistic_reg(penalty = 0.1) %>% set_engine("glmnet") %>% translate()
## Logistic Regression Model Specification (classification)
##
## Main Arguments:
##   penalty = 0.1
##
## Computational engine: glmnet
##
## Model fit template:
## glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(),
##     family = "binomial")
The glmnet engine requires a single value for the penalty argument (a number or tune()), but the full regularization path is always fit regardless of the value given to penalty. To pass in a custom sequence of values for glmnet's lambda, use the argument path_values in set_engine(). This will assign the value of the glmnet lambda parameter without disturbing the value given to logistic_reg(penalty).
For example:
logistic_reg(penalty = .1) %>% set_engine("glmnet", path_values = c(0, 10^seq(-10, 1, length.out = 20))) %>% translate()
## Logistic Regression Model Specification (classification)
##
## Main Arguments:
##   penalty = 0.1
##
## Computational engine: glmnet
##
## Model fit template:
## glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(),
##     lambda = c(0, 10^seq(-10, 1, length.out = 20)), family = "binomial")
When fitting a model with no regularization (i.e., penalty = 0), we strongly suggest that you pass in a vector for path_values that includes zero. See issue #431 for a discussion.
When using predict(), the single penalty value used for prediction is the one specified in logistic_reg().
To predict on multiple penalties, use the multi_predict() function. This function returns a tibble with a list column called .pred containing all of the penalty results.
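The difference between the two prediction functions can be sketched as follows. The data are simulated purely for illustration, the object names are hypothetical, and the example assumes the glmnet package is installed:

```r
library(parsnip)

# Simulated two-class data (illustrative only).
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(ifelse(dat$x1 - dat$x2 + rnorm(n) > 0, "yes", "no"))

glmnet_fit <- logistic_reg(penalty = 0.1) %>%
  set_engine("glmnet") %>%
  fit(y ~ x1 + x2, data = dat)

new_obs <- dat[1:3, c("x1", "x2")]

# predict() uses the single penalty given to logistic_reg().
predict(glmnet_fit, new_data = new_obs)

# multi_predict() returns one row per observation; .pred is a list
# column holding the results for each requested penalty.
multi_preds <- multi_predict(glmnet_fit, new_data = new_obs,
                             penalty = c(0.01, 0.1))
multi_preds$.pred[[1]]
```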
logistic_reg() %>% set_engine("LiblineaR") %>% translate()
## Logistic Regression Model Specification (classification)
##
## Computational engine: LiblineaR
##
## Model fit template:
## LiblineaR::LiblineaR(x = missing_arg(), y = missing_arg(), wi = missing_arg(),
##     verbose = FALSE)
For LiblineaR models, the value for mixture can be either 0 (for ridge) or 1 (for lasso), but not an intermediate value. In the LiblineaR documentation, these correspond to types 0 (L2-regularized) and 6 (L1-regularized).
Be aware that the LiblineaR engine regularizes the intercept. Other regularized regression models do not, which will result in different parameter estimates.
logistic_reg() %>% set_engine("stan") %>% translate()
## Logistic Regression Model Specification (classification)
##
## Computational engine: stan
##
## Model fit template:
## rstanarm::stan_glm(formula = missing_arg(), data = missing_arg(),
##     weights = missing_arg(), family = stats::binomial, refresh = 0)
Note that the refresh default prevents logging of the estimation process. Change this value in set_engine() to show the logs.
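A minimal sketch of overriding that default is given below. The value 500 is an arbitrary choice for illustration, and stan_spec is a hypothetical object name:

```r
library(parsnip)

# Ask the sampler to report progress every 500 iterations
# instead of suppressing the log.
stan_spec <- logistic_reg() %>%
  set_engine("stan", refresh = 500)

# The fit template should now carry the user-supplied refresh value.
translate(stan_spec)
```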
For prediction, the stan engine can compute posterior intervals analogous to confidence and prediction intervals. In these instances, the units are those of the original outcome, and when std_error = TRUE, the standard deviation of the posterior distribution (or posterior predictive distribution, as appropriate) is returned.
logistic_reg() %>% set_engine("spark") %>% translate()
## Logistic Regression Model Specification (classification)
##
## Computational engine: spark
##
## Model fit template:
## sparklyr::ml_logistic_regression(x = missing_arg(), formula = missing_arg(),
##     weight_col = missing_arg(), family = "binomial")
logistic_reg() %>% set_engine("keras") %>% translate()
## Logistic Regression Model Specification (classification)
##
## Computational engine: keras
##
## Model fit template:
## parsnip::keras_mlp(x = missing_arg(), y = missing_arg(), hidden_units = 1,
##     act = "linear")
The standardized parameter names in parsnip can be mapped to their original names in each engine that has main parameters. Each engine typically has a different default value (shown in parentheses) for each parameter.
parsnip | glmnet | LiblineaR | spark | keras |
penalty | lambda | cost | reg_param (0) | penalty (0) |
mixture | alpha (1) | type (0) | elastic_net_param (0) | NA |
For logistic_reg(), the mode will always be "classification".
The model can be fit with the fit() function using the following engines:
R: "glm" (the default), "glmnet", or "LiblineaR"
Stan: "stan"
Spark: "spark"
keras: "keras"
For this model, other packages may add additional engines. Use show_engines() to see the current set of engines.
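As a rough end-to-end sketch of fitting with the default "glm" engine (the simulated data and the object names dat, glm_fit, and probs are illustrative assumptions, not part of the original documentation):

```r
library(parsnip)

# Simulated two-class data (illustrative only).
set.seed(2)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(ifelse(dat$x1 - dat$x2 + rnorm(n) > 0, "yes", "no"))

# Specify the model, choose the engine, then fit.
glm_fit <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(y ~ x1 + x2, data = dat)

# Class probabilities for the first few rows.
probs <- predict(glm_fit, new_data = head(dat), type = "prob")
probs
```

Swapping "glm" for another engine name changes the underlying fitting function without changing the rest of the code, provided the corresponding package is installed.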
show_engines("logistic_reg")
logistic_reg()
# Parameters can be represented by a placeholder:
logistic_reg(penalty = varying())