Train a linear model using mini-batch stochastic gradient descent.
# S3 method for default
cuda_ml_sgd(x, ...)
# S3 method for data.frame
cuda_ml_sgd(
x,
y,
fit_intercept = TRUE,
loss = c("squared_loss", "log", "hinge"),
penalty = c("none", "l1", "l2", "elasticnet"),
alpha = 1e-04,
l1_ratio = 0.5,
epochs = 1000L,
tol = 0.001,
shuffle = TRUE,
learning_rate = c("constant", "invscaling", "adaptive"),
eta0 = 0.001,
power_t = 0.5,
batch_size = 32L,
n_iters_no_change = 5L,
...
)
# S3 method for matrix
cuda_ml_sgd(
x,
y,
fit_intercept = TRUE,
loss = c("squared_loss", "log", "hinge"),
penalty = c("none", "l1", "l2", "elasticnet"),
alpha = 1e-04,
l1_ratio = 0.5,
epochs = 1000L,
tol = 0.001,
shuffle = TRUE,
learning_rate = c("constant", "invscaling", "adaptive"),
eta0 = 0.001,
power_t = 0.5,
batch_size = 32L,
n_iters_no_change = 5L,
...
)
# S3 method for formula
cuda_ml_sgd(
formula,
data,
fit_intercept = TRUE,
loss = c("squared_loss", "log", "hinge"),
penalty = c("none", "l1", "l2", "elasticnet"),
alpha = 1e-04,
l1_ratio = 0.5,
epochs = 1000L,
tol = 0.001,
shuffle = TRUE,
learning_rate = c("constant", "invscaling", "adaptive"),
eta0 = 0.001,
power_t = 0.5,
batch_size = 32L,
n_iters_no_change = 5L,
...
)
# S3 method for recipe
cuda_ml_sgd(
x,
data,
fit_intercept = TRUE,
loss = c("squared_loss", "log", "hinge"),
penalty = c("none", "l1", "l2", "elasticnet"),
alpha = 1e-04,
l1_ratio = 0.5,
epochs = 1000L,
tol = 0.001,
shuffle = TRUE,
learning_rate = c("constant", "invscaling", "adaptive"),
eta0 = 0.001,
power_t = 0.5,
batch_size = 32L,
n_iters_no_change = 5L,
...
)
`x`: Depending on the context:

* A __data frame__ of predictors.
* A __matrix__ of predictors.
* A __recipe__ specifying a set of preprocessing steps created from [recipes::recipe()].
* A __formula__ specifying the predictors and the outcome.
`...`: Optional arguments; currently unused.
`y`: A numeric vector (for regression) or factor (for classification) of desired responses.
`fit_intercept`: If TRUE, the model tries to correct for the global mean of the response variable. If FALSE, the model expects the data to be centered. Default: TRUE.
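For instance, with fit_intercept = FALSE one would center the data beforehand. A minimal sketch, assuming the matrix method and the mtcars data (the centering step is illustrative, not package API):

# Sketch: center predictors and response manually, then fit without an intercept.
x_centered <- scale(as.matrix(mtcars[, -1]), center = TRUE, scale = FALSE)
y_centered <- mtcars$mpg - mean(mtcars$mpg)
fit <- cuda_ml_sgd(x_centered, y_centered, fit_intercept = FALSE)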
`loss`: Loss function; must be one of "squared_loss", "log", "hinge".
`penalty`: Type of regularization to perform; must be one of "none", "l1", "l2", "elasticnet".

- "none": no regularization.
- "l1": regularization based on the L1 norm (LASSO), which penalizes the sum of the absolute values of the coefficients.
- "l2": regularization based on the L2 norm (Ridge), which penalizes the sum of the squares of the coefficients.
- "elasticnet": Elastic Net regularization, a weighted average of the L1 and L2 penalties.

Default: "none".
`alpha`: Multiplier of the penalty term. Default: 1e-4.
`l1_ratio`: The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty; for l1_ratio = 1 it is an L1 penalty; for 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. The penalty term is computed using the following formula:

penalty = alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2

where ||w||_1 is the L1 norm of the coefficients and ||w||_2 is the L2 norm of the coefficients.
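As a concrete illustration of this formula (not part of the package API), the penalty for a coefficient vector w can be computed in plain R:

# Illustrative helper, for exposition only: computes the elastic net
# penalty term from a coefficient vector `w`.
elastic_net_penalty <- function(w, alpha = 1e-4, l1_ratio = 0.5) {
  alpha * l1_ratio * sum(abs(w)) +           # alpha * l1_ratio * ||w||_1
    0.5 * alpha * (1 - l1_ratio) * sum(w^2)  # 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2
}
elastic_net_penalty(c(0.5, -1.2, 3.0))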
`epochs`: The number of times the model iterates through the entire dataset during training. Default: 1000L.
`tol`: Threshold for stopping training. Training stops if (loss in current epoch) > (loss in previous epoch) - tol. Default: 1e-3.
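In other words, training halts once an epoch fails to improve the loss by at least tol. A sketch of the check (variable names are illustrative, not the actual implementation):

should_stop <- function(loss_current, loss_previous, tol = 1e-3) {
  loss_current > loss_previous - tol
}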
`shuffle`: Whether to shuffle the training data after each epoch. Default: TRUE.
`learning_rate`: Must be one of "constant", "invscaling", "adaptive".

- "constant": the learning rate is kept constant.
- "invscaling": (learning rate) = (initial learning rate) / pow(t, power_t), where t is the number of epochs and power_t is a tunable parameter of this model.
- "adaptive": (learning rate) = (initial learning rate) as long as the training loss keeps decreasing. Each time the last n_iters_no_change consecutive epochs fail to decrease the training loss by tol, the current learning rate is divided by 5.

Default: "constant".
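The "constant" and "invscaling" schedules can be sketched in plain R (a simplified illustration; the actual implementation lives in the underlying cuML library):

# Simplified sketches of the learning rate schedules described above.
lr_constant <- function(t, eta0 = 1e-3) eta0
lr_invscaling <- function(t, eta0 = 1e-3, power_t = 0.5) eta0 / t^power_t
lr_invscaling(1:5)  # the rate decays as epochs accumulate
# "adaptive" instead divides the current rate by 5 whenever
# n_iters_no_change consecutive epochs fail to improve the loss by tol.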
`eta0`: The initial learning rate. Default: 1e-3.
`power_t`: The exponent used in the invscaling learning rate calculation. Default: 0.5.
`batch_size`: The number of samples included in each mini-batch. Default: 32L.
`n_iters_no_change`: The maximum number of epochs to train if there is no improvement in the model. Default: 5L.
`formula`: A formula specifying the outcome terms on the left-hand side and the predictor terms on the right-hand side.
`data`: When a __recipe__ or __formula__ is used, `data` is specified as a __data frame__ containing the predictors and (if applicable) the outcome.
A linear model that can be used with the 'predict' S3 generic to make predictions on new data points.
## Not run:
library(cuda.ml)
model <- cuda_ml_sgd(
mpg ~ ., mtcars,
batch_size = 4L, epochs = 50000L,
learning_rate = "adaptive", eta0 = 1e-5,
penalty = "l2", alpha = 1e-5, tol = 1e-6,
n_iters_no_change = 10L
)
preds <- predict(model, mtcars[names(mtcars) != "mpg"])
print(all.equal(preds$.pred, mtcars$mpg, tolerance = 0.09))
## End(Not run)
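For classification, the response must be a factor and a classification loss such as "log" or "hinge" should be chosen. A hedged sketch (untested; the hyperparameter values are illustrative and a CUDA-capable GPU is required), restricting iris to two classes for simplicity:

## Not run:
library(cuda.ml)
iris2 <- droplevels(iris[iris$Species != "setosa", ])
clf <- cuda_ml_sgd(
  Species ~ ., iris2,
  loss = "log", penalty = "l2", alpha = 1e-4,
  batch_size = 16L, epochs = 1000L
)
class_preds <- predict(clf, iris2[names(iris2) != "Species"])
## End(Not run)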