sgd: Stochastic gradient descent

Description

Run stochastic gradient descent on the underlying loss function for a given model and data, or on a user-specified loss function.

Usage

sgd(x, ...)
## S3 method for class 'formula':
sgd(formula, data, model, model.control = list(),
  sgd.control = list(...), ...)
## S3 method for class 'function':
sgd(x, fn.control = list(), sgd.control = list(...), ...)
## S3 method for class 'matrix':
sgd(x, y, model, model.control = list(),
  sgd.control = list(...), ...)

Arguments

x

for sgd.function, x is a function to minimize; for sgd.matrix, x is a design matrix.

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details can be found in "glm".

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environ

model

character specifying the model to be used: "lm" (linear model), "glm" (generalized linear model).

model.control

a list of parameters for controlling the model.

family ("glm"): a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or t

sgd.control

a list of parameters for controlling the estimation.

method: character specifying the method to be used: "sgd", "implicit", "asgd". Default is "implicit". See Details.
lr.type:

fn.control

for sgd.function, a list of controls for the function.

y

for sgd.matrix, y is a vector of observations, with length equal to the number of rows in x.

...

arguments to be used to form the default sgd.control arguments if it is not supplied directly.
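The x and y arguments of the matrix method can be illustrated with base R alone. The sketch below builds a design matrix and response vector from the Dobson data used in the Examples; it only prepares the inputs and does not call sgd itself. Whether sgd.matrix expects the intercept column that model.matrix adds is an assumption this page does not settle.

```r
# Data from the Dobson (1990) example on this page.
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
d.AD <- data.frame(treatment, outcome, counts)

# Build the model frame once, then extract the two pieces.
mf <- model.frame(counts ~ outcome + treatment, data = d.AD)
x <- model.matrix(attr(mf, "terms"), data = mf)  # design matrix (intercept column included; assumption)
y <- model.response(mf)                          # response vector

# y must have one entry per row of x.
stopifnot(length(y) == nrow(x))
dim(x)
```

Following the Usage section, these objects would then be passed as sgd(x, y, model = "glm", model.control = list(family = poisson())).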

Value

An object of class "sgd", which is a list containing at least the following components:
coefficients: a named vector of coefficients
residuals: the working residuals, that is the residuals in the final iteration of the fit. Since cases with zero weights are omitted, their working residuals are NA.
fitted.values: the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.
rank: the numeric rank of the fitted linear model.
family: the family object used.
linear.predictors: the linear fit on link scale.
deviance: up to a constant, minus twice the maximized log-likelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero.
null.deviance: the deviance for the null model, comparable with deviance. The null model will include the offset, and an intercept if there is one in the model. Note that this will be incorrect if the link function depends on the data other than through the fitted mean: specify a zero offset to force a correct calculation.
iter: the number of iterations of the algorithm used.
weights: the weights initially supplied, a vector of 1s if none were.
df.residual: the residual degrees of freedom.
df.null: the residual degrees of freedom for the null model.
converged: logical. Was the algorithm judged to have converged?
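The definitions of fitted.values, deviance and df.residual above can be checked numerically. The sketch below uses stats::glm as a stand-in, on the assumption that the components of an "sgd" object follow the same definitions; it does not call sgd.

```r
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

# fitted.values: inverse link applied to the linear predictors.
mu <- fit$family$linkinv(fit$linear.predictors)
stopifnot(isTRUE(all.equal(unname(mu), unname(fit$fitted.values))))

# deviance: twice the log-likelihood gap to the saturated model (mu = y),
# so a saturated model has deviance zero.
loglik_model <- sum(dpois(counts, lambda = mu, log = TRUE))
loglik_saturated <- sum(dpois(counts, lambda = counts, log = TRUE))
dev <- 2 * (loglik_saturated - loglik_model)
stopifnot(isTRUE(all.equal(dev, fit$deviance)))

# df.residual: number of observations minus the rank of the fitted model.
stopifnot(fit$df.residual == length(counts) - fit$rank)
```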

Details

Methods: "sgd" uses stochastic gradient descent (Robbins and Monro, 1951). "implicit" uses implicit stochastic gradient descent (Toulis et al., 2014). "asgd" uses stochastic gradient descent with averaging (Polyak and Juditsky, 1992).
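The three update rules can be compared on a small least-squares problem. This is a minimal base R sketch, not the package's implementation: the function run_sgd, the simulated data and the learning-rate schedule (1 + i)^(-alpha) are all hypothetical choices made for illustration. For squared-error loss the implicit update has a closed form, which the code uses.

```r
set.seed(1)
n <- 5000
X <- cbind(1, matrix(rnorm(n * 2), n, 2))   # simulated design matrix
theta_true <- c(1, -2, 0.5)
y <- drop(X %*% theta_true) + rnorm(n, sd = 0.1)

run_sgd <- function(X, y, method = c("sgd", "implicit", "asgd")) {
  method <- match.arg(method)
  # Hypothetical schedule: slower decay when the iterates are averaged.
  alpha <- if (method == "asgd") 2 / 3 else 1
  theta <- numeric(ncol(X))
  theta_bar <- theta
  for (i in seq_len(nrow(X))) {
    xi <- X[i, ]
    a <- (1 + i)^(-alpha)
    r <- y[i] - sum(xi * theta)          # residual at the current iterate
    if (method == "implicit") {
      # Solves theta_new = theta + a * (y - x'theta_new) * x in closed form.
      theta <- theta + a / (1 + a * sum(xi^2)) * r * xi
    } else {
      # Explicit step along the negative gradient of 0.5 * (y - x'theta)^2.
      theta <- theta + a * r * xi
    }
    theta_bar <- theta_bar + (theta - theta_bar) / i   # running average
  }
  if (method == "asgd") theta_bar else theta
}

estimates <- sapply(c("sgd", "implicit", "asgd"),
                    function(m) run_sgd(X, y, method = m))
round(estimates, 2)
```

Each column of estimates holds the coefficients from one method; all three should land close to theta_true. The implicit step shrinks the update by 1 / (1 + a * sum(xi^2)), which keeps it stable even when the learning rate is set too large.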

Learning rates: "uni-dim" uses the one-dimensional learning rate. "p-dim" uses the p-dimensional learning rate. "adagrad" uses a diagonal scaling (Duchi et al., 2011).
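The difference between a scalar rate and a diagonal scaling can be sketched in a few lines of base R. The helper names (rate_scalar, adagrad_state, adagrad_rate) and the constants gamma, alpha, eta and eps are hypothetical; the package's exact formulas are not given on this page, and the "p-dim" rate is not covered here.

```r
# Scalar ("one-dimensional") rate: one step size shared by all coordinates.
rate_scalar <- function(i, gamma = 1, alpha = 1) gamma * (1 + i)^(-alpha)

# AdaGrad-style diagonal scaling (Duchi et al., 2011): each coordinate is
# scaled by the inverse square root of its accumulated squared gradients.
adagrad_state <- function(p) list(G = numeric(p))
adagrad_rate <- function(state, grad, eta = 1, eps = 1e-6) {
  state$G <- state$G + grad^2
  list(state = state, rate = eta / sqrt(state$G + eps))
}

# Two steps with a large gradient in coordinate 1 and a small one in coordinate 2.
st <- adagrad_state(2)
out <- adagrad_rate(st, grad = c(2, 0.5))
out <- adagrad_rate(out$state, grad = c(2, 0.5))
out$rate          # coordinate 2 keeps the larger step size
rate_scalar(1:3)  # the scalar rate depends only on the iteration count
```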

References

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121-2159, 2011.

Boris T. Polyak and Anatoli B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838-855, 1992.

Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400-407, 1951.

Panos Toulis, Jason Rennie, and Edoardo M. Airoldi. Statistical analysis of stochastic gradient methods for generalized linear models. In Proceedings of the 31st International Conference on Machine Learning, 2014.

Examples

library(sgd)

## Dobson (1990, p.93): Randomized Controlled Trial
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
print(d.AD <- data.frame(treatment, outcome, counts))
## Poisson regression, fitted with the default sgd.control settings
sgd.D93 <- sgd(counts ~ outcome + treatment, data = d.AD, model = "glm",
               model.control = list(family = poisson()))
sgd.D93

## Venables & Ripley (2002, p.189): an example with offsets
utils::data(anorexia, package="MASS")

anorex.1 <- sgd(Postwt ~ Prewt + Treat + offset(Prewt),
                data=anorexia, model="lm")
