sgd (version 1.1)

sgd: Stochastic gradient descent

Description

Run stochastic gradient descent to optimize the induced loss function given a model and data.

Usage

sgd(x, ...)

## S3 method for class 'formula': sgd(formula, data, model, model.control = list(), sgd.control = list(...), ...)

## S3 method for class 'function': sgd(x, ...)

## S3 method for class 'matrix': sgd(x, y, model, model.control = list(), sgd.control = list(...), ...)

## S3 method for class 'big.matrix': sgd(x, y, model, model.control = list(), sgd.control = list(...), ...)

Arguments

x,y
a design matrix and the respective vector of outcomes.
formula
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details can be found in "glm".
data
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which sgd is called.
model
character specifying the model to be used: "lm" (linear model), "glm" (generalized linear model), "cox" (Cox proportional hazards model), "gmm" (generalized method of moments), "m" (M-estimation).
model.control
a list of parameters for controlling the model.
sgd.control
an optional list of parameters for controlling the estimation.
...
arguments to be used to form the default sgd.control arguments if it is not supplied directly.
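As a quick illustration of the matrix interface listed under Usage, the sketch below simulates data and passes the design matrix and outcome vector directly; the simulated data and object names are illustrative only.

# Sketch of the matrix method: sgd(x, y, model, ...)
library(sgd)
set.seed(1)
N <- 1e3; d <- 5
X <- matrix(rnorm(N * d), ncol = d)
y <- as.numeric(X %*% rep(2, d) + rnorm(N))
fit <- sgd(X, y, model = "lm")
head(fit$coefficients)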

Value

An object of class "sgd", which is a list containing the following components:

  • model: name of the model
  • coefficients: a named vector of coefficients
  • converged: logical. Was the algorithm judged to have converged?
  • estimates: estimates from the algorithm stored at each iteration specified in pos
  • pos: vector of indices specifying the iteration number at which each estimate was stored
  • times: vector of times in seconds it took to complete the number of iterations specified in pos
  • model.out: a list of model-specific output attributes
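The components above can be inspected directly on a fitted object; the following sketch assumes fit is the result of a call such as those in the Examples section.

# Inspecting a fitted "sgd" object (fit is assumed to come from the Examples)
fit$model                            # name of the fitted model
fit$converged                        # did the algorithm converge?
fit$coefficients                     # named vector of estimates
plot(fit$pos, fit$times, type = "l") # seconds elapsed up to each stored iterate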

Details

Models: The Cox model assumes that the survival data is ordered when passed in, i.e., such that the risk set of an observation i is all data points after it.
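Since the Cox model expects pre-ordered data, a minimal sketch of the ordering step is shown below; the data frame and column names are hypothetical and only illustrate sorting by observed time so that the risk set of row i is the rows after it.

# Hypothetical survival data; sort by observed time before calling sgd(..., model="cox")
set.seed(1)
surv.dat <- data.frame(time = rexp(100), status = rbinom(100, 1, 0.7),
                       x1 = rnorm(100), x2 = rnorm(100))
surv.dat <- surv.dat[order(surv.dat$time), ]  # risk set of row i = rows after it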

Methods: the estimation method is chosen via sgd.control. The available methods correspond to the references below: standard stochastic gradient descent (Robbins and Monro, 1951), implicit stochastic gradient descent (Toulis et al., 2014), stochastic gradient descent with averaging (Polyak and Juditsky, 1992), implicit stochastic gradient descent with averaging (Toulis et al., 2015), classical momentum (Polyak, 1964), and Nesterov's accelerated gradient (Nesterov, 1983).

Learning rates and hyperparameters: the learning-rate schedule is also set through sgd.control, with its hyperparameters passed as a named numeric vector in lr.control (see the logistic regression example below). Adaptive schedules such as AdaGrad (Duchi et al., 2011) are supported as well.
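As a hedged sketch of selecting a method and learning rate, the call below passes method and lr entries through sgd.control; those two entry names are assumptions about the control list and may need adjusting to the installed version of the package.

# Sketch: choosing averaged implicit SGD with an AdaGrad learning rate
library(sgd)
set.seed(1)
N <- 1e3; d <- 3
X <- matrix(rnorm(N * d), ncol = d)
dat <- data.frame(y = as.numeric(cbind(1, X) %*% rep(2, d + 1) + rnorm(N)), x = X)
fit <- sgd(y ~ ., data = dat, model = "lm",
           sgd.control = list(method = "ai-sgd", lr = "adagrad"))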

References

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121-2159, 2011.

Yurii Nesterov. A method for solving a convex programming problem with convergence rate $O(1/k^2)$. Soviet Mathematics Doklady, 27(2):372-376, 1983.

Boris T. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5):1-17, 1964.

Boris T. Polyak and Anatoli B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838-855, 1992.

Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pp. 400-407, 1951.

Panos Toulis, Jason Rennie, and Edoardo M. Airoldi. Statistical analysis of stochastic gradient methods for generalized linear models. In Proceedings of the 31st International Conference on Machine Learning, 2014.

Panos Toulis, Dustin Tran, and Edoardo M. Airoldi. Stability and optimality in stochastic gradient descent. arXiv preprint arXiv:1505.02417, 2015.

Wei Xu. Towards optimal one pass large scale learning with averaged stochastic gradient descent. arXiv preprint arXiv:1107.2490, 2011.

Examples

## Linear regression
library(sgd)
set.seed(42)
N <- 1e4
d <- 10
X <- matrix(rnorm(N*d), ncol=d)
theta <- rep(5, d+1)
eps <- rnorm(N)
y <- cbind(1, X) %*% theta + eps
dat <- data.frame(y=y, x=X)
sgd.theta <- sgd(y ~ ., data=dat, model="lm")
sprintf("Mean squared error: %0.3f", mean((theta - as.numeric(sgd.theta$coefficients))^2))

## Wine quality (Cortez et al., 2009): Logistic regression
set.seed(42)
data("winequality")
dat <- winequality
dat$quality <- as.numeric(dat$quality > 5) # transform to binary
test.set <- sample(1:nrow(dat), size=nrow(dat)/8, replace=FALSE)
dat.test <- dat[test.set, ]
dat <- dat[-test.set, ]
sgd.theta <- sgd(quality ~ ., data=dat,
                 model="glm", model.control=binomial(link="logit"),
                 sgd.control=list(reltol=1e-5, npasses=200),
                 lr.control=c(scale=1, gamma=1, alpha=30, c=1))
sgd.theta
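The held-out rows in dat.test are not used above; the sketch below scores them directly from the fitted coefficients rather than relying on a predict method, and assumes the coefficient vector is ordered as the intercept followed by the remaining columns of dat.

## Held-out accuracy, computed by hand from the coefficients
X.test <- as.matrix(dat.test[, setdiff(names(dat.test), "quality")])
eta <- cbind(1, X.test) %*% as.numeric(sgd.theta$coefficients)
pred <- as.numeric(plogis(eta) > 0.5)
mean(pred == dat.test$quality)  # proportion correctly classified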
