Usage

sgd(x, ...)

## S3 method for class 'formula':
sgd(formula, data, model, model.control = list(),
    sgd.control = list(...), ...)

## S3 method for class 'function':
sgd(x, fn.control = list(), sgd.control = list(...), ...)

## S3 method for class 'matrix':
sgd(x, y, model, model.control = list(),
    sgd.control = list(...), ...)
Arguments

x, y
for the function method, x is a function to minimize; for the matrix method, x is a design matrix (with y the corresponding response).

data
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula).

model
character specifying the model to be fitted: "lm" (linear model) or "glm" (generalized linear model).

model.control
a list of controls for the model. For model "glm", this includes family: a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function.

sgd.control
a list of controls for the stochastic gradient method, including method: the estimation method, one of "sgd", "implicit", "asgd". Default is "implicit".

fn.control
for the function method, a list of controls for the function.

...
arguments used to form the default sgd.control argument if it is not supplied directly.

Value

An object of class "sgd", which is a list containing at least the following components:

coefficients
a named vector of coefficients
residuals
the working residuals, that is the residuals in the final iteration of
the fit. Since cases with zero weights are omitted, their working residuals
are NA.
fitted.values
the fitted mean values, obtained by transforming the linear predictors by the
inverse of the link function.
rank
the numeric rank of the fitted linear model.
family
the family
object used.
linear.predictors
the linear fit on link scale.
deviance
up to a constant, minus twice the maximized log-likelihood. Where sensible,
the constant is chosen so that a saturated model has deviance zero.
null.deviance
the deviance for the null model, comparable with deviance. The null
model will include the offset, and an intercept if there is one in the model.
Note that this will be incorrect if the link function depends on the data
other than through the fitted mean: specify a zero offset to force a correct
calculation.
iter
the number of iterations of the algorithm used.
weights
the weights initially supplied, a vector of 1s if none were.
df.residual
the residual degrees of freedom.
df.null
the residual degrees of freedom for the null model.
converged
logical. Was the algorithm judged to have converged?
Details

Learning rates: "uni-dim" uses the one-dimensional learning rate; "p-dim" uses the p-dimensional learning rate; "adagrad" uses a diagonal scaling (Duchi et al., 2011).
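The estimation method and learning rate are selected through sgd.control. A minimal sketch using the matrix method (this assumes the sgd package is installed, and that the control entries are named method and lr — names not stated in the fragments above, so treat them as assumptions):

```r
library(sgd)

## Simulate a small linear-model problem.
set.seed(42)
N <- 10000
X <- matrix(rnorm(N * 5), ncol = 5)
theta <- c(2, -1, 0.5, 0, 1)
y <- X %*% theta + rnorm(N)

## Implicit SGD ("implicit" is the default method) with the AdaGrad
## diagonal scaling as the learning rate; "adagrad" is one of the
## learning-rate choices listed above. The control names are assumed.
fit <- sgd(x = X, y = y, model = "lm",
           sgd.control = list(method = "implicit", lr = "adagrad"))
fit$coefficients  # a named vector of coefficients (see Value)
```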
References

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121-2159, 2011.

Boris T. Polyak and Anatoli B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838-855, 1992.
Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pp. 400-407, 1951.
Panos Toulis, Jason Rennie, and Edoardo M. Airoldi. Statistical analysis of stochastic gradient methods for generalized linear models. In Proceedings of the 31st International Conference on Machine Learning, 2014.
Examples

## Dobson (1990, p.93): Randomized Controlled Trial
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
print(d.AD <- data.frame(treatment, outcome, counts))
sgd.D93 <- sgd(counts ~ outcome + treatment, model = "glm",
               model.control = list(family = poisson()))
sgd.D93
## Venables & Ripley (2002, p.189): an example with offsets
utils::data(anorexia, package="MASS")
anorex.1 <- sgd(Postwt ~ Prewt + Treat + offset(Prewt),
                data = anorexia, model = "lm")
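The fitted objects are lists of class "sgd", so the components documented in the Value section can be inspected directly (a sketch assuming the fits above ran successfully):

```r
## Components of the returned "sgd" object (names per the Value section).
sgd.D93$coefficients          # named vector of coefficients
sgd.D93$converged             # logical: was the algorithm judged converged?
head(anorex.1$fitted.values)  # fitted mean values
```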