higrad

Description

higrad is used to implement hierarchical incremental gradient descent (HiGrad), an algorithm that conducts statistical inference for online learning.
Usage

higrad(x, y, model = "lm", nsteps = nrow(x), nsplits = 2, nthreads = 2,
       step.ratio = 1, n0 = NA, skip = 0, eta = 1/2, alpha = 1/2,
       burnin = round(nsteps/10), start = rnorm(ncol(x), 0, 0.01),
       replace = FALSE, track = FALSE)
Arguments

x: input matrix of features. Each row is an observation vector, and each column is a feature.

y: response variable. Quantitative for model = "lm". For model = "logistic" it should be a factor with two levels.

model: type of model to fit. Currently only linear regression ("lm") and logistic regression ("logistic") are supported.

nsteps: total number of steps, equivalent to the number of queries made for a noisy evaluation of the gradient.

nsplits: number of splits in the HiGrad tree.

nthreads: number of threads each parent thread is split into. Either a single number (equal split size throughout) or a vector.

step.ratio: ratio of the lengths of threads at two adjacent levels (the deeper level divided by the shallower one). Either a single number (equal ratio throughout) or a vector.

n0: length of the 0th-level thread.

skip: number of steps to skip when estimating the coefficients by averaging.

eta: constant in front of the step size. See Details for the step size formula.

alpha: exponent of the step size. See Details for the step size formula.

burnin: number of steps in the burn-in period. The burn-in period is not counted toward the total budget nsteps.

start: starting values of the coefficients.

replace: logical; whether or not to sample the data with replacement.

track: logical; whether or not to store the entire path for plotting.
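For instance, with the defaults above and an input matrix of 1000 rows, the default budget, burn-in, and starting values evaluate as follows (a minimal sketch based solely on the default expressions in the usage line; the data are placeholders):

x <- matrix(rnorm(1000 * 10), 1000, 10)   # placeholder design matrix
nsteps <- nrow(x)                         # default total budget: 1000
burnin <- round(nsteps / 10)              # default burn-in length: 100
start <- rnorm(ncol(x), 0, 0.01)          # default starting coefficients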
Value

An object with S3 class higrad, whose components contain:

- the estimate of the coefficients;
- a matrix of estimates of the coefficients along each HiGrad thread;
- the model type;
- the covariance structure \(\Sigma\) of the estimates;
- the entire path of the estimates along each thread, which can be used for diagnostics and to check convergence.
Details

HiGrad is designed to conduct statistical inference for online learning without incurring additional computational cost compared with vanilla stochastic gradient descent (SGD). The HiGrad procedure begins by performing SGD iterations for a while and then splits the single thread into several threads; it continues to operate hierarchically in this fashion along each thread. With the predictions provided by the multiple threads, a t-based confidence interval is constructed by de-correlating the predictions using the covariance structure given by the Ruppert–Polyak averaging scheme.
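To make the de-correlation idea concrete, here is a purely illustrative sketch, not the package's internal computation: if the terminal threads produced predictions at a new point that, after de-correlation, behave approximately like independent draws centered at the true value, a standard t-interval could be formed as below. The helper t_interval is hypothetical and not part of the package; higrad itself builds its intervals from the full covariance structure of the threads rather than treating them as independent.

# Hypothetical helper, for intuition only: a t-based interval from K
# (approximately de-correlated) thread predictions at one new point.
t_interval <- function(mu, level = 0.95) {
  K <- length(mu)                        # number of terminal threads
  center <- mean(mu)                     # combined point prediction
  se <- sd(mu) / sqrt(K)                 # standard error across threads
  half <- qt(1 - (1 - level) / 2, df = K - 1) * se
  c(lower = center - half, estimate = center, upper = center + half)
}
t_interval(c(1.02, 0.97, 1.05, 0.99))    # predictions from four threads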
To implement HiGrad, a configuration of the tree structure needs to be specified. The default setting is a binary tree with 2 splits. The step size at step t is set to eta * t^(-alpha).
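As a sketch of a non-default configuration (the argument values and simulated data below are illustrative only, not recommendations), the tree can be customized through nsplits, nthreads, and step.ratio, and the step size through eta and alpha:

library(higrad)

set.seed(2)
x <- matrix(rnorm(900 * 5), 900, 5)      # 900 observations, 5 features
y <- as.numeric(x %*% rep(1, 5) + rnorm(900, 0, 0.1))

# Two levels of splits: the initial thread splits into 2 threads, each of
# which splits into 3; step size 0.1 * t^(-0.55).
fit_custom <- higrad(x, y, model = "lm",
                     nsplits = 2, nthreads = c(2, 3),
                     eta = 0.1, alpha = 0.55)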
References

Weijie Su and Yuancheng Zhu (2018). Statistical Inference for Online Learning and Stochastic Approximation via Hierarchical Incremental Gradient Descent. arXiv preprint, https://arxiv.org/abs/1802.04876.
See Also

See print.higrad, plot.higrad, and predict.higrad for other methods for the higrad class.
Examples

# fitting linear regression on a simulated dataset
n <- 1e3
d <- 10
sigma <- 0.1
theta <- rep(1, d)
x <- matrix(rnorm(n * d), n, d)
y <- as.numeric(x %*% theta + rnorm(n, 0, sigma))
fit <- higrad(x, y, model = "lm")
print(fit)
# predict for 10 new samples
newx <- matrix(rnorm(10 * d), 10, d)
pred <- predict(fit, newx)
pred
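
# Further sketches (illustrative only, not from the package authors):
# logistic regression; per the Arguments section, y should then be a factor
# with two levels.
p <- 1 / (1 + exp(-as.numeric(x %*% theta)))
yb <- factor(rbinom(n, 1, p))
fitb <- higrad(x, yb, model = "logistic")
print(fitb)

# store the full iterate path with track = TRUE so the fit can be plotted
# (assuming plot.higrad draws on the stored path)
fit_tracked <- higrad(x, y, model = "lm", track = TRUE)
plot(fit_tracked)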