This is the same model as with stan_lm, but it utilizes the output from biglm in the biglm package in order to proceed when the data is too large to fit in memory.
stan_biglm(biglm, xbar, ybar, s_y, has_intercept = TRUE, ..., prior = R2(stop("'location' must be specified")), prior_intercept = NULL, prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"), adapt_delta = NULL)
stan_biglm.fit(b, R, SSR, N, xbar, ybar, s_y, has_intercept = TRUE, ..., prior = R2(stop("'location' must be specified")), prior_intercept = NULL, prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"), adapt_delta = NULL)
...
Further arguments passed to the function in the rstan package (sampling, vb, or optimizing) corresponding to the estimation method named by algorithm. For example, if algorithm is "sampling" it is possible to specify iter, chains, cores, refresh, etc.

prior
Must be a call to R2 with its location argument specified, or NULL, which would indicate a standard uniform prior for the $R^2$.

prior_intercept
Either NULL (the default) or a call to normal. If a normal prior is specified without a scale, then the standard deviation is taken to be the marginal standard deviation of the outcome divided by the square root of the sample size, which is legitimate because the marginal standard deviation of the outcome is a primitive parameter being estimated.

prior_PD
A logical scalar (defaulting to FALSE) indicating whether to draw from the prior predictive distribution instead of conditioning on the outcome.

algorithm
A string indicating the estimation approach to use. Can be "sampling" for MCMC (the default), "optimizing" for optimization, "meanfield" for variational inference with independent normal distributions, or "fullrank" for variational inference with a multivariate normal distribution. See rstanarm-package for more details on the estimation algorithms. NOTE: not all fitting functions support all four algorithms.

adapt_delta
Only relevant if algorithm="sampling". See adapt_delta for details.

The output of both stan_biglm
and stan_biglm.fit is an object of stanfit-class rather than stanreg-objects, which is more limited and less convenient but necessitated by the fact that stan_biglm does not bring the full design matrix into memory. Without the full design matrix, some of the elements of a stanreg-objects object cannot be calculated, such as residuals. Thus, the functions in the rstanarm package that input stanreg-objects, such as posterior_predict, cannot be used.
The stan_biglm function is intended to be used in the same circumstances as the biglm function in the biglm package but with an informative prior on the $R^2$ of the regression. Like biglm, the memory required to estimate the model depends largely on the number of predictors rather than the number of observations.

However, the original call to biglm must be a little unconventional. The original formula must not include an intercept, and all the columns of the implicit design matrix must be expressed as deviations from the sample mean. If the design matrix is on the hard disk, the column sums must be accumulated, divided by the sample size to produce the column means, and then the column means must be swept from the design matrix on disk. If any observations have any missing values on any of the predictors or the outcome, such observations do not contribute to the column means, which must be passed as the xbar argument. If the outcome is also expressed as the deviation from its sample mean, then the coefficients produced by biglm are the same as if the raw data were used and an intercept were included. The sample mean and sample standard deviation of the outcome must also be passed (as the ybar and s_y arguments).
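The centering steps described above can be sketched in base R. This is a minimal illustration, not the package's own procedure: it assumes the data arrive in chunks (here mtcars split into four pieces as a stand-in for data on disk), and the names chunks, sums, means, and centered are hypothetical. It accumulates column sums over complete cases, sweeps the means out, and checks that the no-intercept fit on centered data reproduces the slopes of an ordinary fit with an intercept.

```r
# Stand-in for data on disk: mtcars split into four chunks of 8 rows each.
vars <- c("mpg", "wt", "qsec", "am")
chunks <- split(mtcars[, vars], rep(1:4, each = 8))

# Pass 1: accumulate column sums and the row count over complete cases only.
sums <- setNames(numeric(length(vars)), vars)
n <- 0
for (chunk in chunks) {
  chunk <- chunk[complete.cases(chunk), , drop = FALSE]
  sums <- sums + colSums(chunk)
  n <- n + nrow(chunk)
}
means <- sums / n
xbar <- means[c("wt", "qsec", "am")]  # predictor means, for the xbar argument
ybar <- means[["mpg"]]                # outcome mean, for the ybar argument

# Pass 2: sweep the column means out of every chunk.
centered <- as.data.frame(do.call(rbind, lapply(chunks, function(chunk) {
  chunk <- chunk[complete.cases(chunk), , drop = FALSE]
  sweep(as.matrix(chunk), 2, means)
})))

# Sample standard deviation of the outcome, for the s_y argument.
s_y <- sqrt(sum(centered$mpg^2) / (n - 1))

# The centered, no-intercept fit matches the slopes of the raw fit.
fit_centered <- lm(mpg ~ wt + qsec + am - 1, data = centered)
fit_raw <- lm(mpg ~ wt + qsec + am, data = mtcars)
all.equal(coef(fit_centered), coef(fit_raw)[-1])
```

With genuinely large data, each pass would read the chunks from disk instead of holding them in a list, and the second pass would write the centered chunks back out or feed them to biglm directly.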
library(rstanarm)

# create inputs
ols <- lm(mpg ~ wt + qsec + am - 1, # next line is critical for centering
          data = as.data.frame(scale(mtcars, scale = FALSE)))
b <- coef(ols)                     # OLS coefficients on the centered data
R <- qr.R(ols$qr)                  # upper-triangular factor of the QR decomposition
SSR <- crossprod(ols$residuals)[1] # sum of squared residuals
N <- length(ols$fitted.values)     # number of observations
xbar <- colMeans(mtcars[, c("wt", "qsec", "am")])
y <- mtcars$mpg
ybar <- mean(y)
s_y <- sd(y)
post <- stan_biglm.fit(b, R, SSR, N, xbar, ybar, s_y, prior = R2(.75),
                       # the next line is only to make the example go fast
                       chains = 1, iter = 1000, seed = 12345)
cbind(lm = b, stan_lm = rstan::get_posterior_mean(post)[14:16]) # shrunk
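
The example above calls stan_biglm.fit on pieces extracted from lm. A comparable sketch using stan_biglm with an actual biglm object might look like the following. It assumes the biglm and rstanarm packages are installed, and it splits the centered mtcars data into two halves purely to imitate chunked updating; the names centered, f, and fit are illustrative.

```r
library(biglm)
library(rstanarm)

# Center every column first; the formula below has no intercept.
centered <- as.data.frame(scale(mtcars, scale = FALSE))
f <- mpg ~ wt + qsec + am - 1

# Fit on the first chunk, then fold in the second chunk with update().
fit <- biglm(f, data = centered[1:16, ])
fit <- update(fit, centered[17:32, ])

# Means and standard deviation come from the uncentered data.
post <- stan_biglm(fit,
                   xbar = colMeans(mtcars[, c("wt", "qsec", "am")]),
                   ybar = mean(mtcars$mpg),
                   s_y = sd(mtcars$mpg),
                   prior = R2(0.75),
                   # small settings only to keep the sketch fast
                   chains = 1, iter = 1000, seed = 12345)
```

Because the chunks together cover the same centered data, coef(fit) should agree with the coefficients from the lm call in the example above.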