This is the same model as with stan_lm but it utilizes the
output from biglm in the biglm package in order to
proceed when the data is too large to fit in memory.
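As a rough illustration of the chunked workflow this relies on, the sketch below builds a biglm fit one piece at a time. It assumes the biglm package is installed; splitting mtcars into four chunks is a hypothetical stand-in for reading a large file piecewise.

```r
# Sketch only: assumes the biglm package is installed.
if (requireNamespace("biglm", quietly = TRUE)) {
  # Hypothetical stand-in for chunks read from disk
  chunks <- split(mtcars, rep(1:4, each = 8))

  # Fit on the first chunk, then fold in the remaining chunks
  fit <- biglm::biglm(mpg ~ wt + qsec + am, data = chunks[[1]])
  for (chunk in chunks[-1]) {
    fit <- update(fit, chunk)
  }
  print(coef(fit))  # OLS coefficients accumulated over all chunks
}
```

The resulting object is the kind of input expected by the biglm argument of stan_biglm.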
stan_biglm(
  biglm,
  xbar,
  ybar,
  s_y,
  ...,
  prior = R2(stop("'location' must be specified")),
  prior_intercept = NULL,
  prior_PD = FALSE,
  algorithm = c("sampling", "meanfield", "fullrank"),
  adapt_delta = NULL
)

stan_biglm.fit(
  b,
  R,
  SSR,
  N,
  xbar,
  ybar,
  s_y,
  has_intercept = TRUE,
  ...,
  prior = R2(stop("'location' must be specified")),
  prior_intercept = NULL,
  prior_PD = FALSE,
  algorithm = c("sampling", "meanfield", "fullrank", "optimizing"),
  adapt_delta = NULL,
  importance_resampling = TRUE,
  keep_every = 1
)
The list output by biglm in the biglm
package.
A numeric vector of column means in the implicit design matrix excluding the intercept for the observations included in the model.
A numeric scalar indicating the mean of the outcome for the observations included in the model.
A numeric scalar indicating the unbiased sample standard deviation of the outcome for the observations included in the model.
Further arguments passed to the function in the rstan 
package (sampling, 
vb, or 
optimizing), 
corresponding to the estimation method named by algorithm. For example, 
if algorithm is "sampling" it is possible to specify iter, 
chains, cores, refresh, etc.
Must be a call to R2 with its location
argument specified or NULL, which would indicate a standard uniform
prior for the \(R^2\).
Either NULL (the default) or a call to
normal. If a normal prior is specified
without a scale, then the standard deviation is taken to be
the marginal standard deviation of the outcome divided by the square
root of the sample size, which is legitimate because the marginal
standard deviation of the outcome is a primitive parameter being
estimated.
Note: If using a dense representation of the design matrix
---i.e., if the sparse argument is left at its default value of
FALSE--- then the prior distribution for the intercept is set so it
applies to the value when all predictors are centered. If you prefer
to specify a prior on the intercept without the predictors being
auto-centered, then you have to omit the intercept from the
formula and include a column of ones as a predictor,
in which case some element of prior specifies the prior on it,
rather than prior_intercept. Regardless of how
prior_intercept is specified, the reported estimates of the
intercept always correspond to a parameterization without centered
predictors (i.e., same as in glm).
A logical scalar (defaulting to FALSE) indicating
whether to draw from the prior predictive distribution instead of
conditioning on the outcome.
A string (possibly abbreviated) indicating the 
estimation approach to use. Can be "sampling" for MCMC (the
default), "optimizing" for optimization, "meanfield" for
variational inference with independent normal distributions, or
"fullrank" for variational inference with a multivariate normal
distribution. See rstanarm-package for more details on the
estimation algorithms. NOTE: not all fitting functions support all four
algorithms.
Only relevant if algorithm="sampling". See 
the adapt_delta help page for details.
A numeric vector of OLS coefficients, excluding the intercept
A square upper-triangular matrix from the QR decomposition of the design matrix, excluding the intercept
A numeric scalar indicating the sum-of-squared residuals for OLS
An integer scalar indicating the number of included observations
A logical scalar indicating whether to add an intercept to the model when estimating it.
Logical scalar indicating whether to use 
importance resampling when approximating the posterior distribution with
a multivariate normal around the posterior mode, which only applies
when algorithm is "optimizing" but defaults to TRUE
in that case
Positive integer, which defaults to 1, but can be higher
in order to thin the importance sampling realizations. It only
applies when algorithm is "optimizing".
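As a sanity check on the b, R, SSR, and N inputs to stan_biglm.fit, the base-R sketch below derives them from an ordinary lm fit and confirms that the intercept-free block of R reproduces the cross-product of the centered predictors. It assumes the intercept is the first column of the design matrix and that lm did not pivot any columns; the mtcars model is used purely for illustration.

```r
# Sketch: relate the stan_biglm.fit inputs to an ordinary lm() fit.
# Assumes the intercept is the first column and no columns were pivoted.
ols <- lm(mpg ~ wt + qsec + am, data = mtcars)
b   <- coef(ols)[-1]             # OLS slopes, intercept dropped
R   <- qr.R(ols$qr)[-1, -1]      # drop the intercept row and column
SSR <- sum(residuals(ols)^2)     # sum of squared residuals
N   <- nrow(model.frame(ols))    # number of observations used

# t(R) %*% R should match the cross-product of the centered predictors
X  <- model.matrix(ols)[, -1]
Xc <- scale(X, center = TRUE, scale = FALSE)
all.equal(crossprod(R), crossprod(Xc), check.attributes = FALSE)
```

The comparison uses crossprod(R) rather than R itself because the signs of the rows of R are not uniquely determined by the QR decomposition.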
The output of both stan_biglm and stan_biglm.fit is an
  object of stanfit-class rather than
  stanreg-objects, which is more limited and less convenient
  but necessitated by the fact that stan_biglm does not bring the full
  design matrix into memory. Without the full design matrix, some of the
  elements of a stanreg-objects object cannot be calculated,
  such as residuals. Thus, the functions in the rstanarm package that
  input stanreg-objects, such as posterior_predict, cannot be used.
The stan_biglm function is intended to be used in the same 
  circumstances as the biglm function in the biglm
  package but with an informative prior on the \(R^2\) of the regression. 
  Like biglm, the memory required to estimate the model 
  depends largely on the number of predictors rather than the number of 
  observations. However, stan_biglm and stan_biglm.fit have 
  additional required arguments that are not necessary in 
  biglm, namely xbar, ybar, and s_y.
  If any observations have any missing values on any of the predictors or the 
  outcome, such observations do not contribute to these statistics.
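When the data cannot be held in memory, xbar, ybar, and s_y can be accumulated chunk by chunk from running sums. The base-R sketch below is one way to do that while dropping incomplete rows; the `chunks` list is a hypothetical stand-in for data read piecewise from disk, and the variable names follow the mtcars model used in the example below.

```r
# Sketch: accumulate xbar, ybar, and s_y one chunk at a time.
# `chunks` is a hypothetical stand-in for data read piecewise from disk.
vars   <- c("mpg", "wt", "qsec", "am")
chunks <- split(mtcars[, vars], rep(1:4, each = 8))

n <- 0; sum_x <- 0; sum_y <- 0; sum_y2 <- 0
for (chunk in chunks) {
  chunk  <- chunk[complete.cases(chunk), ]  # drop rows with any NA
  n      <- n + nrow(chunk)
  sum_x  <- sum_x + colSums(chunk[, c("wt", "qsec", "am")])
  sum_y  <- sum_y + sum(chunk$mpg)
  sum_y2 <- sum_y2 + sum(chunk$mpg^2)
}

xbar <- sum_x / n                               # predictor column means
ybar <- sum_y / n                               # outcome mean
s_y  <- sqrt((sum_y2 - n * ybar^2) / (n - 1))   # unbiased sample SD
```

The sum-of-squares shortcut for s_y can lose precision when the outcome has a large mean relative to its spread; a streaming update such as Welford's method is a safer choice in that situation.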
# NOT RUN {
# create inputs
ols <- lm(mpg ~ wt + qsec + am, data = mtcars, # all rows are complete so ...
          na.action = na.exclude)              # not necessary in this case
b <- coef(ols)[-1]
R <- qr.R(ols$qr)[-1,-1]
SSR <- crossprod(ols$residuals)[1]
not_NA <- !is.na(fitted(ols))
N <- sum(not_NA)
xbar <- colMeans(mtcars[not_NA,c("wt", "qsec", "am")])
y <- mtcars$mpg[not_NA]
ybar <- mean(y)
s_y <- sd(y)
post <- stan_biglm.fit(b, R, SSR, N, xbar, ybar, s_y, prior = R2(.75),
                       # the next line is only to make the example go fast
                       chains = 1, iter = 500, seed = 12345)
cbind(lm = b, stan_lm = rstan::get_posterior_mean(post)[13:15,]) # shrunk
# }