Bayesian inference for linear modeling with regularizing priors on the
model parameters that are driven by prior beliefs about \(R^2\), the
proportion of variance in the outcome attributable to the predictors. See
priors for an explanation of this critical point.
stan_glm with family="gaussian" also estimates a
linear model with normally-distributed errors and allows for various other
priors on the coefficients.
stan_aov(formula, data, projections = FALSE, contrasts = NULL, ...,
  prior = R2(stop("'location' must be specified")), prior_PD = FALSE,
  algorithm = c("sampling", "meanfield", "fullrank"), adapt_delta = NULL)

stan_lm(formula, data, subset, weights, na.action, model = TRUE, x = FALSE,
  y = FALSE, singular.ok = TRUE, contrasts = NULL, offset, ...,
  prior = R2(stop("'location' must be specified")), prior_intercept = NULL,
  prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"),
  adapt_delta = NULL)

stan_lm.wfit(x, y, w, offset = NULL, singular.ok = TRUE, ...,
  prior = R2(stop("'location' must be specified")), prior_intercept = NULL,
  prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"),
  adapt_delta = NULL)

stan_lm.fit(x, y, offset = NULL, singular.ok = TRUE, ...,
  prior = R2(stop("'location' must be specified")), prior_intercept = NULL,
  prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"),
  adapt_delta = NULL)
Same as lm,
but we strongly advise against omitting the data
argument. Unless data is specified (and is a data frame) many
post-estimation functions (including update, loo,
kfold) are not guaranteed to work properly.
For stan_aov, a logical scalar (defaulting to
FALSE) indicating whether proj should be called
on the fit.
Further arguments passed to the function in the rstan
package (sampling, vb, or
optimizing), corresponding to the estimation method
named by algorithm. For example, if algorithm is
"sampling" it is possible to specify iter, chains,
cores, refresh, etc.
Must be a call to R2 with its
location argument specified or NULL, which would
indicate a standard uniform prior for the \(R^2\).
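As a brief sketch of how the prior argument might be specified (assuming the rstanarm package is attached), R2 takes a location argument together with a what argument that governs how the location is interpreted:

```r
# Sketch: specifying the R2 prior (assumes library(rstanarm) is attached).
# 'what' controls how 'location' is interpreted: as the prior mode, mean,
# or median of R^2, each of which must lie strictly between 0 and 1.
library(rstanarm)

prior_mode <- R2(location = 0.5, what = "mode")  # prior mode of R^2 is 0.5
prior_mean <- R2(location = 0.3, what = "mean")  # prior mean of R^2 is 0.3
```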
A logical scalar (defaulting to FALSE) indicating
whether to draw from the prior predictive distribution instead of
conditioning on the outcome.
A string (possibly abbreviated) indicating the
estimation approach to use. Can be "sampling" for MCMC (the
default), "optimizing" for optimization, "meanfield" for
variational inference with independent normal distributions, or
"fullrank" for variational inference with a multivariate normal
distribution. See rstanarm-package for more details on the
estimation algorithms. NOTE: not all fitting functions support all four
algorithms.
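The algorithm argument can be sketched as follows (hypothetical model and data choices; the variational options trade accuracy for speed):

```r
# Sketch: selecting the estimation algorithm (assumes rstanarm is installed).
library(rstanarm)

# Full MCMC (the default, most reliable):
fit_mcmc <- stan_lm(mpg ~ wt, data = mtcars, prior = R2(0.5),
                    algorithm = "sampling", chains = 2, iter = 1000)

# Mean-field variational approximation (much faster, but approximate):
fit_vb <- stan_lm(mpg ~ wt, data = mtcars, prior = R2(0.5),
                  algorithm = "meanfield")
```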
Only relevant if algorithm="sampling". See
adapt_delta for details.
Same as lm, but
rarely specified.
Same as lm, but
rarely specified.
In stan_lm and stan_aov, logical scalars indicating whether to
return the design matrix and response vector. In stan_lm.fit or stan_lm.wfit,
a design matrix and response vector.
Either NULL (the default) or a call to
normal. If a normal prior is specified
without a scale, then the standard deviation is taken to be
the marginal standard deviation of the outcome divided by the square
root of the sample size, which is legitimate because the marginal
standard deviation of the outcome is a primitive parameter being
estimated.
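Overriding that default might look like the following sketch (the location and scale values here are hypothetical):

```r
# Sketch: supplying an explicit prior on the intercept (hypothetical values).
library(rstanarm)

fit <- stan_lm(mpg ~ wt + qsec, data = mtcars,
               prior = R2(0.5),
               # normal() comes from rstanarm; omitting 'scale' would fall
               # back to the data-dependent default described above
               prior_intercept = normal(location = 20, scale = 10))
```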
Same as in lm.wfit but rarely specified.
A stanreg object is returned
for stan_lm and stan_aov.
A stanfit object (or a slightly modified
stanfit object) is returned if stan_lm.fit or stan_lm.wfit is called directly.
The stan_lm function is similar in syntax to the
lm function, but rather than choosing parameters that
minimize the sum of squared residuals, it draws samples from the posterior
distribution using MCMC (if algorithm is
"sampling"). The stan_lm function has a formula-based
interface and would usually be called by users but the stan_lm.fit
and stan_lm.wfit functions might be called by other functions that
parse the data themselves and are analogous to lm.fit
and lm.wfit respectively.
In addition to estimating sigma --- the standard deviation of the
normally-distributed errors --- this model estimates a parameter called
log-fit_ratio, the logarithm of a positive quantity, fit_ratio. If
log-fit_ratio is positive, the marginal posterior
variance of the outcome will exceed the sample variance of the outcome
by a multiplicative factor equal to the square of fit_ratio.
Conversely, if log-fit_ratio is negative, then the model underfits.
Given the regularizing nature of the priors, a slight underfit is good.
Finally, the posterior predictive distribution of the outcome is generated with the predictors fixed at their sample means. This quantity is useful for checking convergence because it is reasonably normally distributed and is a function of all the parameters in the model.
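Assuming a fitted stan_lm object, these auxiliary quantities can be inspected by name, as in this sketch (the parameter names "log-fit_ratio" and "mean_PPD" are those printed in the posterior summary of a stan_lm fit):

```r
# Sketch: inspecting the auxiliary parameters discussed above.
library(rstanarm)
fit <- stan_lm(mpg ~ wt + qsec + am, data = mtcars, prior = R2(0.75),
               chains = 1, iter = 500, seed = 12345)

# Posterior draws of the auxiliary parameters
draws <- as.matrix(fit, pars = c("log-fit_ratio", "sigma", "mean_PPD"))
colMeans(draws)

# A clearly positive posterior mean for log-fit_ratio suggests overfitting;
# a slightly negative one is expected given the regularizing priors.
```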
The stan_aov function is similar to aov and
has a somewhat customized print method but basically just
calls stan_lm with dummy variables to do a Bayesian analysis of
variance.
Lewandowski, D., Kurowicka D., and Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis. 100(9), 1989--2001.
The vignettes for stan_lm and stan_aov, which have more
thorough descriptions and examples.
Also see stan_glm, which --- if family =
gaussian(link="identity") --- also estimates a linear model with
normally-distributed errors but specifies different priors.
# NOT RUN {
op <- options(contrasts = c("contr.helmert", "contr.poly"))
stan_aov(yield ~ block + N*P*K, data = npk,
         prior = R2(0.5), seed = 12345)
options(op)
# }

# NOT RUN {
(fit <- stan_lm(mpg ~ wt + qsec + am, data = mtcars, prior = R2(0.75),
                # the next line is only to make the example go fast enough
                chains = 1, iter = 500, seed = 12345))
plot(fit, prob = 0.8)
plot(fit, "hist", pars = c("wt", "am", "qsec", "sigma"),
     transformations = list(sigma = "log"))
# }