See priors for an explanation of this critical point. stan_glm with
family = "gaussian" also estimates a linear model with normally-distributed
errors and allows for various other priors on the coefficients.

Usage

stan_aov(formula, data = NULL, projections = FALSE, contrasts = NULL, ...,
  prior = R2(stop("'location' must be specified")), prior_PD = FALSE,
  algorithm = c("sampling", "meanfield", "fullrank"), adapt_delta = NULL)

stan_lm(formula, data, subset, weights, na.action, model = TRUE, x = FALSE,
  y = FALSE, singular.ok = TRUE, contrasts = NULL, offset, ...,
  prior = R2(stop("'location' must be specified")), prior_intercept = NULL,
  prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"),
  adapt_delta = NULL)

stan_lm.wfit(x, y, w, offset = NULL, singular.ok = TRUE, ...,
  prior = R2(stop("'location' must be specified")), prior_intercept = NULL,
  prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"),
  adapt_delta = NULL)

stan_lm.fit(x, y, offset = NULL, singular.ok = TRUE, ...,
  prior = R2(stop("'location' must be specified")), prior_intercept = NULL,
  prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"),
  adapt_delta = NULL)
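The prior = R2(stop(...)) defaults above mean the prior location for R^2 must always be given explicitly. As a sketch of what such a prior implies, assuming the Beta(K/2, eta) form for the prior on R^2 described in the rstanarm vignette (the values of K and the location below are purely illustrative):

```r
# Illustrative only: the rstanarm vignette describes the implied prior on
# R^2 as Beta(K/2, eta), where K is the number of predictors and eta is
# chosen so the prior mode (or mean/median) matches the given location.
K   <- 3     # e.g. three predictors, as in mpg ~ wt + qsec + am
loc <- 0.75  # the location passed to R2(), interpreted here as the mode
a   <- K / 2
# Solve mode = (a - 1) / (a + eta - 2) for eta (valid when a, eta > 1):
eta <- (a - 1) / loc - a + 2
# Sanity check: the implied Beta density peaks at the requested location.
r2   <- seq(0.01, 0.99, by = 0.01)
dens <- dbeta(r2, a, eta)
r2[which.max(dens)]  # 0.75
```

Larger locations thus concentrate prior mass on models that explain most of the variance, which is the sense in which the prior regularizes.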
Arguments

formula, data, subset, weights, na.action: Same as lm.

projections: For stan_aov, a logical scalar (defaulting to FALSE)
  indicating whether proj should be called on the fit.

...: Further arguments passed to the function in the rstan package
  (sampling, vb, or optimizing) corresponding to the estimation method
  named by algorithm.

prior: Must be a call to R2 with its location argument specified, or
  NULL, which would indicate a standard uniform prior for the R^2.

prior_PD: A logical scalar (defaulting to FALSE) indicating whether to
  draw from the prior predictive distribution instead of conditioning
  on the outcome. Note that if TRUE, the draws are merely proportional
  to the actual distribution.

algorithm: A string indicating the estimation approach: "sampling" for
  MCMC (the default), "optimizing" for optimization, "meanfield" for
  variational inference with independent normal distributions, or
  "fullrank" for variational inference with a multivariate normal
  distribution.

adapt_delta: Only relevant if algorithm = "sampling". See adapt_delta
  for details.

model, offset, singular.ok, contrasts: Same as lm, but rarely
  specified.

x, y: In stan_lm and stan_aov, logical scalars indicating whether to
  return the design matrix and response vector. In stan_lm.fit or
  stan_lm.wfit, a design matrix and response vector.

w: Same as in lm.wfit, but rarely specified.

Details

The stan_lm
function is similar in syntax to the lm function, but rather than
choosing parameters to minimize the sum of squared residuals, samples
from the posterior distribution are drawn using MCMC (if algorithm is
"sampling"). The stan_lm function has a formula-based interface and
would usually be called by users, but the stan_lm.fit and stan_lm.wfit
functions might be called by other functions that parse the data
themselves and are analogous to lm.fit and lm.wfit, respectively.
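To illustrate the analogy, here is a minimal sketch; the stan_lm.fit call is shown but commented out, since drawing from the posterior requires rstanarm and a Stan run:

```r
# Construct the design matrix and response by hand, as a calling function
# that parses its own data would do before invoking a .fit workhorse.
X <- model.matrix(mpg ~ wt + qsec + am, data = mtcars)
y <- mtcars$mpg

# Base R analogue: lm.fit() returns least-squares point estimates.
ols <- lm.fit(X, y)
ols$coefficients

# Bayesian analogue (not run here): draws from the posterior instead.
# post <- rstanarm::stan_lm.fit(X, y, prior = rstanarm::R2(0.75))
```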
In addition to estimating sigma (the standard deviation of the
normally-distributed errors), this model estimates a parameter called
log-fit_ratio, the logarithm of fit_ratio. If log-fit_ratio is
positive, the marginal posterior variance of the outcome will exceed
the sample variance of the outcome by a multiplicative factor equal to
the square of fit_ratio; conversely, if log-fit_ratio is negative,
then the model underfits. Given the regularizing nature of the priors,
a slight underfit is good.

Finally, the posterior predictive distribution is generated with the
predictors fixed at their sample means. This quantity is useful for
checking convergence because it is reasonably normally distributed and
is a function of all the parameters in the model.
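As a concrete numeric sketch of the fit_ratio relationship just described (the values here are invented; only the arithmetic comes from the description above):

```r
# Hypothetical posterior estimate of log-fit_ratio, for illustration only.
log_fit_ratio <- 0.1
fit_ratio     <- exp(log_fit_ratio)
sample_var    <- 36  # hypothetical sample variance of the outcome, var(y)

# The marginal posterior variance of the outcome exceeds the sample
# variance by the square of fit_ratio when log-fit_ratio is positive:
posterior_var <- sample_var * fit_ratio^2
posterior_var > sample_var  # TRUE: a mild overfit

# A negative log-fit_ratio shrinks the factor below 1, i.e. an underfit:
sample_var * exp(-0.1)^2 < sample_var  # TRUE
```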
The stan_aov function is similar to aov and has a somewhat customized
print method, but it basically just calls stan_lm with dummy variables
to do a Bayesian analysis of variance.

See Also

The vignettes for stan_lm and stan_aov, which have more thorough
descriptions and examples. Also see stan_glm, which (if family =
gaussian(link = "identity")) also estimates a linear model with
normally-distributed errors but specifies different priors.

Examples
stan_aov(yield ~ block + N*P*K, data = npk, contrasts = "contr.poly",
prior = R2(0.5), seed = 12345)
(fit <- stan_lm(mpg ~ wt + qsec + am, data = mtcars, prior = R2(0.75),
# the next line is only to make the example go fast enough
chains = 1, iter = 1000, seed = 12345))
plot(fit)