Generalized linear modeling with spatial-temporal aggregated predictors, using prior distributions for the coefficients, intercept, spatial-temporal scales, and auxiliary parameters.
stap_glm(formula, family = gaussian(), subject_data = NULL,
distance_data = NULL, time_data = NULL, subject_ID = NULL,
max_distance = NULL, max_time = NULL, weights, offset = NULL,
model = TRUE, y = TRUE, contrasts = NULL, ..., prior = normal(),
prior_intercept = normal(), prior_stap = normal(),
prior_theta = log_normal(location = 1L, scale = 1L),
prior_aux = exponential(), adapt_delta = NULL)

stap_lm(formula, family = gaussian(), subject_data = NULL,
distance_data = NULL, time_data = NULL, subject_ID = NULL,
max_distance = NULL, max_time = NULL, weights, offset = NULL,
model = TRUE, y = TRUE, contrasts = NULL, ..., prior = normal(),
prior_intercept = normal(), prior_stap = normal(),
prior_theta = log_normal(location = 1L, scale = 1L),
prior_aux = exponential(), adapt_delta = NULL)
Same as for glm. Note that in-formula transformations will not be passed to the final design matrix. Covariates that have "scale" in their name are not advised, because this text is parsed for in the final model fit.
Same as glm for gaussian, binomial, and poisson families.
A data.frame that contains data specific to the subject or subjects on whom the outcome is measured. Must contain one column that has the subject_ID on which to join the distance and time_data.
A data.frame with at least three columns containing (1) an id_key, (2) the sap/tap/stap features, and (3) the distances between the subject with a given id and the built environment feature in column (2). The distance column must be the only column of type "double", and the sap/tap/stap features must be specified in the data.frame exactly as they are in the formula.
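As a rough illustration of the layout described above, the following base R sketch builds a toy distance data.frame and checks the "only one double column" requirement. The column names subj_id, BEF, and Distance and all values are hypothetical; only the three-column shape and the single double column come from the description.

```r
# Toy distance data: one row per (subject, built environment feature) pair.
# All column names and values here are made up for illustration.
distance_data <- data.frame(
  subj_id  = c(1L, 1L, 2L, 3L),              # id key, stored as integer (not double)
  BEF      = rep("Fast_Food", 4),            # should match the name used inside sap()/tap()/stap()
  Distance = c(0.4, 1.7, 2.3, 0.9),          # intended to be the only column of type "double"
  stringsAsFactors = FALSE
)

# Find every column stored as a double; ideally this is just the distance column.
double_cols <- names(distance_data)[vapply(distance_data, is.double, logical(1))]
print(double_cols)
```

Storing the id key as an integer or character, rather than a plain numeric, is one simple way to keep the distance column the only double in the data.frame.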
Same as distance_data, except with the time that the subject has been exposed to the built environment feature instead of the distance.
Name of the column(s) to join on between subject_data and bef_data.
The inclusion distance; upper bound for all elements of dists_crs.
The inclusion time; upper bound for all elements of times_crs.
Same as glm.
Logical denoting whether or not to return the fixed covariates model frame object in the fitted object.
In stap_glm, logical scalar indicating whether to return the response vector. In stan_glm.fit, a response vector.
Same as glm, but rarely specified.
Further arguments passed to the function in the rstap package to specify iter, chains, cores, refresh, etc.
The prior distribution for the regression coefficients. prior should be a call to one of the various functions provided by rstap for specifying priors. The subset of these functions that can be used for the prior on the coefficients can be grouped into several "families":
Family | Functions
Student t family | normal, student_t, cauchy
Hierarchical shrinkage family | hs, hs_plus
Laplace family | laplace, lasso
Product normal family | product_normal
See the priors help page for details on the families and how to specify the arguments for all of the functions in the table above. To omit a prior, i.e., to use a flat (improper) uniform prior, prior can be set to NULL, although this is rarely a good idea.
Note: If prior is from the Student t family or Laplace family, and if the autoscale argument to the function used to specify the prior (e.g. normal) is left at its default and recommended value of TRUE, then the default or user-specified prior scale(s) may be adjusted internally based on the scales of the predictors. See the priors help page and the Prior Distributions vignette for details on the rescaling and the prior_summary function for a summary of the priors used for a particular model.
The prior distribution for the intercept. prior_intercept can be a call to normal, student_t or cauchy. See the priors help page for details on these functions. To omit a prior on the intercept, i.e., to use a flat (improper) uniform prior, prior_intercept can be set to NULL.
Note: The prior distribution for the intercept is set so it applies to the value when all predictors are centered. If you prefer to specify a prior on the intercept without the predictors being auto-centered, then you have to omit the intercept from the formula and include a column of ones as a predictor, in which case some element of prior specifies the prior on it, rather than prior_intercept. Regardless of how prior_intercept is specified, the reported estimates of the intercept always correspond to a parameterization without centered predictors (i.e., same as in glm).
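The relationship between the centered and uncentered intercepts in the note above can be sketched with ordinary least squares in base R. This is only an analogy using lm on simulated data, not the Bayesian fit performed by stap_glm; all variable names and values are illustrative.

```r
# Simulated data; names and values are illustrative only.
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)
y <- 3 + 0.5 * x + rnorm(50)

fit_raw      <- lm(y ~ x)                # intercept is the fitted value at x = 0
fit_centered <- lm(y ~ I(x - mean(x)))   # intercept is the fitted value at x = mean(x)

alpha_centered <- unname(coef(fit_centered)[1])
beta           <- unname(coef(fit_centered)[2])

# With a centered predictor, the least-squares intercept equals mean(y),
# which is the scale on which a prior "with centered predictors" would apply.
print(all.equal(alpha_centered, mean(y)))

# Converting back to the uncentered parameterization recovers the usual intercept.
alpha_raw <- alpha_centered - beta * mean(x)
print(all.equal(alpha_raw, unname(coef(fit_raw)[1])))
```

In the example at the end of this page, prior_intercept = normal(location = 25, ...) would therefore be read as a statement about the expected outcome at average predictor values, not at predictors equal to zero.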
Prior for the spatial-temporal aggregated predictors. Note that the prior is set on the standardized latent covariates.
Prior for the spatial-temporal aggregated predictors' scale. Can either be a single prior or a prior nested within a list of lists.
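To get a feel for what the default log_normal(location = 1L, scale = 1L) implies for a scale parameter, the sketch below computes a few quantiles with base R's qlnorm. It assumes that location and scale are the mean and standard deviation on the log scale (the meanlog and sdlog of qlnorm); this page does not confirm that parameterization, so treat the numbers as indicative only.

```r
# Assumption: location and scale are the mean and sd of log(theta).
location <- 1
scale    <- 1

# Median and central 95% interval implied for the spatial-temporal scale
# under that assumption.
prior_quantiles <- qlnorm(c(0.025, 0.5, 0.975), meanlog = location, sdlog = scale)
names(prior_quantiles) <- c("2.5%", "50%", "97.5%")
print(round(prior_quantiles, 2))
```

Comparing these quantiles with the units of the distance or time data (and with max_distance or max_time) is a quick way to judge whether the prior puts mass on plausible scales.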
The prior distribution for the "auxiliary" parameter (if applicable). The "auxiliary" parameter refers to a different parameter depending on the family. For Gaussian models prior_aux controls "sigma", the error standard deviation. For negative binomial models prior_aux controls "reciprocal_dispersion", which is similar to the "size" parameter of rnbinom: smaller values of "reciprocal_dispersion" correspond to greater dispersion. For gamma models prior_aux sets the prior on the "shape" parameter (see e.g., rgamma), and for inverse-Gaussian models it is the so-called "lambda" parameter (which is essentially the reciprocal of a scale parameter). Binomial and Poisson models do not have auxiliary parameters.
prior_aux can be a call to exponential to use an exponential distribution, or normal, student_t or cauchy, which results in a half-normal, half-t, or half-Cauchy prior. See priors for details on these functions. To omit a prior, i.e., to use a flat (improper) uniform prior, set prior_aux to NULL.
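The statement that smaller values of "reciprocal_dispersion" mean greater dispersion can be illustrated with the "size" parameter of base R's rnbinom, to which the text compares it. The sketch below uses the standard negative binomial variance formula, mu + mu^2 / size; the helper name nb_variance and the chosen values are illustrative, and the exact correspondence between "reciprocal_dispersion" and "size" is taken from the description above rather than verified here.

```r
# Variance of a negative binomial with mean mu and size parameter `size`.
nb_variance <- function(mu, size) mu + mu^2 / size

mu        <- 5
sizes     <- c(0.5, 2, 10, 100)
variances <- nb_variance(mu, sizes)

# Smaller size gives larger variance; large size approaches the Poisson variance (mu).
print(data.frame(size = sizes, variance = variances))

# Simulation check for one value of size.
set.seed(42)
draws <- rnbinom(1e5, size = 2, mu = mu)
print(c(simulated = var(draws), theoretical = nb_variance(mu, 2)))
```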
See the adapt_delta help page for details.
A stapreg object is returned for stap_glm.
A stapfit object (or a slightly modified stapfit object) is returned if stan_glm.fit is called directly.
The stap_glm function is similar in syntax to stan_glm, except that rather than fitting only a standard generalized linear model with full Bayesian inference, stap_glm also incorporates spatial-temporal covariates.
Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge, UK.
Muth, C., Oravecz, Z., and Gabry, J. (2018) User-friendly Bayesian regression modeling: A tutorial with rstanarm and shinystan. The Quantitative Methods for Psychology. 14(2), 99--119. https://www.tqmp.org/RegularArticles/vol14-2/p099/p099.pdf
stapreg-methods and glm.
The various vignettes for stap_glm at https://biostatistics4socialimpact.github.io/rstap/articles and the preprint article.
# NOT RUN {
fit_glm <- stap_glm(formula = y ~ sex + sap(Fast_Food),
                    subject_data = homog_subject_data[1:100, ], # for speed of example only
                    distance_data = homog_distance_data,
                    family = gaussian(link = 'identity'),
                    subject_ID = 'subj_id',
                    prior = normal(location = 0, scale = 5, autoscale = FALSE),
                    prior_intercept = normal(location = 25, scale = 5, autoscale = FALSE),
                    prior_stap = normal(location = 0, scale = 3, autoscale = FALSE),
                    prior_theta = log_normal(location = 1, scale = 1),
                    prior_aux = cauchy(location = 0, scale = 5),
                    max_distance = max(homog_distance_data$Distance),
                    chains = 1, iter = 300, # for speed of example only
                    refresh = -1, verbose = FALSE)
# }