Fit a Bayesian dynamic multivariate panel model (DMPM) using Stan for
Bayesian inference. The dynamite package supports a wide range of
distributions and allows the user to flexibly customize the priors for the
model parameters. The dynamite model is specified using standard R formula
syntax via dynamiteformula(). For more information and examples,
see 'Details' and the package vignettes.
The formula method returns the model definition as a quoted expression.
Information on the estimated dynamite model can be obtained via
print(), including the following: the model formula, the data,
the smallest effective sample sizes, the largest Rhat, and summary statistics of
the time-invariant and group-invariant model parameters.
The summary() method provides statistics of the posterior samples of the
model; this is an alias of as.data.frame.dynamitefit() with
summary = TRUE.
dynamite(
  dformula,
  data,
  time,
  group = NULL,
  priors = NULL,
  backend = "rstan",
  verbose = TRUE,
  verbose_stan = FALSE,
  stanc_options = list("O0"),
  threads_per_chain = 1L,
  grainsize = NULL,
  custom_stan_model = NULL,
  debug = NULL,
  ...
)
# S3 method for dynamitefit
formula(x, ...)
# S3 method for dynamitefit
print(x, full_diagnostics = FALSE, ...)
# S3 method for dynamitefit
summary(object, ...)
dynamite returns a dynamitefit object which is a list containing
the following components:
stanfit
A stanfit object, see rstan::sampling() for details.
dformulas
A list of dynamiteformula objects for internal use.
data
A processed version of the input data.
data_name
Name of the input data object.
stan
A list containing various elements related to Stan model
construction and sampling.
group_var
Name of the variable defining the groups.
time_var
Name of the variable defining the time index.
priors
Data frame containing the used priors.
backend
Either "rstan" or "cmdstanr" indicating which
package was used in sampling.
permutation
Randomized permutation of the posterior draws.
call
Original function call as an object of class call.
formula returns a quoted expression.
print returns x invisibly.
summary returns a data.frame.
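As a brief illustration of the components listed above, the following sketch inspects a few elements of gaussian_example_fit, the pre-fitted object used in the examples at the end of this page. It assumes the dynamite package is installed and is meant only as a way to explore the object, not as a recommended workflow.

```r
library(dynamite)

# Pre-fitted example object shipped with the package
fit <- gaussian_example_fit

# Interface that produced the posterior draws ("rstan" or "cmdstanr")
fit$backend

# Names of the grouping and time index columns
fit$group_var
fit$time_var

# Data frame of the priors that were used
head(fit$priors)

# The original function call
fit$call
```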
dformula
[dynamiteformula]
The model formula.
See dynamiteformula() and 'Details'.
data
[data.frame, tibble::tibble, or data.table::data.table]
The data that contains the variables in the model in long format.
Supported column types are integer, logical, double, and
factor. Columns of type character will be converted to factors.
Unused factor levels will be dropped. The data can contain missing
values which will simply be ignored in the estimation in a case-wise
fashion (per time-point and per channel). Input data is converted to
channel specific matrix representations via stats::model.matrix.lm().
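To make the expected layout concrete, here is a minimal base R sketch of a long-format panel with one row per group and time point. The data frame panel and its columns are hypothetical, and the factor handling shown only mimics the documented conversion; it is not the package's internal code.

```r
# Hypothetical long-format panel: one row per (id, time) combination
panel <- data.frame(
  id = rep(1:3, each = 4),
  time = rep(1:4, times = 3),
  y = c(0.2, 0.5, NA, 0.9, 1.1, 0.8, 0.7, NA, -0.3, 0.1, 0.4, 0.6),
  region = rep(c("north", "south", "north"), each = 4),
  stringsAsFactors = FALSE
)

# Inspect the column types before passing the data on
sapply(panel, class)

# Roughly what the documented conversion amounts to: the character column
# becomes a factor and the unused level "west" is dropped
panel$region <- droplevels(
  factor(panel$region, levels = c("north", "south", "west"))
)
levels(panel$region)

# Missing responses may stay in the data as NA
sum(is.na(panel$y))
```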
time
[character(1)]
A column name of data that denotes the
time index of observations. If this variable is a factor, the integer
representation of its levels is used internally for defining the time
indexing.
group
[character(1)]
A column name of data that denotes the
unique groups or NULL corresponding to a scenario without any groups.
If group is NULL, a new column .group with constant
value 1L is created, indicating that all observations belong to the same
group. In case of name conflicts with data, see the group_var element
of the return object to get the column name of the new variable.
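The following sketch shows one way to check which grouping column was created when group is NULL. It assumes the dynamite package is installed, uses a single individual from the packaged gaussian_example data, and skips compilation and sampling via the debug argument so that only the model definition is built. The object names one_group and fit_single are illustrative.

```r
library(dynamite)

# Keep a single individual so that the data has no group structure
one_group <- gaussian_example[gaussian_example$id == 1, ]

fit_single <- dynamite(
  dformula = obs(y ~ x, family = "gaussian"),
  data = one_group,
  time = "time",
  group = NULL,
  # Skip Stan compilation and sampling; only the model definition is built
  debug = list(no_compile = TRUE, no_sampling = TRUE)
)

# Name of the automatically created grouping column
fit_single$group_var
```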
priors
[data.frame]
An optional data frame with prior
definitions. See get_priors() and 'Details'.
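A possible workflow for customizing priors is sketched below: retrieve the defaults with get_priors(), edit the data frame, and pass it back through priors. The column names prior and type, and the type value "sigma" for the residual standard deviation, are assumptions about the layout of the returned data frame; print p first to confirm them. Compilation and sampling are skipped here to keep the sketch cheap.

```r
library(dynamite)

f <- obs(y ~ x, family = "gaussian")

# Default priors for this model and data, returned as a data frame
p <- get_priors(f, data = gaussian_example, time = "time", group = "id")
p

# Assumption: priors are stored as character strings in column `prior`
# and the residual standard deviation has type "sigma"
p$prior[p$type == "sigma"] <- "exponential(1)"

fit_custom <- dynamite(
  dformula = f,
  data = gaussian_example,
  time = "time",
  group = "id",
  priors = p,
  # Skip Stan compilation and sampling for this illustration
  debug = list(no_compile = TRUE, no_sampling = TRUE)
)
```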
backend
[character(1)]
Defines the backend interface to Stan,
should be either "rstan" (the default) or "cmdstanr". Note that
cmdstanr needs to be installed separately as it is not on CRAN. It also
needs the actual CmdStan software. See https://mc-stan.org/cmdstanr/
for details.
verbose
[logical(1)]
All warnings and messages are suppressed
if set to FALSE. Defaults to TRUE. Setting this to FALSE will also
disable checks for perfect collinearity in the model matrix.
verbose_stan
[logical(1)]
This is the verbose argument for
rstan::sampling(). Defaults to FALSE.
stanc_options
[list()]
This is the stanc_options argument
passed to the compile method of a CmdStanModel object via
cmdstan_model() when backend = "cmdstanr". Defaults to list("O0").
To enable level one compiler optimizations, use list("O1").
See https://mc-stan.org/cmdstanr/reference/cmdstan_model.html
for details.
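The sketch below combines the backend and stanc_options arguments described above. It assumes that both the cmdstanr R package and a working CmdStan installation are available; the guard only checks for the R package, so the call can still fail if CmdStan itself is missing.

```r
library(dynamite)

# Checks only for the R package; CmdStan itself must also be installed
if (requireNamespace("cmdstanr", quietly = TRUE)) {
  fit_cmdstan <- dynamite(
    dformula = obs(y ~ x, family = "gaussian"),
    data = gaussian_example,
    time = "time",
    group = "id",
    backend = "cmdstanr",
    stanc_options = list("O1"), # level one compiler optimizations
    chains = 2,
    parallel_chains = 2,        # cmdstanr counterpart of `cores`
    refresh = 0
  )
  fit_cmdstan$backend
}
```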
threads_per_chain
[integer(1)]
A positive integer defining the
number of parallel threads to use within each chain. Default is 1. See
rstan::rstan_options() and
https://mc-stan.org/cmdstanr/reference/model-method-sample.html
for details.
grainsize
[integer(1)]
A positive integer defining the
suggested size of the partial sums when using within-chain parallelization.
Default is the number of time points divided by threads_per_chain.
Setting this to 1 leaves the workload division entirely to the internal
scheduler. The performance of the within-chain parallelization can be
sensitive to the choice of grainsize, see the Stan manual on reduce-sum for
details.
custom_stan_model
[character(1)]
An optional character string
that either contains a customized Stan model code or a path to a .stan
file that contains the code. Using this will override the generated model
code. For expert users only.
debug
[list()]
A named list of form name = TRUE indicating
additional objects in the environment of the dynamite function which are
added to the return object. Additionally, values no_compile = TRUE and
no_sampling = TRUE can be used to skip the compilation of the Stan code
and sampling steps respectively. This can be useful for debugging when
combined with model_code = TRUE, which adds the Stan model code to the
return object.
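As an illustration of the debugging options just described, the following sketch builds the model without compiling or sampling and prints the generated Stan code. It assumes the dynamite package is installed and that the code is stored under the element name model_code, following the name = TRUE convention described above.

```r
library(dynamite)

fit_debug <- dynamite(
  dformula = obs(y ~ x, family = "gaussian"),
  data = gaussian_example,
  time = "time",
  group = "id",
  debug = list(
    no_compile = TRUE,  # do not compile the Stan model
    no_sampling = TRUE, # do not run MCMC
    model_code = TRUE   # add the generated Stan code to the result
  )
)

# Generated Stan program (assumed to be stored as a character string)
cat(fit_debug$model_code)
```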
...
For dynamite(), additional arguments to rstan::sampling() or
the $sample() method of the CmdStanModel object
(see https://mc-stan.org/cmdstanr/reference/model-method-sample.html),
such as chains and cores
(chains and parallel_chains in cmdstanr). For summary(),
additional arguments to as.data.frame.dynamitefit(). For print(),
further arguments to the print method for tibbles
(see tibble::formatting). Not used for formula().
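The sketch below shows how additional arguments might be forwarded in each of the three cases. It assumes the default rstan backend with a working toolchain; chains, cores, iter, warmup, and refresh are rstan::sampling() arguments, and n is an argument of the tibble print method. The object names fit_dots and post are illustrative.

```r
library(dynamite)

# Sampler settings are passed through `...` to rstan::sampling()
fit_dots <- dynamite(
  dformula = obs(y ~ x, family = "gaussian"),
  data = gaussian_example,
  time = "time",
  group = "id",
  chains = 2,
  cores = 2,
  iter = 1000,
  warmup = 500,
  refresh = 0
)

# Extra arguments to summary() go to as.data.frame.dynamitefit()
post <- summary(fit_dots, types = "beta", probs = c(0.1, 0.9))
post

# Extra arguments to print() go to the tibble print method
print(fit_dots, n = 5)
```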
x
[dynamitefit]
The model fit object.
full_diagnostics
By default, the effective sample size (ESS) and Rhat
are computed only for the time- and group-invariant parameters
(full_diagnostics = FALSE). Setting this to TRUE computes ESS and Rhat
values for all model parameters, which can take some time for complex models.
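For example, the full set of diagnostics could be requested for the packaged example fit as follows. This is a minimal sketch that assumes the dynamite package is installed; the variable out is illustrative.

```r
library(dynamite)

# Default: ESS and Rhat for time- and group-invariant parameters only
print(gaussian_example_fit)

# ESS and Rhat for all model parameters; slower for complex models
out <- print(gaussian_example_fit, full_diagnostics = TRUE)
```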
object
[dynamitefit]
The model fit object.
The best-case scalability of dynamite with respect to data size should be
approximately linear in the number of time points and the number of
groups. However, the wall-clock time of the MCMC algorithms provided by Stan
can depend on the discrepancy between the data and the model (and the
resulting shape of the posterior), so this can vary greatly.
Santtu Tikka and Jouni Helske (2024). dynamite: An R Package for Dynamic Multivariate Panel Models. arXiv preprint, doi:10.48550/arXiv.2302.01607.
Jouni Helske and Santtu Tikka (2024). Estimating Causal Effects from Panel Data with Dynamic Multivariate Panel Models. Advances in Life Course Research, 60, 100617. doi:10.1016/j.alcr.2024.100617.
Model fitting
dynamice(),
get_priors(),
update.dynamitefit()
Model formula construction
dynamiteformula(),
lags(),
lfactor(),
random_spec(),
splines()
Model outputs
as.data.frame.dynamitefit(),
as.data.table.dynamitefit(),
as_draws_df.dynamitefit(),
coef.dynamitefit(),
confint.dynamitefit(),
get_code(),
get_data(),
get_parameter_dims(),
get_parameter_names(),
get_parameter_types(),
ndraws.dynamitefit(),
nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN
# \donttest{
# Please update your rstan and StanHeaders installation before running
# on Windows
if (!identical(.Platform$OS.type, "windows")) {
  fit <- dynamite(
    dformula = obs(y ~ -1 + varying(~x), family = "gaussian") +
      lags(type = "varying") +
      splines(df = 20),
    gaussian_example,
    "time",
    "id",
    chains = 1,
    refresh = 0
  )
}
# }
formula(gaussian_example_fit)
print(gaussian_example_fit)
summary(gaussian_example_fit,
  types = "beta",
  probs = c(0.05, 0.1, 0.9, 0.95)
)