Fit a Bayesian dynamic multivariate panel model (DMPM) using Stan for
Bayesian inference. The dynamite package supports a wide range of
distributions and allows the user to flexibly customize the priors for the
model parameters. The dynamite model is specified using standard R formula
syntax via dynamiteformula(). For more information and examples,
see 'Details' and the package vignettes.
The formula method returns the model definition as a quoted expression.
Information on the estimated dynamite model can be obtained via print(),
including the following: the model formula, the data, the smallest
effective sample sizes, the largest Rhat, and summary statistics of
the time-invariant and group-invariant model parameters.
The summary() method provides statistics of the posterior samples of the
model; this is an alias of as.data.frame.dynamitefit() with
summary = TRUE.
dynamite(
dformula,
data,
time,
group = NULL,
priors = NULL,
backend = "rstan",
verbose = TRUE,
verbose_stan = FALSE,
stanc_options = list("O0"),
threads_per_chain = 1L,
grainsize = NULL,
custom_stan_model = NULL,
debug = NULL,
...
)
# S3 method for dynamitefit
formula(x, ...)
# S3 method for dynamitefit
print(x, full_diagnostics = FALSE, ...)
# S3 method for dynamitefit
summary(object, ...)
dynamite returns a dynamitefit object, which is a list containing
the following components:
stanfit
A stanfit object, see rstan::sampling() for details.
dformulas
A list of dynamiteformula objects for internal use.
data
A processed version of the input data.
data_name
Name of the input data object.
stan
A list containing various elements related to Stan model
construction and sampling.
group_var
Name of the variable defining the groups.
time_var
Name of the variable defining the time index.
priors
Data frame containing the used priors.
backend
Either "rstan" or "cmdstanr", indicating which
package was used in sampling.
permutation
Randomized permutation of the posterior draws.
call
Original function call as an object of class call.
formula returns a quoted expression.
print returns x invisibly.
summary returns a data.frame.
dformula
[dynamiteformula] The model formula.
See dynamiteformula() and 'Details'.
data
[data.frame, tibble::tibble, or data.table::data.table]
The data that contains the variables in the model in long format.
Supported column types are integer, logical, double, and factor.
Columns of type character will be converted to factors.
Unused factor levels will be dropped. The data can contain missing
values, which will simply be ignored in the estimation in a case-wise
fashion (per time-point and per channel). Input data is converted to
channel-specific matrix representations via stats::model.matrix.lm().
time
[character(1)] A column name of data that denotes the
time index of observations. If this variable is a factor, the integer
representation of its levels is used internally for defining the time
indexing.
group
[character(1)] A column name of data that denotes the
unique groups, or NULL corresponding to a scenario without any groups.
If group is NULL, a new column .group with constant value 1L
is created, indicating that all observations belong to the same
group. In case of name conflicts with data, see the group_var element
of the return object to get the column name of the new variable.
priors
[data.frame] An optional data frame with prior
definitions. See get_priors() and 'Details'.
backend
[character(1)] Defines the backend interface to Stan,
should be either "rstan" (the default) or "cmdstanr". Note that
cmdstanr needs to be installed separately as it is not on CRAN. It also
needs the actual CmdStan software. See https://mc-stan.org/cmdstanr/
for details.
verbose
[logical(1)] All warnings and messages are suppressed
if set to FALSE. Defaults to TRUE. Setting this to FALSE will also
disable checks for perfect collinearity in the model matrix.
verbose_stan
[logical(1)] This is the verbose argument for
rstan::sampling(). Defaults to FALSE.
stanc_options
[list()] This is the stanc_options argument
passed to the compile method of a CmdStanModel object via
cmdstan_model() when backend = "cmdstanr". Defaults to list("O0").
To enable level one compiler optimizations, use list("O1").
See https://mc-stan.org/cmdstanr/reference/cmdstan_model.html
for details.
threads_per_chain
[integer(1)] A positive integer defining the
number of parallel threads to use within each chain. Default is 1. See
rstan::rstan_options() and
https://mc-stan.org/cmdstanr/reference/model-method-sample.html
for details.
grainsize
[integer(1)] A positive integer defining the
suggested size of the partial sums when using within-chain parallelization.
Default is the number of time points divided by threads_per_chain.
Setting this to 1 leaves the workload division entirely to the internal
scheduler. The performance of the within-chain parallelization can be
sensitive to the choice of grainsize, see the Stan manual on reduce-sum
for details.
custom_stan_model
[character(1)] An optional character string
that either contains a customized Stan model code or a path to a .stan
file that contains the code. Using this will override the generated model
code. For expert users only.
debug
[list()] A named list of form name = TRUE indicating
additional objects in the environment of the dynamite function which are
added to the return object. Additionally, values no_compile = TRUE and
no_sampling = TRUE can be used to skip the compilation of the Stan code
and sampling steps respectively. This can be useful for debugging when
combined with model_code = TRUE, which adds the Stan model code to the
return object.
...
For dynamite(), additional arguments to rstan::sampling() or
the $sample() method of the CmdStanModel object
(see https://mc-stan.org/cmdstanr/reference/model-method-sample.html),
such as chains and cores (chains and parallel_chains in cmdstanr).
For summary(), additional arguments to as.data.frame.dynamitefit().
For print(), further arguments to the print method for tibbles
(see tibble::formatting). Not used for formula().
x
[dynamitefit] The model fit object.
full_diagnostics
By default, the effective sample size (ESS) and Rhat
are computed only for the time- and group-invariant parameters
(full_diagnostics = FALSE). Setting this to TRUE computes ESS and Rhat
values for all model parameters, which can take some time for complex models.
object
[dynamitefit] The model fit object.
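As an illustration of the priors argument described above, the following sketch retrieves the default priors with get_priors(), edits them, and passes them back to dynamite(). It assumes the bundled gaussian_example data and that the returned data frame has prior and type columns. The choice of normal(0, 1) is purely illustrative. The debug flags are used only to keep the example fast; drop them to actually sample.

```r
library(dynamite)

# Model formula used for both get_priors() and dynamite()
f <- obs(y ~ -1 + z + varying(~ x + lag(y)), family = "gaussian") +
  splines(df = 20)

# Default priors for this model and data, as a data frame
p <- get_priors(f, data = gaussian_example, time = "time", group = "id")
print(p)

# Illustrative change: replace the priors of the time-invariant
# regression coefficients (rows assumed to have type "beta")
p$prior[p$type == "beta"] <- "normal(0, 1)"

# Pass the edited priors back; compilation and sampling are skipped here
fit_p <- dynamite(
  dformula = f,
  data = gaussian_example,
  time = "time",
  group = "id",
  priors = p,
  debug = list(no_compile = TRUE, no_sampling = TRUE)
)
```

Editing the data frame returned by get_priors(), rather than building one by hand, keeps the parameter names consistent with what the model expects.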
The best-case scalability of dynamite in terms of data size should be
approximately linear in terms of the number of time points and the number
of groups, but as wall-clock time of the MCMC algorithms provided by Stan
can depend on the discrepancy of the data and the model (and the subsequent
shape of the posterior), this can vary greatly.
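For larger data, the backend, stanc_options, threads_per_chain, and grainsize arguments can be combined. The sketch below is one possible configuration, not a recommendation: it assumes cmdstanr and CmdStan are installed (the call is skipped otherwise), and the values 2L and 10L are arbitrary choices that would need tuning for a real problem.

```r
library(dynamite)

# Run only if cmdstanr and a CmdStan installation are available
has_cmdstan <- requireNamespace("cmdstanr", quietly = TRUE) &&
  !is.null(cmdstanr::cmdstan_version(error_on_NA = FALSE))

if (has_cmdstan) {
  fit_mt <- dynamite(
    dformula = obs(y ~ -1 + z + varying(~ x + lag(y)), family = "gaussian") +
      splines(df = 20),
    data = gaussian_example,
    time = "time",
    group = "id",
    backend = "cmdstanr",
    stanc_options = list("O1"),  # level one compiler optimizations
    threads_per_chain = 2L,      # within-chain parallelization
    grainsize = 10L,             # arbitrary; tune for your data
    chains = 2,
    parallel_chains = 2,         # passed on to the $sample() method
    refresh = 0
  )
}
```

Whether within-chain threading pays off depends on the model and the data, so it is worth timing a short run before committing to a configuration.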
Santtu Tikka and Jouni Helske (2024). dynamite: An R Package for Dynamic Multivariate Panel Models. arXiv preprint, doi:10.48550/arXiv.2302.01607.
Jouni Helske and Santtu Tikka (2024). Estimating Causal Effects from Panel Data with Dynamic Multivariate Panel Models. Advances in Life Course Research, 60, 100617. doi:10.1016/j.alcr.2024.100617.
Model fitting
dynamice(), get_priors(), update.dynamitefit()
Model formula construction
dynamiteformula(), lags(), lfactor(), random_spec(), splines()
Model outputs
as.data.frame.dynamitefit(), as.data.table.dynamitefit(),
as_draws_df.dynamitefit(), coef.dynamitefit(), confint.dynamitefit(),
get_code(), get_data(), get_parameter_dims(), get_parameter_names(),
get_parameter_types(), ndraws.dynamitefit(), nobs.dynamitefit()
data.table::setDTthreads(1) # For CRAN
# \donttest{
# Please update your rstan and StanHeaders installation before running
# on Windows
if (!identical(.Platform$OS.type, "windows")) {
  fit <- dynamite(
    dformula = obs(y ~ -1 + varying(~x), family = "gaussian") +
      lags(type = "varying") +
      splines(df = 20),
    gaussian_example,
    "time",
    "id",
    chains = 1,
    refresh = 0
  )
}
# }
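The debug argument can be used to inspect the generated Stan code without compiling or sampling. The following is a minimal sketch using the bundled gaussian_example data. That the code is stored under the model_code element of the return object is an assumption based on the argument description.

```r
library(dynamite)

# Build the model structure only: no compilation, no sampling
fit_dbg <- dynamite(
  dformula = obs(y ~ x, family = "gaussian"),
  data = gaussian_example,
  time = "time",
  group = "id",
  debug = list(no_compile = TRUE, no_sampling = TRUE, model_code = TRUE)
)

# Print the generated Stan program (element name assumed, see above)
cat(fit_dbg$model_code)
```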
data.table::setDTthreads(1) # For CRAN
formula(gaussian_example_fit)
data.table::setDTthreads(1) # For CRAN
print(gaussian_example_fit)
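To request ESS and Rhat for all model parameters rather than only the time- and group-invariant ones, set full_diagnostics = TRUE. This short sketch uses the bundled gaussian_example_fit object; on larger models this call can be noticeably slower.

```r
library(dynamite)

# print() returns its argument invisibly, so the result can be captured
res <- print(gaussian_example_fit, full_diagnostics = TRUE)
```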
data.table::setDTthreads(1) # For CRAN
summary(gaussian_example_fit,
  types = "beta",
  probs = c(0.05, 0.1, 0.9, 0.95)
)
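Because summary() is described as an alias of as.data.frame.dynamitefit() with summary = TRUE, the two calls below should give the same result. This is a small check on the bundled gaussian_example_fit object, not part of the original examples.

```r
library(dynamite)

# Posterior summaries of the time-invariant regression coefficients
s1 <- summary(gaussian_example_fit, types = "beta")

# The same request via the as.data.frame() method
s2 <- as.data.frame(gaussian_example_fit, types = "beta", summary = TRUE)

# Compare the two results
all.equal(s1, s2)
```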