This function is a wrapper for the packages MPTinR and TreeBUGS. It applies
various frequentist and Bayesian estimation methods to the same data set,
with different levels of pooling/aggregation, using a single MPT model, and
collects the results in one tibble in which each row corresponds to one
estimation method. Note that parameter restrictions (e.g., equating
different parameters or fixing them to a constant) need to be part of the
model (i.e., the .eqn file) and cannot be passed as an argument.
The settings for the various methods are specified via the function
mpt_options. The default settings use all available cores for calculating
the bootstrap distribution as well as for independent MCMC chains, and
should be appropriate for most situations.
The data can have a single between-subjects condition (specified via
condition). This condition can have more than two levels. If specified, the
pairwise differences between the levels, the standard errors of the
differences, and confidence intervals of the differences are calculated for
each parameter. Please note that condition is silently converted to
character in the output. Thus, a specific ordering of the factor levels in
the output cannot be guaranteed. If the data have more than one
between-subjects condition, these conditions need to be combined into a
single condition for this function.
Parameter differences or other support for within-subjects conditions is
not provided. The best course of action for within-subjects conditions is
to simply include separate trees and separate sets of parameters for each
within-subjects condition. This at least allows comparing the estimates for
each within-subjects condition across levels of pooling and estimation
methods.
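As a purely hypothetical illustration of this approach, an .eqn model file
for a one-high-threshold recognition model with separate trees and
parameter sets for two within-subjects conditions (here labeled "speeded"
and "unspeeded"; all tree, category, and parameter names are made up) might
look as follows. In the classic EQN format each line contains a tree name,
a category name, and an equation; the first line is conventionally ignored
by EQN parsers:

```
1HT model, separate parameters per within-subjects condition
old_speeded    hit_s   D_s+(1-D_s)*g_s
old_speeded    miss_s  (1-D_s)*(1-g_s)
new_speeded    fa_s    g_s
new_speeded    cr_s    1-g_s
old_unspeeded  hit_u   D_u+(1-D_u)*g_u
old_unspeeded  miss_u  (1-D_u)*(1-g_u)
new_unspeeded  fa_u    g_u
new_unspeeded  cr_u    1-g_u
```

Because D_s/g_s and D_u/g_u are distinct parameters, each estimation method
returns separate estimates per within-subjects condition.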
Pooling
The following pooling levels are provided (not all pooling levels are
supported by all estimation approaches; see below).
Complete pooling: The traditional analysis approach in the MPT literature,
in which the data are aggregated across participants within each
between-subjects condition. This approach assumes that there are no
individual differences. Produces one set of model parameters per condition.
No pooling: The model is fitted to the individual-level data independently
for each participant (i.e., no data aggregation). This approach assumes
that there is no similarity across participants and usually requires
considerable amounts of data at the individual level. Produces one set of
model parameters per participant. Group-level estimates are obtained by
averaging the individual-level estimates.
Partial pooling: The model is fitted simultaneously to the individual-level
data of all participants, assuming that the individual-level parameters
come from a group-level distribution. Individual-level parameters are often
treated as random effects nested within the group-level parameters, which
is why this approach is also called hierarchical modeling. This approach
assumes both individual-level differences and similarities. Produces one
set of model parameters per participant plus one set of group-level
parameters. Thus, although partial-pooling models usually have more
parameters than no-pooling approaches, they are usually less flexible
because the hierarchical structure regularizes the individual-level
parameters.
Implemented Estimation Methods
Maximum-likelihood estimation with MPTinR via fit.mpt:
"asymptotic_complete": Asymptotic ML theory, complete pooling
"asymptotic_no": Asymptotic ML theory, no pooling
"pb_no": Parametric bootstrap, no pooling
"npb_no": Nonparametric bootstrap, no pooling
Bayesian estimation with TreeBUGS:
"simple": Bayesian estimation, no pooling (C++, simpleMPT)
"simple_pooling": Bayesian estimation, complete pooling (C++, simpleMPT)
"trait": latent-trait model, partial pooling (JAGS, traitMPT)
"trait_uncorrelated": latent-trait model without correlation parameters, partial pooling (JAGS, traitMPT)
"beta": beta-MPT model, partial pooling (JAGS, betaMPT)
"betacpp": beta-MPT model, partial pooling (C++, betaMPTcpp)
Frequentist/Maximum-Likelihood Methods
For the complete-pooling asymptotic approach, the group-level parameter
estimates and goodness-of-fit statistics are the maximum-likelihood
estimates and G-squared values returned by MPTinR. The parameter
differences are based on these values; the standard error of a difference
is simply the pooled standard error of the individual parameters. The
overall fit (column gof) is based on an additional fit to the completely
aggregated data.
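The pooled standard error mentioned above can be sketched as follows (an
illustrative, language-agnostic example in Python, not the package's own
code; all names are made up):

```python
import math

def pooled_se(se_a: float, se_b: float) -> float:
    """SE of the difference (est_a - est_b) of two independent
    parameter estimates: sqrt(se_a^2 + se_b^2)."""
    return math.sqrt(se_a**2 + se_b**2)

def diff_ci(est_a: float, est_b: float,
            se_a: float, se_b: float, z: float = 1.96):
    """Wald confidence interval of the difference between two
    condition-level parameter estimates."""
    diff = est_a - est_b
    se = pooled_se(se_a, se_b)
    return diff - z * se, diff + z * se
```

For example, two estimates with standard errors 0.3 and 0.4 yield a pooled
standard error of 0.5 for their difference.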
For the no-pooling asymptotic approach, the individual-level
maximum-likelihood estimates are reported in columns est_indiv and
gof_indiv and provide the basis for the other results. Whether or not an
individual-level parameter estimate is judged as identifiable (column
identifiable) is based on separate fits with different random starting
values. If, in these separate fits, the same objective criterion is reached
several times (i.e., Log.Likelihood within .01 of the best fit) but the
parameter estimates differ (i.e., are not within .01 of each other), the
estimate is flagged as non-identifiable. If they are the same (i.e., within
.01 of each other), they are flagged as identifiable. The group-level
parameters are simply the means of the identifiable individual-level
parameters; the SE is the SE of the mean of these parameters (i.e.,
SD/sqrt(N), where N excludes non-identifiable parameters and those
estimated as NA); and the CI is based on the mean and SE. The group-level
and overall fit is the sum of the individual G-squared values, the sum of
the individual-level degrees of freedom, and the corresponding chi-squared
df. The differences between conditions and the corresponding statistics are
based on a t-test comparing the individual-level estimates (again, after
excluding non-identifiable estimates). The CIs of the differences are based
on the SEs (which are derived from a linear model equivalent to the
t-test).
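The group-level summary and the between-condition t-test described above
can be sketched as follows (an illustrative Python example under the stated
assumptions, not the package's internals; function names are made up):

```python
import math
from statistics import mean, stdev

def group_summary(estimates, z: float = 1.96):
    """Mean of identifiable individual-level estimates, SE of the mean
    (SD/sqrt(N)), and a Wald CI based on mean and SE. The caller is
    assumed to have already excluded non-identifiable/NA estimates."""
    n = len(estimates)
    m = mean(estimates)
    se = stdev(estimates) / math.sqrt(n)
    return m, se, (m - z * se, m + z * se)

def pooled_t(x, y):
    """Two-sample t-test with pooled variance, equivalent to the linear
    model estimate ~ condition; returns (t statistic, SE of difference)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * stdev(x)**2 + (ny - 1) * stdev(y)**2) / (nx + ny - 2)
    se_diff = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se_diff, se_diff
```

The SE of the difference returned by pooled_t is what the CIs of the
condition differences are based on.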
The individual-level estimates of the bootstrap-based no-pooling approaches
are identical to the asymptotic ones. However, the SE is the SD of the
bootstrap distribution of parameter estimates, the CIs are the
corresponding quantiles of the bootstrap distribution, and the p-value is
obtained from the bootstrapped G-squared distribution. Identifiability of
individual-level parameter estimates is also based on the bootstrap
distribution of estimates. Specifically, we calculate the range of the CI
(i.e., maximum minus minimum CI value) and flag those parameters as
non-identifiable for which this range is larger than
mpt_options()$max_ci_indiv, which defaults to 0.99. Thus, with the default
settings a parameter is deemed non-identifiable if its bootstrap-based CI
extends from 0 to 1. The group-level estimates are the means of the
identifiable individual-level estimates. The differences between conditions
are calculated in the same manner as for the asymptotic case, using the
identifiable individual-level parameter estimates.
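The bootstrap summary described above can be sketched as follows (an
illustrative Python example, not the package's code; MAX_CI_INDIV stands in
for mpt_options()$max_ci_indiv and the quantile interpolation is one common
choice among several):

```python
import math
from statistics import stdev

MAX_CI_INDIV = 0.99  # stand-in for mpt_options()$max_ci_indiv

def quantile(xs, p):
    """Linearly interpolated sample quantile."""
    xs = sorted(xs)
    idx = p * (len(xs) - 1)
    lo, hi = math.floor(idx), math.ceil(idx)
    frac = idx - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac

def summarize_bootstrap(boot_estimates, alpha=0.05):
    """SE = SD of the bootstrap distribution, CI = its quantiles, and an
    identifiability flag based on the CI range."""
    se = stdev(boot_estimates)
    ci = (quantile(boot_estimates, alpha / 2),
          quantile(boot_estimates, 1 - alpha / 2))
    identifiable = (ci[1] - ci[0]) <= MAX_CI_INDIV
    return se, ci, identifiable
```

A bootstrap distribution whose 95% CI spans nearly the whole unit interval
is flagged as non-identifiable under the default threshold.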
Bayesian Methods
The simple approaches fit fixed-effects MPT models. "simple" uses no
pooling and thus assumes independent uniform priors for the
individual-level parameters. Group-level means are obtained as generated
quantities by averaging the posterior samples across participants.
"simple_pooling" aggregates the observed frequencies across participants
and assumes a uniform prior for the group-level parameters.
The latent-trait approaches transform the individual-level parameters to a
latent probit scale using the inverse cumulative standard normal
distribution. For these probit values, a multivariate normal distribution
is assumed at the group level. Whereas "trait" estimates the corresponding
correlation matrix of the parameters (reported in column est_rho),
"trait_uncorrelated" does not estimate this correlation matrix (i.e.,
parameters can still be correlated across individuals, but this is not
accounted for in the model).
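The probit transformation used by the latent-trait approaches can be
sketched as follows (an illustrative Python example; the actual sampling is
done in JAGS):

```python
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, mean 0, sd 1

def probit(p: float) -> float:
    """Map a probability in (0, 1) to the latent probit scale via the
    inverse cumulative standard normal distribution."""
    return _std_normal.inv_cdf(p)

def inv_probit(x: float) -> float:
    """Map a latent probit value back to a probability in (0, 1)."""
    return _std_normal.cdf(x)
```

On the probit scale, a parameter value of 0.5 maps to 0, and the group
level can assume a multivariate normal distribution over these unbounded
values.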
For all Bayesian methods, the posterior distribution of the parameters is
summarized by the posterior mean (in column est), posterior standard
deviation (se), and credibility intervals (ci_*). For parameter differences
(test_between) and correlations (est_rho), Bayesian p-values are computed
(column p) by counting the relative proportion of posterior samples that
are smaller than zero. Goodness of fit is tested with the T1 statistic
(observed vs. posterior-predicted average frequencies, focus = "mean") and
the T2 statistic (observed vs. posterior-predicted covariance of
frequencies, focus = "cov").