ddtlcm_fit: MH-within-Gibbs sampler to sample from the full posterior distribution of DDT-LCM

Description

Use DDT-LCM to estimate latent classes and a tree on the class profiles for multivariate binary outcomes.

Usage

ddtlcm_fit(
  K,
  data,
  item_membership_list,
  total_iters = 5000,
  initials = list(),
  priors = list(),
  controls = list(),
  initialize_args = list(method_lcm = "random", method_dist = "euclidean",
    method_hclust = "ward.D", method_add_root = "min_cor", alpha = 0, theta = 0)
)

Value

an object of class "ddt_lcm"; a list containing the following elements:

tree_samples: a list of information on the tree collected from the sampling algorithm, including:
  accept: a binary vector where 1 indicates acceptance of the proposal tree and 0 indicates rejection.
  tree_list: a list of posterior samples of the tree.
  dist_mat_list: a list of tree-structured covariance matrices representing the marginal covariances among the leaf parameters, integrating out the internal node parameters and all intermediate stochastic paths in the DDT branching process.
response_probs_samples: a total_iters x K x J array of posterior samples of item response probabilities
class_probs_samples: a K x total_iters matrix of posterior samples of class probabilities
Z_samples: a N x total_iters integer matrix of posterior samples of individual class assignments
Sigma_by_group_samples: a G x total_iters matrix of posterior samples of diffusion variances
c_samples: a total_iters vector of posterior samples of divergence function hyperparameter
loglikelihood: a total_iters vector of log-likelihoods of the full model
loglikelihood_lcm: a total_iters vector of log-likelihoods of the LCM model only
setting: a list of model setup information, including: K, item_membership_list, and G
controls: a list of model controls, including: fix_tree: FALSE to perform MH sampling of the tree, TRUE to fix the tree at the initial input. c_order: a numeric value of 1 or 2 (see Arguments).
data: the input data matrix
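
The shapes listed above determine how the samples are summarized. The following is a minimal base R sketch that uses simulated stand-in objects with the documented dimensions rather than a real fit; the sizes, the burn-in length, and all variable names other than the documented element names are hypothetical.

```r
set.seed(1)
total_iters <- 100; K <- 3; J <- 6; N <- 20  # hypothetical sizes
burn_in <- 50                                # hypothetical burn-in length

# Stand-ins with the documented shapes (not output of ddtlcm_fit)
response_probs_samples <- array(runif(total_iters * K * J),
                                dim = c(total_iters, K, J))
class_probs_samples <- matrix(runif(K * total_iters), nrow = K)
class_probs_samples <- sweep(class_probs_samples, 2,
                             colSums(class_probs_samples), "/")
Z_samples <- matrix(sample.int(K, N * total_iters, replace = TRUE), nrow = N)
accept <- rbinom(total_iters, size = 1, prob = 0.3)

keep <- (burn_in + 1):total_iters  # iterations retained after burn-in

# K x J matrix: posterior mean item response probabilities
response_probs_mean <- apply(response_probs_samples[keep, , , drop = FALSE],
                             c(2, 3), mean)

# K-vector: posterior mean class probabilities
class_probs_mean <- rowMeans(class_probs_samples[, keep, drop = FALSE])

# N-vector: most frequently sampled class for each individual
Z_mode <- apply(Z_samples[, keep, drop = FALSE], 1,
                function(z) which.max(tabulate(z, nbins = K)))

# Share of retained iterations in which the proposal tree was accepted
accept_rate <- mean(accept[keep])
```

With a real fit, the same indexing would presumably apply to result$response_probs_samples, result$class_probs_samples, result$Z_samples, and result$tree_samples$accept.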

Arguments

K

number of classes (integer)

data

an NxJ matrix of multivariate binary responses, where N is the number of individuals, and J is the number of granular items

item_membership_list

a list of G elements, where the g-th element contains the column indices of data corresponding to items in major group g, and G is the number of major item groups
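
As an illustration of how data and item_membership_list line up, here is a minimal base R sketch with simulated responses. The sizes (N = 20, J = 6, G = 3) and the grouping are hypothetical, and the check that every column belongs to exactly one group is an assumption about the intended input, not a documented requirement.

```r
set.seed(1)
N <- 20  # number of individuals (hypothetical)
J <- 6   # number of granular items (hypothetical)

# N x J matrix of binary responses
data <- matrix(rbinom(N * J, size = 1, prob = 0.5), nrow = N, ncol = J)

# G = 3 major groups; each element holds column indices of `data`
item_membership_list <- list(c(1, 2), c(3, 4, 5), 6)
G <- length(item_membership_list)

# Assumed sanity check: each column appears in exactly one major group
all_items <- unlist(item_membership_list)
stopifnot(setequal(all_items, seq_len(J)), !anyDuplicated(all_items))
```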

total_iters

number of posterior samples to collect (integer)

initials

a named list of initial values of the following parameters:

tree_phylo4d: a phylo4d object. The initial tree has K leaves (labeled as "v1" through "vK"), 1 singleton root node (labeled as "u1"), and K-1 internal nodes (labeled as "u1" through \(u_{K-1}\)). The tree also contains parameters for the leaf nodes and the root node (whose parameter equals 0). The parameters for the internal nodes can be NA because they are not used in the algorithm.

response_prob

a K by J matrix with entries between 0 and 1. The initial values for the item response probabilities. They should equal the expit-transformed leaf parameters of tree_phylo4d.

class_probability

a K-vector with entries between 0 and 1. The initial values for the class probabilities. Entries should be nonzero and sum to 1; otherwise they will be normalized.

class_assignments

an N-vector with integer entries from 1, ..., K. The initial values for individual class assignments.

Sigma_by_group

a G-vector with entries greater than 0. The initial values for the group-specific diffusion variances.

c

a value greater than 0. The initial value for the divergence function hyperparameter.

Parameters not supplied with initial values will be initialized using the initialize function with arguments in initialize_args.
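
A minimal sketch of a partial initials list, assuming hypothetical sizes K = 3, N = 20, and G = 3. It omits tree_phylo4d and response_prob so that, per the note above, those would be filled in by initialize. The expit helper is defined here only to show the transformation that links leaf parameters to response probabilities; it is not a package function.

```r
K <- 3; N <- 20; G <- 3  # hypothetical sizes

# expit maps real-valued leaf parameters to probabilities in (0, 1);
# it is equivalent to stats::plogis
expit <- function(x) 1 / (1 + exp(-x))

# Class probabilities, rescaled by hand so that they sum to 1
class_probability <- c(2, 1, 1)
class_probability <- class_probability / sum(class_probability)

set.seed(1)
initials <- list(
  class_probability = class_probability,
  class_assignments = sample.int(K, N, replace = TRUE),  # values in 1, ..., K
  Sigma_by_group = rep(1, G),                            # positive variances
  c = 1                                                  # positive scalar
)
```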

priors

a named list of values of hyperparameters of priors. See the function initialize for explanation.

shape_sigma: a G-vector of positive values. The g-th element is the shape parameter for the inverse-Gamma prior on diffusion variance parameter sigma_g^2. Default is rep(2, G).

rate_sigma

a G-vector of positive values. The g-th element is the rate parameter for the inverse-Gamma prior on diffusion variance parameter sigma_g^2. Default is rep(2, G).

prior_dirichlet

a K-vector with positive entries. The parameter of the Dirichlet prior on class probability.

shape_c

a positive value. The shape parameter for the Gamma prior on divergence function hyperparameter c. Default is 1.

rate_c

a positive value. The rate parameter for c. Default is 1.

a_pg

a positive value. The scale parameter for the generalized logistic distribution used in the augmented Gibbs sampler for leaf parameters. Default is 1, corresponding to the standard logistic distribution.
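
A minimal sketch of a priors list that spells out the documented defaults, assuming hypothetical sizes K = 3 and G = 3. The value for prior_dirichlet is illustrative only, since no default is stated above. The prior draws assume that the inverse-Gamma prior is parameterized so that 1/sigma_g^2 follows a Gamma(shape, rate) distribution.

```r
K <- 3; G <- 3  # hypothetical sizes

priors <- list(
  shape_sigma = rep(2, G),      # documented default
  rate_sigma = rep(2, G),       # documented default
  prior_dirichlet = rep(5, K),  # illustrative value, not a documented default
  shape_c = 1,                  # documented default
  rate_c = 1,                   # documented default
  a_pg = 1                      # documented default (standard logistic)
)

set.seed(1)
# Prior draws of sigma_1^2, assuming 1 / sigma^2 ~ Gamma(shape, rate)
sigma2_draws <- 1 / rgamma(1000, shape = priors$shape_sigma[1],
                           rate = priors$rate_sigma[1])

# Prior draws of the divergence hyperparameter c ~ Gamma(shape_c, rate_c)
c_draws <- rgamma(1000, shape = priors$shape_c, rate = priors$rate_c)
```

Plotting such draws, for example with hist(sigma2_draws), is one way to judge whether the chosen hyperparameters are reasonable before fitting.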

controls

a named list of control variables.

fix_tree: a logical. If FALSE, the tree structure will be sampled by MH in the algorithm. If TRUE, the tree structure will be fixed at the initial input.

c_order

a numeric value. If 1, the divergence function is \(a(t) = c/(1-t)\). If 2, the divergence function is \(a(t) = c/(1-t)^2\).
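
The two divergence functions can be written as a single helper. This is a sketch for illustration, not a function exported by the package; the name divergence and the restriction of t to [0, 1) are assumptions.

```r
# a(t) = c / (1 - t)^c_order, for c_order equal to 1 or 2
divergence <- function(t, c, c_order = 1) {
  stopifnot(c > 0, all(t >= 0 & t < 1), c_order %in% c(1, 2))
  c / (1 - t)^c_order
}

divergence(0.5, c = 1, c_order = 1)  # 2
divergence(0.5, c = 1, c_order = 2)  # 4
```

For the same c, the second form grows faster as t approaches 1.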

initialize_args

a named list of initialization arguments. See the function initialize for explanation.

Examples


library(ddtlcm)
# load the synthetic data set shipped with the package
data(data_synthetic)
# extract elements into the global environment
list2env(setNames(data_synthetic, names(data_synthetic)), envir = globalenv())
# run DDT-LCM with a small number of iterations
result <- ddtlcm_fit(K = 3, data = response_matrix,
  item_membership_list = item_membership_list, total_iters = 50)
