
ddtlcm: Dirichlet diffusion tree-latent class model (DDT-LCM)

An R package for tree-regularized latent class models with a DDT process prior on class profiles

Maintainer: Mengbing Li (mengbing@umich.edu)

Contributors: Briana Stephenson (bstephenson@hsph.harvard.edu); Zhenke Wu (zhenkewu@umich.edu)

Citation

Bayesian tree-regularized LCM: Li M, Stephenson B, Wu Z (2023). Tree-Regularized Bayesian Latent Class Analysis for Improving Weakly Separated Dietary Pattern Subtyping in Small-Sized Subpopulations. arXiv:2306.04700.

Installation

# install the Bioconductor package `ggtree`, used for visualizing results:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ggtree")

# install ddtlcm from GitHub:
install.packages("devtools", repos = "https://cloud.r-project.org")
devtools::install_github("limengbinggz/ddtlcm")

Overview

ddtlcm is designed for analyzing multivariate binary observations over grouped items in a tree-regularized Bayesian LCM framework. Between-class similarities are guided by an unknown tree: classes positioned closer on the tree are more similar a priori. This lets classes share information, which improves parameter estimates when data are limited. The model equips LCMs with a DDT process prior on the class profiles, with varying degrees of shrinkage across major item groups. It is particularly promising for addressing weak separation of latent classes when sample sizes are small. Posterior inference uses a hybrid Metropolis-Hastings-within-Gibbs algorithm, which also provides posterior uncertainty quantification.
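To make the latent class model that ddtlcm builds on concrete, the base-R sketch below draws class memberships Z_i from class probabilities and then independent binary item responses given the class. It is a simplified illustration under stated assumptions, not the package's simulator (the package provides simulate_lcm_response for that). The function `simulate_lcm_sketch` and its arguments `class_prob` and `theta` are hypothetical names. The DDT prior that ties class profiles together along a tree is deliberately omitted here.

```r
# Hypothetical sketch of the LCM generative process (the DDT tree prior is omitted).
# class_prob: length-K vector of class probabilities
# theta:      K x J matrix; theta[k, j] = P(Y_ij = 1 | Z_i = k)
simulate_lcm_sketch <- function(N, class_prob, theta, seed = 1) {
  set.seed(seed)
  K <- length(class_prob)
  J <- ncol(theta)
  # Z_i ~ Categorical(class_prob)
  Z <- sample.int(K, size = N, replace = TRUE, prob = class_prob)
  # Y_ij ~ Bernoulli(theta[Z_i, j]), independent across items given Z_i
  prob_matrix <- theta[Z, , drop = FALSE]
  Y <- matrix(rbinom(N * J, size = 1, prob = as.vector(prob_matrix)),
              nrow = N, ncol = J)
  list(response_matrix = Y, class_assignments = Z)
}

theta <- rbind(c(0.9, 0.8, 0.1),
               c(0.2, 0.3, 0.7))
sim <- simulate_lcm_sketch(N = 100, class_prob = c(0.6, 0.4), theta = theta)
```

The field name response_matrix mirrors the one used in the Quickstart. In ddtlcm the rows of theta would not be independent. Instead, their logit-scale values are linked through the tree.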

ddtlcm works for

  • multivariate binary responses over a pre-specified grouping of items.

The relations among functions in the package ddtlcm can be visualized with:

library(DependenciesGraphs) # if not installed: devtools::install_github("datastorm-open/DependenciesGraphs")
library(QualtricsTools)     # if not installed: devtools::install_github("emmamorgan-tufts/QualtricsTools")
dep <- funDependencies('package:ddtlcm', 'ddtlcm_fit')
plot(dep)

Examples

  • A simple workflow using semi-synthetic data is provided.

  • ddtlcm estimates the tree over classes and the class profiles simultaneously.

A Quickstart

library(ddtlcm)

data(parameter_diet)
# unpack the elements of parameter_diet into variables in the global environment
list2env(parameter_diet, envir = globalenv())

N <- 496 # number of individuals (rows of the simulated response matrix)
seed_parameter <- 1 # random seed to generate node parameters given the tree
seed_response <- 1 # random seed to generate multivariate binary observations from the LCM

# simulate data given the parameters
sim_data <- simulate_lcm_given_tree(tree_phylo, N, 
    class_probability, item_membership_list, Sigma_by_group, 
    root_node_location = 0, seed_parameter = seed_parameter,
    seed_response = seed_response)

K <- 6 # number of latent classes, same as number of leaves on the tree
result_diet <- ddtlcm_fit(K = K, data = sim_data$response_matrix, 
  item_membership_list = item_membership_list, total_iters = 100)
print(result_diet)

Contributing And Getting Help

Please report bugs by opening an issue. If you wish to contribute, please make a pull request. If you have questions, you can open a discussion thread.

Note

  • When running some functions in the package, such as ddtlcm_fit, a warning that "Tree contains singleton nodes" may be displayed. This warning originates from the checkPhylo4 function in the phylobase package, which performs basic checks on the validity of S4 phylogenetic objects. Such warnings do not raise any concerns about the statistical validity of the implemented algorithm, because every tree generated from a DDT process has a singleton node (a node with only one child) as its root. To avoid repeated appearances of this warning, we recommend one of the following:

    • Wrapping the code in suppressWarnings({ code_that_will_generate_singleton_warning }); or

    • Setting options(warn = -1) globally. This is risky because other, meaningful warnings will also be silenced.
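A more targeted option is to muffle only the singleton-node warning and let every other warning through. The sketch below does this with base R's withCallingHandlers(). The helper name `muffle_singleton_warnings` is hypothetical. The approach also assumes that the warning text contains the word "singleton", as quoted above. If phylobase rewords the message, the pattern would need updating.

```r
# Hypothetical helper: evaluate `expr`, silencing only warnings whose message
# mentions "singleton"; all other warnings propagate as usual.
muffle_singleton_warnings <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("singleton", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
      # returning without invoking a restart lets the warning continue
    }
  )
}

# Usage, assuming the Quickstart objects exist:
# result_diet <- muffle_singleton_warnings(
#   ddtlcm_fit(K = K, data = sim_data$response_matrix,
#              item_membership_list = item_membership_list, total_iters = 100))
```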

Install

install.packages('ddtlcm')

Monthly Downloads

185

Version

0.2.1

License

MIT + file LICENSE

Maintainer

Mengbing Li

Last Published

April 4th, 2024

Functions in ddtlcm (0.2.1)

initialize_hclust

Estimate an initial binary tree on latent classes using hclust()
div_time

Sample divergence time on an edge uv previously traversed by m(v) data points
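
In Neal's Dirichlet diffusion tree, a new point that follows a segment previously traversed by m points diverges at rate a(t)/m. The probability of no divergence by time t is therefore exp(-(A(t) - A(t_u))/m), where A(t) is the integral of a from 0 to t and t_u is the time at which the point enters the segment. The sketch below inverts that distribution for a(t) = c/(1-t), for which A(t) = -c log(1 - t). It is a hypothetical illustration: `sample_div_time_sketch` is not the package's div_time. It also samples unconditionally and then reports whether the time falls on the edge, whereas div_time may sample from a distribution truncated to the edge.

```r
# Hypothetical sketch: candidate divergence time for a point entering an edge
# (t_u, t_v) that m earlier points traversed, with a(t) = c / (1 - t).
# Hazard a(t) / m gives P(no divergence by t) = exp(-(A(t) - A(t_u)) / m),
# with A(t) = -c * log(1 - t); we invert this via an Exponential(1) draw.
sample_div_time_sketch <- function(t_u, t_v, m, c) {
  e <- rexp(1)                               # E ~ Exponential(1)
  t_star <- 1 - (1 - t_u) * exp(-m * e / c)  # solves A(t*) = A(t_u) + m * E
  list(time = t_star, diverges_on_edge = t_star < t_v)
}

set.seed(1)
draw <- sample_div_time_sketch(t_u = 0.2, t_v = 0.6, m = 3, c = 1)
```

For these settings, the probability of diverging on the edge is 1 - ((1 - 0.6)/(1 - 0.2))^(c/m) = 1 - 0.5^(1/3). Larger m makes divergence on the edge less likely, which is the "rich get richer" behavior of the DDT.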
predict.ddt_lcm

Prediction of class memberships from posterior predictive distributions
logllk_div_time_two

Compute loglikelihood of divergence times for a(t) = c/(1-t)^2
initialize_randomLCM

Provide a random initial response profile based on the latent class model
parameter_diet

Parameters for the HCHS dietary recall data example
log_expit

Numerically accurately compute f(x) = log(expit(x)) = -log(1 + exp(-x)).
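
Computing log(1/(1 + exp(-x))) naively underflows or overflows for large |x|. A common stable form of log(expit(x)) switches branches on the sign of x and uses log1p. The sketch below is hypothetical (`log_expit_sketch` is not the package function). Base R's plogis(x, log.p = TRUE) returns the same quantity.

```r
# Hypothetical sketch of a numerically stable log(expit(x)) = -log(1 + exp(-x)).
log_expit_sketch <- function(x) {
  # x >= 0: exp(-x) <= 1, so log1p(exp(-x)) is accurate
  # x <  0: rewrite as x - log(1 + exp(x)) to avoid exp() overflow
  ifelse(x >= 0, -log1p(exp(-x)), x - log1p(exp(x)))
}
```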
logllk_lcm

Calculate loglikelihood of the latent class model, conditional on tree structure
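
For binary items, the marginal LCM log-likelihood sums over classes inside the log. For each individual it is log sum_k pi_k prod_j theta_kj^y_ij (1 - theta_kj)^(1 - y_ij), evaluated stably with the log-sum-exp trick. The sketch below is a hypothetical illustration, and `lcm_loglik_sketch` is not the package's logllk_lcm. That function conditions on the tree structure and may also condition on sampled class assignments rather than marginalizing over them.

```r
# Hypothetical sketch: marginal LCM log-likelihood for binary responses.
# Y: N x J 0/1 matrix; class_prob: length-K vector; theta: K x J probabilities.
lcm_loglik_sketch <- function(Y, class_prob, theta) {
  # N x K matrix of log P(Y_i | Z_i = k)
  log_lik_by_class <- Y %*% t(log(theta)) + (1 - Y) %*% t(log(1 - theta))
  # add log pi_k to column k
  log_joint <- sweep(log_lik_by_class, 2, log(class_prob), "+")
  # log-sum-exp over classes for each individual, then sum over individuals
  row_max <- apply(log_joint, 1, max)
  sum(row_max + log(rowSums(exp(log_joint - row_max))))
}
```

For one individual with responses (1, 0), two equally likely classes, and theta rows (0.9, 0.2) and (0.3, 0.6), the likelihood is 0.5 * 0.9 * 0.8 + 0.5 * 0.3 * 0.4 = 0.42.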
plot.ddt_lcm

Create trace plots of DDT-LCM parameters
plot_tree_with_heatmap

Plot the MAP tree and class profiles (heatmap) of summarized DDT-LCM results
plot.summary.ddt_lcm

Plot the MAP tree and class profiles of summarized DDT-LCM results
plot_tree_with_barplot

Plot the MAP tree and class profiles (bar plot) of summarized DDT-LCM results
sample_class_assignment

Sample individual class assignments Z_i, i = 1, ..., N
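
In a Gibbs step for the class assignments, each Z_i is drawn from its full conditional. Under the standard binary LCM, P(Z_i = k | Y_i) is proportional to pi_k prod_j theta_kj^y_ij (1 - theta_kj)^(1 - y_ij). The sketch below shows that update under this assumption. It is hypothetical: `sample_class_assignment_sketch` is not the package's sample_class_assignment, whose exact parameterization (for example, on the logit scale) may differ.

```r
# Hypothetical sketch of a Gibbs update for Z_i, i = 1, ..., N.
sample_class_assignment_sketch <- function(Y, class_prob, theta) {
  # N x K unnormalized log posterior: log pi_k + log P(Y_i | Z_i = k)
  log_post <- Y %*% t(log(theta)) + (1 - Y) %*% t(log(1 - theta))
  log_post <- sweep(log_post, 2, log(class_prob), "+")
  # normalize each row, subtracting the row max for numerical stability
  post <- exp(log_post - apply(log_post, 1, max))
  post <- post / rowSums(post)
  # draw one class label per individual
  apply(post, 1, function(p) sample.int(length(p), size = 1, prob = p))
}
```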
logllk_ddt_lcm

Calculate loglikelihood of the DDT-LCM
reattach_point

Attach a subtree to a given DDT at a randomly selected location
logllk_div_time_one

Compute loglikelihood of divergence times for a(t) = c/(1-t)
result_diet_1000iters

Result of fitting DDT-LCM to a semi-synthetic data example
sample_leaf_locations_pg

Sample the leaf locations and Polya-Gamma auxiliary variables
sample_c_one

Sample divergence function parameter c for a(t) = c / (1-t) through a Gibbs sampler
sample_c_two

Sample divergence function parameter c for a(t) = c / (1-t)^2 through a Gibbs sampler
initialize_poLCA

Estimate an initial response profile from latent class model using poLCA()
summary.ddt_lcm

Summarize the output of a ddt_lcm model
logit

The logit function, logit(p) = log(p / (1 - p))
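
The logit maps a probability to the log-odds scale, and its inverse, expit (the logistic function), maps back. The two definitions are sketched below with hypothetical names (`logit_sketch` and `expit_sketch`, not the package's logit and expit). Base R's qlogis() and plogis() compute the same quantities.

```r
# Hypothetical sketch of the two link functions.
logit_sketch <- function(p) log(p / (1 - p))   # probability -> log-odds
expit_sketch <- function(x) 1 / (1 + exp(-x))  # log-odds -> probability
```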
logllk_ddt

Calculate loglikelihood of a DDT, including the tree structure and node parameters
predict.summary.ddt_lcm

Prediction of class memberships from posterior summaries
quiet

Suppress output printed by cat()
print.ddt_lcm

Print out setup of a ddt_lcm model
random_detach_subtree

Metropolis-Hastings algorithm for sampling tree topology and branch lengths from the DDT branching process.
logllk_tree_topology

Compute loglikelihood of the tree topology
print.summary.ddt_lcm

Print out summary of a ddt_lcm model
sample_sigmasq

Sample item group-specific variances through a Gibbs sampler
logllk_location

Compute log likelihood of parameters
sample_tree_topology

Sample a new tree topology using Metropolis-Hastings through randomly detaching and re-attaching subtrees
proposal_log_prob

Calculate proposal likelihood
simulate_lcm_response

Simulate multivariate binary responses from a latent class model
simulate_parameter_on_tree

Simulate node parameters along a given tree.
simulate_lcm_given_tree

Simulate multivariate binary responses from a latent class model given a tree
simulate_DDT_tree

Simulate a tree from a DDT process. Only the tree topology and branch lengths are simulated, without node parameters.
J_n

Compute factor in the exponent of the divergence time distribution
a_t_one

Compute divergence function a(t) = c/(1-t)
a_t_two

Compute divergence function a(t) = c/(1-t)^2
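
DDT divergence-time likelihoods depend on a(t) through its integral A(t) = integral from 0 to t of a(u) du. The sketch below gives both divergence-function families listed here together with their closed-form integrals: A(t) = -c log(1 - t) for a(t) = c/(1-t), and A(t) = c/(1-t) - c = c t/(1-t) for a(t) = c/(1-t)^2. All function names are hypothetical and are not the package's a_t_one and a_t_two.

```r
# Hypothetical sketch of the two divergence functions and their integrals.
a_t_one_sketch <- function(t, c) c / (1 - t)       # a(t) = c / (1 - t)
A_t_one_sketch <- function(t, c) -c * log(1 - t)   # A(t) = -c log(1 - t)
a_t_two_sketch <- function(t, c) c / (1 - t)^2     # a(t) = c / (1 - t)^2
A_t_two_sketch <- function(t, c) c * t / (1 - t)   # A(t) = c / (1 - t) - c
```

Both functions grow without bound as t approaches 1, so every path diverges before reaching time 1. The squared version grows faster, so divergences happen earlier.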
H_n

Harmonic series: H_n = 1 + 1/2 + ... + 1/n
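
Assuming H_n denotes the n-th partial sum of the harmonic series, a one-line sketch follows. The name `harmonic_sketch` is hypothetical, and the convention H_0 = 0 is an assumption.

```r
# Hypothetical sketch: H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0.
harmonic_sketch <- function(n) sum(1 / seq_len(n))
```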
add_leaf_branch

Add a leaf branch to an existing tree tree_old
WAIC

Compute the widely applicable information criterion (WAIC)
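
WAIC is commonly computed from an S x N matrix of pointwise log-likelihoods, log p(y_i | theta^(s)), evaluated at S posterior draws. It combines the log pointwise predictive density (lppd) with a variance-based penalty, p_waic. The sketch below follows that common formulation. It is hypothetical: `waic_sketch` is not the package's WAIC function, whose inputs and scaling may differ.

```r
# Hypothetical sketch of WAIC from pointwise log-likelihoods ll[s, i]:
#   lppd   = sum_i log( mean_s exp(ll[s, i]) )
#   p_waic = sum_i var_s( ll[s, i] )
#   WAIC   = -2 * (lppd - p_waic)
waic_sketch <- function(ll) {
  col_max <- apply(ll, 2, max)
  # log-mean-exp per observation, stabilized by the column max
  lppd <- sum(col_max + log(colMeans(exp(sweep(ll, 2, col_max, "-")))))
  p_waic <- sum(apply(ll, 2, var))
  list(WAIC = -2 * (lppd - p_waic), lppd = lppd, p_waic = p_waic)
}
```

Lower WAIC indicates better estimated out-of-sample predictive fit, so it can be used to compare fits with different numbers of classes K.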
add_root

Add a singular root node to an existing nonsingular tree
attach_subtree

Attach a subtree to a given DDT at a randomly selected location
add_multichotomous_tip

Add a leaf branch to an existing tree tree_old to make a multichotomous branch
add_one_sample

Functions to simulate trees and node parameters from a DDT process. Add a branch to an existing tree according to the branching process of DDT
ddtlcm-package

ddtlcm: Latent Class Analysis with Dirichlet Diffusion Tree Process Prior
draw_mnorm

Efficiently sample a multivariate normal \(x \sim N(Q^{-1}a, Q^{-1})\) using the precision matrix \(Q\)
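
When a Gaussian full conditional is given in canonical form, with precision matrix Q and linear term a, one Cholesky factorization of Q gives both the mean and a draw without forming Q^{-1}. The sketch below shows this standard technique under that assumption. The name `draw_mnorm_sketch` is hypothetical, and the package's draw_mnorm may be implemented differently.

```r
# Hypothetical sketch: draw x ~ N(Q^{-1} a, Q^{-1}) from the precision matrix Q.
draw_mnorm_sketch <- function(Q, a) {
  R <- chol(Q)                               # upper triangular, Q = t(R) %*% R
  # mean: solve Q mu = a with two triangular solves
  mu <- backsolve(R, forwardsolve(t(R), a))
  z <- rnorm(length(a))
  # backsolve(R, z) has covariance R^{-1} R^{-T} = Q^{-1}
  as.vector(mu + backsolve(R, z))
}
```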
initialize

Initialize the MH-within-Gibbs algorithm for DDT-LCM
create_leaf_cor_matrix

Create a tree-structured covariance matrix from a given tree
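
Under Brownian diffusion from a root at time 0 to leaves at time 1, two leaves share their path until their most recent common ancestor diverges. Their covariance is therefore sigma^2 times that divergence time, and each leaf has variance sigma^2. This is why classes closer on the tree are more similar a priori. The sketch below builds such a matrix from hand-specified clades rather than from a tree object. The names `tree_cov_sketch`, `leaves`, and `clades` are hypothetical, so this is not the package's create_leaf_cor_matrix.

```r
# Hypothetical sketch: leaf covariance under Brownian diffusion on a tree
# (root at time 0, all leaves at time 1).
# clades: list of list(time = divergence time, members = leaf names)
tree_cov_sketch <- function(leaves, clades, sigma2 = 1) {
  shared <- matrix(0, length(leaves), length(leaves),
                   dimnames = list(leaves, leaves))
  for (cl in clades) {
    # leaves in the same clade share the path at least up to its divergence time;
    # taking the max over nested clades yields the MRCA divergence time
    idx <- match(cl$members, leaves)
    shared[idx, idx] <- pmax(shared[idx, idx], cl$time)
  }
  diag(shared) <- 1  # each leaf's own path runs from time 0 to time 1
  sigma2 * shared
}

leaves <- c("A", "B", "C", "D")
clades <- list(
  list(time = 0.3, members = c("A", "B", "C", "D")),  # first split below the singleton root
  list(time = 0.7, members = c("A", "B")),
  list(time = 0.5, members = c("C", "D"))
)
Sigma <- tree_cov_sketch(leaves, clades)
```

Here leaves A and B, which diverge late at time 0.7, have covariance 0.7. By contrast, A and C separate at time 0.3 and have covariance only 0.3.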
ddtlcm_fit

MH-within-Gibbs sampler to sample from the full posterior distribution of DDT-LCM
expit

The expit (logistic) function, expit(x) = 1 / (1 + exp(-x))
compute_IC

Compute information criteria for the DDT-LCM model
exp_normalize

Compute normalized probabilities: exp(x_i) / sum_j exp(x_j)
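
Evaluating exp(x_i) / sum_j exp(x_j) directly overflows when the x_i are large, for example unnormalized log posterior class probabilities. Subtracting max(x) first leaves the ratio unchanged and keeps exp() in range. The sketch below is hypothetical and is not the package's exp_normalize.

```r
# Hypothetical sketch: stable softmax exp(x_i) / sum_j exp(x_j).
exp_normalize_sketch <- function(x) {
  w <- exp(x - max(x))  # shifting by max(x) cancels in the ratio
  w / sum(w)
}
```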
data_synthetic

Synthetic data example