
baker: Bayesian Analysis Kit for Etiology Research

An R Package for Fitting Bayesian Nested Partially Latent Class Models

Maintainer: Zhenke Wu, zhenkewu@umich.edu

Source Code: available on GitHub at https://github.com/zhenkewu/baker.

Issues: please report reproducible issues at https://github.com/zhenkewu/baker/issues.

Vignette: Please click here to read the latest long-version vignette; a short version can be found here.

Package website: Please click here for a website generated by pkgdown, which contains the package manual in HTML format (“Reference”).

References: If you are using baker for population and individual estimation from case-control data, please cite the following papers:

Citation
  • Partially latent class models (pLCM): Wu, Z., Deloria-Knoll, M., Hammitt, L. L., Zeger, S. L. and the Pneumonia Etiology Research for Child Health Core Team (2016). Partially latent class models for case–control studies of childhood pneumonia aetiology. J. R. Stat. Soc. C, 65: 97–114.
  • Nested pLCM: Wu, Z., Deloria-Knoll, M., Zeger, S. L. (2017). Nested partially latent class models for dependent binary data; estimating disease etiology. Biostatistics 18(2): 200–213.
  • Nested pLCM regression: Wu, Z., Chen, I. (2021). Probabilistic Cause-of-disease Assignment using Case-control Diagnostic Tests: A Hierarchical Bayesian Approach. Statistics in Medicine 40(4): 823–841.
  • Application: Maria Deloria Knoll, Wei Fu, Qiyuan Shi, Christine Prosperi, Zhenke Wu, Laura L. Hammitt, Daniel R. Feikin, Henry C. Baggett, Stephen R.C. Howie, J. Anthony G. Scott, David R. Murdoch, Shabir A. Madhi, Donald M. Thea, W. Abdullah Brooks, Karen L. Kotloff, Mengying Li, Daniel E. Park, Wenyi Lin, Orin S. Levine, Katherine L. O’Brien, Scott L. Zeger (2017). Bayesian Estimation of Pneumonia Etiology: Epidemiologic Considerations and Applications to the Pneumonia Etiology Research for Child Health Study. Clinical Infectious Diseases 64(suppl_3): S213–S227.
  • Primary PERCH analysis: The PERCH Study Group (2019). Aetiology of severe hospitalized pneumonia in HIV-uninfected children from Africa and Asia: the Pneumonia Aetiology Research for Child Health (PERCH) Case-Control Study. The Lancet 394(10200): 757–779.
  • Software paper: Chen I, Shi Q, Zeger SL, Wu Z (2022+). baker: An R package for Nested Partially-Latent Class Models.

A number of scientific papers on global health and infectious diseases have used the model and, in some cases, the software (in its earlier versions). Some notable examples are listed below:

Notable References using baker (model and/or software)
  1. Kubale et al. (2023). Etiology of acute lower respiratory illness hospitalizations among infants in four countries. Open Forum Infectious Diseases, ofad580.
  2. Saha SK et al. (2018). Causes and incidence of community-acquired serious infections among young children in south Asia (ANISA): an observational cohort study. The Lancet 392(10142):145-159.


Installation

# install.packages("devtools",repos="https://cloud.r-project.org")
devtools::install_github("zhenkewu/baker")

Note:

  • Run install.packages("pbkrtest") on R (>= 3.2.3) if this package is reported as missing.
  • Windows users: use devtools::install_github("zhenkewu/baker",INSTALL_opts=c("--no-multiarch")) instead if you see the error message ERROR: loading failed for 'i386' (Thanks Chrissy!).

Vignettes

devtools::install_github("zhenkewu/baker", build_vignettes=TRUE) # will take extra time to run a few examples.
browseVignettes("baker")

Graphical User Interface (GUI)

# install.packages("devtools",repos="https://cloud.r-project.org")
devtools::install_github("zhenkewu/baker")
shiny::runApp(system.file("shiny", package = "baker"))

For developers interested in low-level details, here is a pretty awesome visualization of the function dependencies within the package:

library(DependenciesGraphs) # if not installed, try this-- devtools::install_github("datastorm-open/DependenciesGraphs")
library(QualtricsTools) # devtools::install_github("emmamorgan-tufts/QualtricsTools")
dep <- funDependencies('package:baker','nplcm')
plot(dep)

You will get a dynamic figure. A snapshot is below:

Analytic Goal

  • To study disease etiology using case-control data from multiple sources that are subject to measurement error. If you are interested in estimating the population etiology pie (fraction) and the probability of each cause for each individual case, try baker.

Comparison to Other Existing Solutions

  • Acknowledges various levels of measurement errors and combines multiple sources of data for optimal disease diagnosis.
  • Main function: nplcm(), which fits the model with or without covariates.

Details

  1. Implements hierarchical Bayesian models to infer disease etiology from multivariate binary data. The package includes functions for data cleaning, exploratory data analysis, model specification, model estimation, visualization, and model diagnostics and comparison, which support effective communication between analysts and practicing clinicians.
  2. baker has implemented models for dependent measurements given disease status, regression analyses of etiology, multiple imperfect measurements, different priors for true positive rates among cases with differential measurement characteristics, and multiple-pathogen etiology.
  3. Scientists in the Pneumonia Etiology Research for Child Health (PERCH) study usually refer to the etiology distribution as the “population etiology pie” and the “individual etiology pie” because of its compositional nature, hence the name of the package.
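
To make the measurement-error idea concrete, here is a minimal base-R sketch that simulates data under a simple partially latent class model. It is not baker code and does not use the package; the function name simulate_plcm_sketch and all parameter values are hypothetical. The assumption is the basic pLCM setup: each case has one latent cause drawn from the etiology pie, the test for the causative pathogen is positive with its true positive rate (TPR), every other test is positive with its false positive rate (FPR), and controls have no cause, so all of their tests follow the FPR.

```r
# Hypothetical illustration only; not part of baker. Base R only.
simulate_plcm_sketch <- function(n_case, n_ctrl, pie, tpr, fpr) {
  J <- length(pie)
  stopifnot(length(tpr) == J, length(fpr) == J, abs(sum(pie) - 1) < 1e-8)

  # latent cause for each case, drawn from the population etiology pie
  cause <- sample.int(J, n_case, replace = TRUE, prob = pie)

  # positive probability per case and pathogen: TPR for the cause, FPR otherwise
  p_case <- matrix(fpr, nrow = n_case, ncol = J, byrow = TRUE)
  p_case[cbind(seq_len(n_case), cause)] <- tpr[cause]

  # controls: each measurement is positive with probability FPR
  p_ctrl <- matrix(fpr, nrow = n_ctrl, ncol = J, byrow = TRUE)

  M_case <- matrix(rbinom(n_case * J, 1, p_case), nrow = n_case)
  M_ctrl <- matrix(rbinom(n_ctrl * J, 1, p_ctrl), nrow = n_ctrl)
  list(M_case = M_case, M_ctrl = M_ctrl, cause = cause)
}

set.seed(1)
sim <- simulate_plcm_sketch(
  n_case = 500, n_ctrl = 500,
  pie = c(0.5, 0.3, 0.2),  # assumed etiology fractions
  tpr = c(0.9, 0.8, 0.7),  # assumed true positive rates
  fpr = c(0.1, 0.2, 0.3)   # assumed false positive rates
)

# observed positive rates: cases mix TPR and FPR, controls reflect FPR only
round(colMeans(sim$M_case), 2)
round(colMeans(sim$M_ctrl), 2)
```

The observed case positive rate for a pathogen is a mixture, pie * TPR + (1 - pie) * FPR, which is why raw positive rates among cases cannot be read directly as etiology fractions and control data are needed to inform the FPR.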

Platform

  • The baker package is compatible with OSX, Linux and Windows systems, each requiring a slightly different setup as described below. If you need to speed up the installation and analysis, please contact the maintainer or chat by clicking the gitter button at the top of this README file.

Connect R to JAGS

Mac OSX (10.11+)

  1. Use Just Another Gibbs Sampler (JAGS)
  2. Install JAGS 4.3.2 (or 4.2.0; version 4.3.2, which was updated to be compatible with R 4.3.x, is currently slightly slower); Download here
  3. Install R; Download from here
  4. Fire up R, run R command install.packages("rjags")
  5. Run R command library(rjags) in R console; If the installations are successful, you’ll see some notes like this:
>library(rjags)
Loading required package: coda
Linked to JAGS 4.x.0
Loaded modules: basemod,bugs
  6. Run R command library(baker). If the package ks cannot be loaded because the package rgl fails to load, first install X11 by going here, then run
install.packages("http://download.r-forge.r-project.org/src/contrib/rgl_0.95.1504.tar.gz",repos=NULL,type="source")

Unix (Build from source without administrative privilege)

Here we use JHPCE as an example. The complete installation guide offers extra information.

  1. Download source code for JAGS 4.2.0; The workflow would be similar for later versions of JAGS.

  2. Suppose you’ve downloaded it in ~/local/jags/4.2.0. Follow the bash commands below:

    # change to the directory with the newly downloaded source files:
    cd ~/local/jags/4.2.0
    
    # create a new folder named "usr"
    mkdir usr
    
    # decompress files:
    tar zxvf JAGS-4.2.0.tar.gz
    
    # change to the directory with newly decompressed files:
    cd ~/local/jags/4.2.0/JAGS-4.2.0
    
    
    
    # specify new JAGS home:
    export JAGS_HOME=$HOME/local/jags/4.2.0/usr
    export PATH=$JAGS_HOME/bin:$PATH
    
    # link to BLAS and LAPACK:
    # Here I have used "/usr/lib64/atlas/" and "/usr/lib64/" on JHPCE that give me
    # access to libblas.so.3 and liblapack.so.3. Please modify to paths on your system.
    LDFLAGS="-L/usr/lib64/atlas/ -L/usr/lib64/" ./configure --prefix=$JAGS_HOME --libdir=$JAGS_HOME/lib64 
    
    # if you have 8 cores:
    make -j8
    make install
    
    # prepare to install R package, rjags:
    export PKG_CONFIG_PATH=$HOME/local/jags/4.2.0/usr/lib64/pkgconfig 
    
    module load R
    # start R (e.g., by typing R), then run the following at the R prompt:
    R> install.packages("rjags")
    # or if the above fails, try:
    R> install.packages("rjags", configure.args="--enable-rpath")
  3. Also check out the INSTALLATION file for the rjags package.

Submitting Jobs to Computing Cluster via a shell script

Again, I use JHPCE as an example.

#!/bin/bash
#$ -M zhenkewu@gmail.com
#$ -N baker_regression_perch
#$ -o /users/zhwu/baker_regression/data_analysis/baker_regression_test.txt
#$ -e /users/zhwu/baker_regression/data_analysis/baker_regression_test.txt

export JAGS_HOME=$HOME/local/jags/4.2.0/usr
export PATH=$JAGS_HOME/bin:$PATH

export LD_LIBRARY_PATH=$JAGS_HOME/lib64

cd /users/zhwu/baker_regression/data_analysis
#$ -cwd

echo "**** Job starts ****"
date
echo "**** JHPCE info ****"
echo "User: ${USER}"
echo "Job id: ${JOB_ID}"
echo "Job name: ${JOB_NAME}"
echo "Hostname: ${HOSTNAME}" 

Rscript real_regression_data_jhpce.R

echo "**** Job ends ****" 
date

Windows

  • JAGS 4.2.0 (also applicable to later versions)
  1. Install R; Download from here
  2. Install JAGS 4.2.0; add the path to JAGS 4.2.0 to your environment variables (essential for R to find the jags program). See this for setting environment variables.
  3. Fire up R, run R command install.packages("rjags")
  4. Install Rtools (for building and installing R packages from source); add the path to Rtools (e.g., C:\Rtools\) to your environment variables so that R knows where to find it.
  • On other platforms, alternatives for installing JAGS are brew install -v jags for OSX and sudo apt-get install jags for Ubuntu/Debian.
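
Since the setup above depends on R being able to find JAGS, a quick check along the following lines may help before fitting any model. This is a minimal base-R sketch; the helper name check_jags_setup is hypothetical and not part of baker or rjags. Note that a missing jags executable on the PATH does not necessarily mean rjags cannot locate the JAGS library on every platform, so treat the result as a hint rather than a verdict.

```r
# Hypothetical helper; base R only.
check_jags_setup <- function() {
  # is the rjags package installed (without attaching it)?
  has_rjags <- requireNamespace("rjags", quietly = TRUE)
  # is a 'jags' executable visible on the PATH? ("" if not found)
  jags_path <- unname(Sys.which("jags"))
  on_path <- nzchar(jags_path)

  if (!has_rjags) message('rjags is not installed; try install.packages("rjags")')
  if (!on_path) message("no 'jags' executable found on the PATH")

  list(rjags_installed = has_rjags, jags_on_path = on_path, jags_path = jags_path)
}

status <- check_jags_setup()
str(status)
```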

Example data sets

We provide two simulated data sets in the package:

data(data_nplcm_noreg)

data(data_nplcm_reg_nest)
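
As a small sketch of how one might take a first look at these data sets, the snippet below loads them and prints only their top-level structure. It assumes baker is installed and otherwise just prints a message; it makes no assumption about the contents of the objects.

```r
# Load the two simulated data sets if baker is available, then show
# only the top level of each object.
if (requireNamespace("baker", quietly = TRUE)) {
  data("data_nplcm_noreg", package = "baker")
  data("data_nplcm_reg_nest", package = "baker")
  str(data_nplcm_noreg, max.level = 1)
  str(data_nplcm_reg_nest, max.level = 1)
} else {
  message("baker is not installed; see the Installation section")
}
```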

Package information

  • Version: 1.0.4
  • License: MIT + file LICENSE
  • Maintainer: Zhenke Wu
  • Last published: December 11th, 2025
  • Monthly downloads: 260
  • Install from CRAN: install.packages('baker')

Functions in baker (1.0.4)

add_meas_BrS_param_Nest_Slice

add parameters for a BrS measurement slice among cases and controls (conditional dependence)
add_meas_BrS_ctrl_Nest_Slice

add likelihood for a BrS measurement slice among controls (conditional independence)
add_meas_BrS_param_Nest_Slice_jags

add parameters for a BrS measurement slice among cases and controls (conditional dependence)
add_meas_BrS_param_NoNest_Slice

add parameters for a BrS measurement slice among cases and controls (conditional independence)
add_meas_BrS_param_NoNest_reg_Slice_jags

add parameters for a BrS measurement slice among cases and controls
add_meas_BrS_ctrl_NoNest_reg_Slice_jags

add a likelihood component for a BrS measurement slice among controls
add_meas_BrS_ctrl_NoNest_Slice

add a likelihood component for a BrS measurement slice among controls (conditional independence)
add_meas_BrS_param_NoNest_Slice_jags

add parameters for a BrS measurement slice among cases and controls (conditional independence)
add_meas_BrS_ctrl_NoNest_reg_discrete_predictor_Slice_jags

add a likelihood component for a BrS measurement slice among controls
add_meas_BrS_param_Nest_reg_Slice_jags

add parameters for a BrS measurement slice among cases and controls
beta_parms_from_quantiles

Pick parameters in the Beta distribution to match the specified range
add_meas_BrS_subclass_Nest_Slice

add subclass indicators for a BrS measurement slice among cases and controls (conditional independence)
beta_plot

Plot beta density
bin2dec

Convert a 0/1 binary-coded sequence into decimal digits
add_meas_BrS_param_NoNest_reg_discrete_predictor_Slice_jags

add parameters for a BrS measurement slice among cases and controls
add_meas_SS_param

add parameters for a SS measurement slice among cases (conditional independence)
baker-package

baker: Bayesian Analytic Kit for Etiology Research
add_meas_SS_case

add likelihood for a SS measurement slice among cases (conditional independence)
as.matrix_or_vec

convert one column data frame to a vector
assign_model

Interpret the specified model structure
data_nplcm_noreg

Simulated dataset that is structured in the format necessary for an nplcm() without regression
check_dir_create

check existence and create folder if non-existent
clean_perch_data

Clean PERCH data
combine_data_nplcm

combine multiple data_nplcm (useful when simulating data from regression models)
compute_logOR_single_cause

Calculate marginal log odds ratios
clean_combine_subsites

Combine subsites in raw PERCH data set
create_bugs_regressor_FPR

create regressor summation equation used in regression for FPR
compute_marg_PR_nested_reg

compute positive rates for nested model with subclass mixing weights that are the same across Jcause classes for each person (people may have different weights.)
get_direct_bias

Obtain direct bias that measures the discrepancy between a posterior distribution of pie and a true pie.
get_coverage

Obtain coverage status from a result folder
dm_Rdate_FPR

Make FPR design matrix for dates with R format.
dm_Rdate_Eti

Make etiology design matrix for dates with R format.
get_plot_pos

get a list of measurement index where to look for data
get_postsd

Obtain posterior standard deviation from a result folder
get_fitted_mean_no_nested

get model fitted mean for conditional independence model
get_fitted_mean_nested

get fitted mean for nested model with subclass mixing weights that are the same among cases
get_marginal_rates_no_nested

get marginal TPR and FPR for no nested model
get_metric

Obtain Integrated Squared Aitchison Distance, Squared Bias and Variance (both on Central Log-Ratio transformed scale) that measure the discrepancy of a posterior distribution of pie and a true pie.
expit

expit function
create_bugs_regressor_Eti

create regressor summation equation used in regression for etiology
compute_marg_PR_nested_reg_array

compute positive rates for nested model with subclass mixing weights that are the same across Jcause classes for each person (people may have different weights.)
data_nplcm_reg_nest

Simulated dataset that is structured in the format necessary for an nplcm() with regression
get_pEti_samp

get etiology samples by names (no regression)
delete_start_with

Deletes a pattern from the start of a string, or each of a vector of strings.
insert_bugfile_chunk_reg_nonest_meas

Insert measurement likelihood (with regression) code chunks into .bug model file
insert_bugfile_chunk_reg_nest_meas

Insert measurement likelihood (nested model+regression) code chunks into .bug model file
init_latent_jags_multipleSS

Initialize individual latent status (for JAGS)
insert_bugfile_chunk_noreg_etiology

insert distribution for latent status code chunk into .bug file
get_top_pattern

get top patterns from a slice of bronze-standard measurement
has_non_basis

test if a formula has terms not created by s_date_Eti() or s_date_FPR()
insert_bugfile_chunk_noreg_meas

Insert measurement likelihood (without regression) code chunks into .bug model file
get_plot_num

get the plotting positions (numeric) for the fitted means; 3 positions for each cell
insert_bugfile_chunk_reg_discrete_predictor_etiology

insert etiology regression for latent status code chunk into .bug file; discrete predictors
get_latent_seq

get index of latent status
get_marginal_rates_nested

get marginal TPR and FPR for nested model
get_individual_data

get individual data
extract_data_raw

Import raw PERCH data: extract_data_raw imports and converts the raw data to an analyzable format
logOR

calculate pairwise log odds ratios
is_discrete

Check if covariates are discrete
is.error

Test for 'try-error' class
logit

logit function
insert_bugfile_chunk_reg_discrete_predictor_nonest_meas

Insert measurement likelihood (with regression; discrete) code chunks into .bug model file
insert_bugfile_chunk_reg_etiology

insert etiology regression for latent status code chunk into .bug file
jags2_baker

Run JAGS from R
is_length_all_one

check if a list has elements all of length one
make_filename

Create new file name
marg_H

Shannon entropy for binary data
match_cause

Match latent causes that might have the same combo but different specifications
lookup_quality

Get position to store in data_nplcm$Mobs:
logsumexp

log sum exp trick
make_numbered_list

Make a list with numbered names
make_template

make a mapping template for model fitting
nplcm

Fit nested partially-latent class models (highest-level wrapper function)
make_foldername

Create new folder name
nplcm_fit_NoReg

Fit nested partially-latent class model (low-level)
merge_lists

For a list of many sublists each of which has matrices as its member, we combine across the many sublists to produce a final list
order_post_eti

order latent status by posterior mean
null_as_zero

Convert NULL to zero.
my_reorder

Reorder the measurement dimensions to match the order for display
nplcm_fit_Reg_Nest

Fit nested partially-latent class model with regression (low-level)
line2user

convert line to user coordinates
make_meas_object

Make measurement slice
make_list

Takes any number of R objects as arguments and returns a list whose names are derived from the names of the R objects.
loadOneName

load an object from .RDATA file
overall_uniform

specify overall uniform (symmetric Dirichlet distribution) for etiology prior
is_jags_folder

See if a result folder is obtained by JAGS
get_individual_prediction

get individual prediction (Bayesian posterior)
pathogen_category_perch

pathogens and their categories in PERCH study (virus or bacteria)
pathogen_category_simulation

Hypothetical pathogens and their categories (virus or bacteria)
nplcm_read_folder

Read data and other model information from a folder that stores model results.
nplcm_fit_Reg_discrete_predictor_NoNest

Fit nested partially-latent class model with regression (low-level)
plot_case_study

visualize the PERCH etiology regression with a continuous covariate
plot_SS_panel

Plot silver-standard (SS) panel
is_intercept_only

check if the formula is intercept only
plot_subwt_regression

visualize the subclass weight regression with a continuous covariate
parse_nplcm_reg

parse regression components (either false positive rate or etiology regression) for fitting npLCM; Only use this when formula is not NULL.
set_prior_tpr_SS

Set true positive rate (TPR) prior ranges for silver-standard data.
plot.nplcm

plot.nplcm plots the results from nplcm().
plot_check_common_pattern

Posterior predictive checking for the nested partially latent class models - frequent patterns in the BrS data. (for multiple folders)
print.summary.nplcm.reg_nonest_strat

Compact printing of nplcm() model fits
simulate_latent

Simulate Latent Status:
plot_etiology_regression

visualize the etiology regression with a continuous covariate
plot_etiology_strat

visualize the etiology estimates for each discrete level
nplcm_fit_Reg_NoNest

Fit nested partially-latent class model with regression (low-level)
read_meas_object

Read measurement slices
simulate_brs

Simulate Bronze-Standard (BrS) Data
plot_check_pairwise_SLORD

Posterior predictive checking for nested partially latent class models - pairwise log odds ratio (only for bronze-standard data)
print.summary.nplcm.no_reg

Compact printing of nplcm() model fits
summarize_SS

silver-standard data summary
summarize_BrS

summarize bronze-standard data
print.nplcm

print.nplcm summarizes the results from nplcm().
plot_BrS_panel

Plot bronze-standard (BrS) panel
set_strat

Stratification setup by covariates
show_dep

Show function dependencies
plot_panels

Plot three-panel figures for nested partially-latent model results
print.summary.nplcm.reg_nest

Compact printing of nplcm() model fits
summary.nplcm

summary.nplcm summarizes the results from nplcm().
show_individual

get an individual's data from the output of clean_perch_data()
unfactor

Convert factor to numeric without losing information on the label
plot_pie_panel

Plot etiology (pie) panel
unique_cause

get unique causes, regardless of the actual order in combo
sym_diff_month

get symmetric difference of months from two vectors of R-format dates
s_date_FPR

Make false positive rate (FPR) design matrix for dates with R format.
simulate_ss

Simulate Silver-Standard (SS) Data
tsb

generate stick-breaking prior (truncated) from a vector of random probabilities
simulate_nplcm

Simulate data from nested partially-latent class model (npLCM) family
symb2I

Convert names of pathogen/combinations into 0/1 coding
set_prior_tpr_BrS_NoNest

Set true positive rate (TPR) prior ranges for bronze-standard (BrS) data
s_date_Eti

Make Etiology design matrix for dates with R format.
rvbern

Sample a vector of Bernoulli variables.
print.summary.nplcm.reg_nest_strat

Compact printing of nplcm() model fits
plot_leftmost

plotting the labels on the left margin for panels plot
print.summary.nplcm.reg_nonest

Compact printing of nplcm() model fits
plot_logORmat

Visualize pairwise log odds ratios (LOR) for data that are available in both cases and controls
visualize_season

visualize trend of pathogen observation rate for NPPCR data (both cases and controls)
visualize_case_control_matrix

Visualize matrix for a quantity measured on cases and controls (a single number)
softmax

softmax
subset_data_nplcm_by_index

subset data from the output of clean_perch_data()
unique_month

Get unique month from Date
write.model

function to write bugs model (copied from R2WinBUGS)
write_model_Reg_NoNest

Write .bug model file for regression model without nested subclasses
write_model_Reg_discrete_predictor_NoNest

Write .bug model file for regression model without nested subclasses
write_model_Reg_Nest

Write .bug model file for regression model WITH nested subclasses
write_model_NoReg

Write .bug model file for model without regression
add_meas_BrS_case_Nest_Slice_jags

add likelihood for a BrS measurement slice among cases (conditional dependence)
NA2dot

convert 'NA' to '.'
add_meas_BrS_case_NoNest_Slice

add a likelihood component for a BrS measurement slice among cases (conditional independence)
Imat2cat

Convert a matrix of binary indicators to categorical variables
H

Shannon entropy for multivariate discrete data
add_meas_BrS_case_NoNest_Slice_jags

add a likelihood component for a BrS measurement slice among cases (conditional independence)
add_meas_BrS_case_NoNest_reg_discrete_predictor_Slice_jags

add likelihood component for a BrS measurement slice among cases
I2symb

Convert 0/1 coding to pathogen/combinations
add_meas_BrS_case_Nest_Slice

add likelihood for a BrS measurement slice among cases (conditional dependence)
add_meas_BrS_case_NoNest_reg_Slice_jags

add likelihood component for a BrS measurement slice among cases