PReMiuM (version 3.0.24)

PReMiuM-package: Dirichlet Process Bayesian Clustering

Description

Dirichlet process Bayesian clustering and functions for the post-processing of its output.

Details

PReMiuM provides the following:
  • Implements an infinite Dirichlet process model
  • Supports dependent or independent slice sampling (Kalli et al., 2011) or a truncated Dirichlet process model (Ishwaran and James, 2001)
  • Handles categorical or Normal covariates, or a mixture of them
  • Handles Bernoulli, Binomial, Categorical, Poisson or Normal responses
  • Handles inclusion of fixed effects in the response model
  • Handles Extra Variation in the response (for Bernoulli, Binomial and Poisson response only)
  • Handles variable selection (tested in Discrete covariate case only)
  • Includes label switching moves for better mixing
  • Allows user to exclude the response from the model
  • Allows user to compute the entropy of the allocation
  • Allows user to run with a fixed alpha or update alpha (default)
  • Allows users to run predictive scenarios (at C++ run time)
  • Basic or Rao-Blackwellised predictions can be produced
  • Handling of missing data
  • C++ for model fitting
  • Uses Eigen Linear Algebra Library and Boost C++
  • Completely self-contained (all library code is included in the distribution)
  • Adaptive MCMC where appropriate
  • R package for generating simulation data and post processing
  • R plotting functions allow user choice of what to order clusters by
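
As a rough illustration of the truncated Dirichlet process option and the fixed-alpha setting listed above, the following base R sketch draws mixture weights by stick-breaking. The function name `stickBreakingWeights` and the truncation level of 20 are assumptions chosen for illustration; this is not PReMiuM's C++ sampler, only a sketch of the general construction.

```r
# Truncated stick-breaking weights for a Dirichlet process.
# Illustrative base R only; stickBreakingWeights is a hypothetical helper,
# not a PReMiuM function.
stickBreakingWeights <- function(alpha, nComponents) {
  # V_c ~ Beta(1, alpha) for c < C; the last stick takes all remaining mass
  v <- c(rbeta(nComponents - 1, 1, alpha), 1)
  # psi_c = V_c * prod_{l < c} (1 - V_l)
  remaining <- c(1, cumprod(1 - v[-nComponents]))
  v * remaining
}

set.seed(1)
wSmall <- stickBreakingWeights(alpha = 0.5, nComponents = 20)
wLarge <- stickBreakingWeights(alpha = 5, nComponents = 20)

# Compare how the mass is spread across components under each alpha
round(wSmall, 3)
round(wLarge, 3)
```

Smaller values of alpha tend to concentrate the mass on a few components, while larger values tend to spread it over more of them, which is why the choice between a fixed and an updated alpha can matter for the number of clusters found.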

Authors

David Hastie, Department of Epidemiology and Biostatistics, Imperial College London, UK

Silvia Liverani, Department of Epidemiology and Biostatistics, Imperial College London and MRC Biostatistics Unit, Cambridge, UK

Maintainer: Silvia Liverani

Acknowledgements

Silvia Liverani thanks The Leverhulme Trust for financial support.

Details

  • Package: PReMiuM
  • Type: Package
  • Version: 3.0.24
  • Date: 2014-04-04
  • License: GPL3
  • LazyLoad: yes

This package implements Dirichlet process Bayesian clustering as described in Liverani et al. 2013. The project was previously called profile regression.

References

Molitor J, Papathomas M, Jerrett M and Richardson S. (2010) Bayesian Profile Regression with an Application to the National Survey of Children's Health, Biostatistics 11: 484-498.

Papathomas M, Molitor J, Richardson S. et al (2011) Examining the joint effect of multiple risk factors using exposure risk profiles: lung cancer in non smokers. Environmental Health Perspectives 119: 84-91.

Hastie, D. I., Liverani, S., Azizi, L., Richardson, S. and Stucker I. (2013) A semi-parametric approach to estimate risk functions associated with multi-dimensional exposure profiles: application to smoking and lung cancer. BMC Medical Research Methodology. To appear.

Molitor, J., Brown, I. J., Papathomas, M., Molitor, N., Liverani, S., Chan, Q., Richardson, S., Van Horn, L., Daviglus, M. L., Stamler, J. and Elliott, P. (2013) Blood pressure differences associated with DASH-like lower sodium compared with typical American higher sodium nutrient profile: INTERMAP USA. Submitted.

Hastie, D. I., Liverani, S. and Richardson, S. (2014) Sampling from Dirichlet process mixture models with unknown concentration parameter: Mixing issues in large data implementations. Submitted. Available at http://uk.arxiv.org/abs/1304.1778

Liverani, S., Hastie, D. I., Azizi, L., Papathomas, M. and Richardson, S. (2014) PReMiuM: An R package for Profile Regression Mixture Models using Dirichlet Processes. Submitted. Available at http://uk.arxiv.org/abs/1303.2836

Examples

# Example for Poisson outcome and Discrete covariates
library(PReMiuM)

# Generate a simulated data set from the Poisson/Discrete cluster summary
inputs <- generateSampleDataFile(clusSummaryPoissonDiscrete())

# Fit the model (a very short run, for illustration only)
runInfoObj <- profRegr(yModel = inputs$yModel,
    xModel = inputs$xModel, nSweeps = 10, nClusInit = 20,
    nBurn = 20, data = inputs$inputData, output = "output",
    covNames = inputs$covNames, outcomeT = inputs$outcomeT,
    fixedEffectsNames = inputs$fixedEffectNames)

# Post-process the output and plot the risk profiles to summary.png
dissimObj <- calcDissimilarityMatrix(runInfoObj)
clusObj <- calcOptimalClustering(dissimObj)
riskProfileObj <- calcAvgRiskAndProfile(clusObj)
clusterOrderObj <- plotRiskProfile(riskProfileObj, "summary.png")
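
The post-processing chain above starts from a dissimilarity matrix. One common construction, assumed here for illustration rather than taken from the package source, scores each pair of subjects by one minus the proportion of MCMC sweeps in which they share a cluster. The allocation matrix `z` and the helper `coClusteringDissimilarity` below are hypothetical and use base R only.

```r
# Rows are MCMC sweeps, columns are subjects; entries are cluster labels.
# These allocations are made up for illustration.
z <- rbind(c(1, 1, 2, 2),
           c(1, 1, 1, 2),
           c(2, 2, 1, 1),
           c(1, 2, 2, 2))

# Hypothetical helper, not a PReMiuM function
coClusteringDissimilarity <- function(z) {
  nSubjects <- ncol(z)
  simMat <- matrix(0, nSubjects, nSubjects)
  for (s in seq_len(nrow(z))) {
    # outer() marks the pairs of subjects that share a label in sweep s
    simMat <- simMat + outer(z[s, ], z[s, ], "==")
  }
  # Dissimilarity = 1 - proportion of sweeps in which the pair co-clusters
  1 - simMat / nrow(z)
}

dissMat <- coClusteringDissimilarity(z)
dissMat
```

Because only co-membership is compared, the result is unaffected by relabelling clusters between sweeps: sweeps 1 and 3 above use swapped labels but contribute identically.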
