Usage
profRegr(covNames, fixedEffectsNames, outcome="outcome", outcomeT=NA, data, output="output", hyper, predict,
predictType="RaoBlackwell", nSweeps=1000, nBurn=1000, nProgress=500, nFilter=1, nClusInit, seed, yModel="Bernoulli", xModel="Discrete", sampler="SliceDependent", alpha=-2, dPitmanYor = 0, excludeY=FALSE, extraYVar=FALSE, varSelectType="None", entropy,reportBurnIn=FALSE, run=TRUE, discreteCovs, continuousCovs, whichLabelSwitch="123",
includeCAR=FALSE, neighboursFile="Neighbours.txt",
weibullFixedShape=TRUE, useNormInvWishPrior=FALSE)
Arguments
covNames
A vector of strings of the covariate names as by the column names in the data argument. The names of the covariates cannot include space characters.
fixedEffectsNames
A vector of strings of the fixed effect names as by the column names in the data argument. Each fixed effect must be of class 'numeric'. If a fixed effect is of class 'character', an error message will appear and the fixed effect will need to be recoded as numeric. The names of the fixed effects cannot include space characters.
outcome
A string of column of the data argument that contains the outcome. The outcome cannot have missing values - you could consider predicting the value of the outcome for those subjects for which it has not been observed. The name cannot include space characters.
outcomeT
A string of column of the data argument that contains the offset (for Poisson outcome) or the number of trials (for Binomial outcome) or censoring for Survival reponse (coded as 0 or 1). The name cannot include space characters.
data
A data frame which has as columns the outcome, the covariates, the fixed effects if any and the offset (for Poisson outcome) or the number of trials (for Binomial outcome) or censoring (for Survival outcome). The outcome cannot have missing values - you could consider predicting the value of the outcome for those subjects for which it has not been observed. For Survival response censoring must be coded as 0 if the event has not occurred (ie, there has been censoring) and 1 if the event has occurred (no censoring has taken place). The names of the columns cannot include space characters.
output
Path to folder to save all output files. The covariates can have missing values, which must be coded as 'NA'. There cannot be missing values in the fixed effects - if there are, use an imputation method before using profile regression.
hyper
Object of type setHyperparams with hyperparameters specifications. This is optional, default values are provided for all hyperparameters. See ?setHyperparams for details.
predict
Data frame containing the predictive scenarios. This is only required if predictions are requested. At each iteration the predictive subjects are assigned to one of the current
clusters according to their covariate profiles (but ignoring missing values), or
their Rao Blackwellised estimate of theta is recorded (a weighted average of all
theta, weighted by the probability of allocation into each cluster. For Normal response they can also be randomly allocated. See also the option predictType below.
The predictive subjects have no impact on the likelihood and so do not determine
the clustering or parameters at each iteration. The predictive allocations are
then recorded as extra entries in each row of the output_z.txt file. This can
then be processed in the post processing to create a dissimilarity matrix with
the fitting subjects. The post procesing function calcPredictions will create
predicted response values for these subjects.
See ?calcPredictions for more details and examples. For Normal response,
predictType
This can be set equal to "RaoBlackwell" and "random". The default is RaoBlackwell. The random option can only be used for Normal response, where the estimated variance of the clusters is considered and the predictive subjects are randomly assigned to a mixture component and then are also randomly sampled within that component.
nSweeps
Number of iterations of the MCMC after the burn-in period. By default this is 1000.
nBurn
Number of initial iterations of the MCMC to be discarded. By default this is 1000.
reportBurnIn
If TRUE then the burn in iterations are reported in the output files, if set to FALSE they are not. It is set to FALSE by default.
nProgress
The number of sweeps at which to print a progress update. By default this is 500.
nFilter
The frequency (in sweeps) with which to write the output to file. The default value is 1.
nClusInit
The number of clusters individuals should be initially randomly assigned to (Unif[50,60]).
seed
The value for the seed for the random number generator. The default value is the current time.
yModel
The model type for the outcome variable. The options currently available are "Bernoulli", "Poisson", "Binomial", "Categorical", "Normal" and "Survival". The default value is Bernoulli.
xModel
The model type for the covariates. The options currently available are "Discrete", "Normal" and "Mixed". The default value is "Discrete".
sampler
The sampler type to be used. Options are "SliceDependent", "SliceIndependent" and "Truncated". The default value is "SliceDependent".
alpha
The value to be used if alpha is fixed. If a value smaller than or equal to -1 is used then alpha is random, if dPitmanYor is equal to zero (the random alpha option is available for Dirichlet process prior only). The default value is -2 (random alpha). For fixed alpha, if dPitmanYor is in the interval (0,1) then a Pitman-Yor process prior is used instead of a Dirichlet process prior.
dPitmanYor
The discount parameter for the Pitman-Yor process prior. The default value is 0, which is equivalent to a Dirichlet process prior. This parameter must belong to the interval [0,1) and it must be provided together with a non-negative value for alpha. The Pitman-Yor process prior is only available for non-random parameters. Note that the third label switching move is only available for Dirichlet process priors, so it will not be run if dPitmanYor>0. Therefore setting dPitmanYor to a value greater than zero will forse whichLabelSwitch=12.
excludeY
If TRUE only the covariate data X is modelled. By default this is set to FALSE.
extraYVar
If set equal to TRUE extra Gaussian variance is included in the response model. This option is available only for Bernoulli, Binomial and Poisson response. By default the extra Gaussian variance is not included, so extraYVar=FALSE.
varSelectType
The type of variable selection to be used "None", "BinaryCluster" or "Continuous". The "Continuous" variable selection is the implementation of the novel variable selection formulation proposed by Papathomas, Molitor, Hoggart, Hastie, Richardson (2012) "Exploring data from genetic association studies using Bayesian variable selection and the Dirichlet process: application to searching for gene x gene patterns" in Genetic Epidemiology. The "BinaryCluster" variable selection is based on the method proposed by Chung and Dunson (2009) "Nonparametric Bayes conditional distribution modelling with variable selection" in the Journal of the American Statistical Association. Both types of variable selection can be used with discrete, continuous or mixed covariates. The default value is "None".
entropy
If included then we compute allocation entropy. By default the allocation entropy is not included.
run
Logical. If TRUE then the MCMC is run. Set run=FALSE if the MCMC has been run already and it is only required to collect information about the run.
discreteCovs
The names of the discrete covariates among the covariate names, if xModel="Mixed". This and continuousCovs must be defined if xModel="Mixed", while covNames is ignored.
continuousCovs
The names of the discrete covariates among the covariate names, if xModel="Mixed". This and continuousCovs must be defined if xModel="Mixed", while covNames is ignored.
whichLabelSwitch
The label switching moves to run. The options available are moves 1, 2 and 3 ("123"), moves 1 and 2 ("12") and move 3 only ("3"). The moves are described in Hastie et al. (2013). Note that the third label switching move is only available for Dirichlet process priors, so it will not be run if dPitmanYor>0. Therefore setting dPitmanYor to a value greater than zero will forse whichLabelSwitch=12.
includeCAR
A boolean specifying wether a conditional autoregressive term should be introduced within the model, to take into account possible spatial correlation within residuals. Only for Poisson and Normal response models.
neighboursFile
The file name of the file specifying neighbourhood graph. It should have the same structure than neighbourhood graph files used in the "INLA" package, and can be produced from a nb object of package "spdep", by the function "nb2INLA" of package "spdep". See ?nb2INLA for details.
weibullFixedShape
This parameter controls whether the shape parameter of the Weibull distribution (for yModel=Survival only) is a global parameter (fixed) or cluster specific. It is equal to TRUE by default.
useNormInvWishPrior
By default this variable equals FALSE. When this variable equals TRUE, the conjugate Normal-inverse-Wishart prior is used rather
than the independant normal and inverse Whishart priors. If this prior is used, variable selection cannot be used as it has not been implemented.
Authors
David Hastie, Department of Epidemiology and Biostatistics, Imperial College London, UK Silvia Liverani, Department of Epidemiology and Biostatistics, Imperial College London and MRC Biostatistics Unit, Cambridge, UK Aurore J. Lavigne, Department of Epidemiology and Biostatistics, Imperial College London, UK Lamiae Azizi, MRC Biostatistics Unit, Cambridge, UK Maintainer: Silvia Liverani The R package PReMiuM is supported through research grants. One key requirement of such funding applications is the ability to demonstrate the impact of the work we seek funding for can. Whatever you are using PReMiuM for, it would be very helpful for us to learn about our users, to tailor our future methodological developments to your needs. Please email us at liveranis@gmail.com or visit http://www.silvialiverani.com/support-premium/.