Usage
profRegr(covNames, fixedEffectsNames, outcome="outcome",
outcomeT=NA, data, output="output", hyper, predict,
nSweeps=1000, nBurn=1000, nProgress=500, nFilter=1,
nClusInit, seed, yModel="Bernoulli", xModel="Discrete",
sampler="SliceDependent", alpha=-2, dPitmanYor = 0, excludeY=FALSE,
extraYVar=FALSE, varSelectType="None", entropy, reportBurnIn=FALSE,
run=TRUE, discreteCovs, continuousCovs, whichLabelSwitch="123")
Arguments
covNames
A vector of strings giving the covariate names, as given by the column names in the data argument.
fixedEffectsNames
A vector of strings giving the fixed effect names, as given by the column names in the data argument. Each fixed effect must be of class 'numeric'. If a fixed effect is of class 'character', an error message will appear and the fixed effect will need to be recoded a
outcome
A string giving the name of the column of the data argument that contains the outcome. The outcome cannot have missing values; consider instead predicting the value of the outcome for those subjects for which it has not been observed.
outcomeT
A string giving the name of the column of the data argument that contains the offset (for Poisson outcome) or the number of trials (for Binomial outcome).
data
A data frame whose columns are the outcome, the covariates, the fixed effects (if any) and, where relevant, the offset (for Poisson outcome), the number of trials (for Binomial outcome) or the censoring (for Survival outcome). The outcome cannot have missing values - you c
output
Path to folder to save all output files. The covariates can have missing values, which must be coded as 'NA'. There cannot be missing values in the fixed effects - if there are, use an imputation method before using profile regression.
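The rules above on missing values and column classes can be checked before any model is run. The following is a minimal sketch in base R; the data frame toy and its column names are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data; the column names are made up for illustration.
set.seed(1)
n <- 20
toy <- data.frame(
  outcome = rbinom(n, size = 1, prob = 0.5),   # Bernoulli outcome, fully observed
  Variable1 = sample(0:2, n, replace = TRUE),  # discrete covariate
  Variable2 = sample(0:1, n, replace = TRUE),  # discrete covariate
  FixedEffects1 = rnorm(n)                     # numeric fixed effect
)
toy$Variable1[c(3, 7)] <- NA  # missing covariate values are coded as NA

covNames <- c("Variable1", "Variable2")
fixedEffectsNames <- "FixedEffects1"

# Checks mirroring the rules stated in the argument descriptions.
outcomeComplete <- !any(is.na(toy$outcome))
fixedEffectsNumeric <- all(vapply(toy[fixedEffectsNames], is.numeric, logical(1)))
fixedEffectsComplete <- !any(is.na(toy[fixedEffectsNames]))
stopifnot(outcomeComplete, fixedEffectsNumeric, fixedEffectsComplete)
```

Missing values in the covariates are left in place here, since only the outcome and the fixed effects are required to be complete.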
hyper
Object of type setHyperparams with hyperparameters specifications. This is optional, default values are provided for all hyperparameters. See ?setHyperparams for details.
predict
Data frame containing the predictive scenarios. This is only required if predictions are requested. At each iteration the predictive subjects are assigned to one of the current clusters according to their covariate profiles (but ignoring missing values).
nSweeps
Number of iterations of the MCMC after the burn-in period. By default this is 1000.
nBurn
Number of initial iterations of the MCMC to be discarded. By default this is 1000.
reportBurnIn
If TRUE, the burn-in iterations are reported in the output files; if FALSE, they are not. The default is FALSE.
nProgress
The interval, in sweeps, at which a progress update is printed. By default this is 500.
nFilter
The frequency (in sweeps) with which to write the output to file. The default value is 1.
nClusInit
The number of clusters to which individuals are randomly assigned at initialisation (Unif[50,60]).
seed
The value for the seed for the random number generator. The default value is the current time.
yModel
The model type for the outcome variable. The options currently available are "Bernoulli", "Poisson", "Binomial", "Categorical", "Normal" and "Survival". The default value is "Bernoulli".
xModel
The model type for the covariates. The options currently available are "Discrete", "Normal" and "Mixed". The default value is "Discrete".
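As an illustration of how yModel and xModel fit together with the data arguments, here is a hedged sketch of a call using the default model types. It assumes that profRegr is provided by the PReMiuM package, which this page does not state. The toy data, the output location in tempdir() and the very small nSweeps, nBurn and nClusInit values are illustrative choices only.

```r
# Hypothetical toy data with a Bernoulli outcome and two discrete covariates.
set.seed(2)
n <- 50
toy <- data.frame(
  outcome = rbinom(n, size = 1, prob = 0.5),
  Variable1 = sample(0:2, n, replace = TRUE),
  Variable2 = sample(0:1, n, replace = TRUE)
)
covNames <- c("Variable1", "Variable2")
outputStem <- file.path(tempdir(), "output")

# Assumption: profRegr comes from the PReMiuM package.
# The call is skipped when that package is not installed.
if (requireNamespace("PReMiuM", quietly = TRUE)) {
  runInfoObj <- PReMiuM::profRegr(
    covNames = covNames,
    outcome = "outcome",
    data = toy,
    output = outputStem,
    yModel = "Bernoulli",
    xModel = "Discrete",
    nSweeps = 10,   # deliberately tiny, for illustration only
    nBurn = 10,
    nClusInit = 5
  )
}
```

A real analysis would use far more sweeps than shown here; the defaults documented above are 1000 for both nSweeps and nBurn.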
sampler
The sampler type to be used. Options are "SliceDependent", "SliceIndependent" and "Truncated". The default value is "SliceDependent".
alpha
The value to be used if alpha is fixed. If a value smaller than or equal to -1 is used and dPitmanYor is equal to zero, then alpha is random (the random alpha option is available for the Dirichlet process prior only). The default value is -2 (random alpha). Fo
dPitmanYor
The discount parameter for the Pitman-Yor process prior. The default value is 0, which is equivalent to a Dirichlet process prior. This parameter must belong to the interval [0,1) and it must be provided together with a non-negative value for alpha. The P
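The interplay between alpha and dPitmanYor described above can be summarised in a small helper. This is only a sketch of the rules as worded on this page: describePrior is a hypothetical function, not part of the package, and it does not reproduce whatever validation profRegr itself performs. Values of alpha strictly between -1 and 0 are not covered by the text, so the helper simply labels them as fixed.

```r
# Hypothetical helper, not part of the package: classifies an
# (alpha, dPitmanYor) pair according to the rules described above.
describePrior <- function(alpha = -2, dPitmanYor = 0) {
  if (dPitmanYor < 0 || dPitmanYor >= 1) {
    stop("dPitmanYor must lie in the interval [0, 1)")
  }
  if (dPitmanYor > 0) {
    # A positive discount must come with a non-negative alpha.
    if (alpha < 0) {
      stop("a positive dPitmanYor requires a non-negative alpha")
    }
    return("Pitman-Yor process, fixed alpha")
  }
  # dPitmanYor == 0 is equivalent to a Dirichlet process prior.
  if (alpha <= -1) {
    return("Dirichlet process, random alpha")
  }
  # Assumption: any other value is treated as a fixed alpha, unvalidated.
  "Dirichlet process, fixed alpha"
}

defaultPrior <- describePrior()              # documented defaults
fixedPrior <- describePrior(alpha = 1)       # fixed alpha, no discount
pyPrior <- describePrior(alpha = 1, dPitmanYor = 0.5)
```

With the documented defaults (alpha=-2, dPitmanYor=0) the helper reports a Dirichlet process prior with random alpha.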
excludeY
If TRUE only the covariate data X is modelled. By default this is set to FALSE.
extraYVar
If TRUE, extra Gaussian variance is included in the response model. This option is available only for Bernoulli, Binomial and Poisson responses. By default extraYVar=FALSE, so the extra Gaussian variance is not included.
varSelectType
The type of variable selection to be used: "None", "BinaryCluster" or "Continuous". The "BinaryCluster" variable selection is the implementation of the novel variable selection formulation proposed by Papathomas, Molitor, Hoggart, Hastie, Richardson (2012
entropy
If included, the allocation entropy is computed. By default the allocation entropy is not computed.
run
Logical. If TRUE then the MCMC is run. Set run=FALSE if the MCMC has been run already and it is only required to collect information about the run.
discreteCovs
The names of the discrete covariates among the covariate names, if xModel="Mixed". This and continuousCovs must be defined if xModel="Mixed", while covNames is ignored.
continuousCovs
The names of the continuous covariates among the covariate names, if xModel="Mixed". This and discreteCovs must be defined if xModel="Mixed", while covNames is ignored.
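For xModel="Mixed", the covariates are split between discreteCovs and continuousCovs rather than listed in covNames. The sketch below shows one way this might look. It again assumes that profRegr is provided by the PReMiuM package, and the data frame toyMixed, its column names and the small run settings are hypothetical.

```r
# Hypothetical toy data with one discrete and two continuous covariates.
set.seed(3)
n <- 50
toyMixed <- data.frame(
  outcome = rbinom(n, size = 1, prob = 0.5),
  Variable1 = sample(0:2, n, replace = TRUE),  # discrete
  Variable2 = rnorm(n),                        # continuous
  Variable3 = rnorm(n)                         # continuous
)
discreteCovs <- "Variable1"
continuousCovs <- c("Variable2", "Variable3")

# The two sets should name existing columns and should not overlap.
stopifnot(all(c(discreteCovs, continuousCovs) %in% names(toyMixed)))
stopifnot(length(intersect(discreteCovs, continuousCovs)) == 0)

# Assumption: profRegr comes from the PReMiuM package.
# covNames is omitted because it is ignored when xModel = "Mixed".
if (requireNamespace("PReMiuM", quietly = TRUE)) {
  runInfoMixed <- PReMiuM::profRegr(
    outcome = "outcome",
    data = toyMixed,
    output = file.path(tempdir(), "outputMixed"),
    yModel = "Bernoulli",
    xModel = "Mixed",
    discreteCovs = discreteCovs,
    continuousCovs = continuousCovs,
    nSweeps = 10,   # deliberately tiny, for illustration only
    nBurn = 10,
    nClusInit = 5
  )
}
```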
whichLabelSwitch
The label switching moves to run. The options available are moves 1, 2 and 3 ("123"), moves 1 and 2 ("12") and move 3 only ("3"). The moves are described in Hastie et al. (2013). Note that the third label switching move is only available for Dirichlet pro