write.MCMC: Writes MLwiN macros to fit models using Markov chain Monte Carlo (MCMC) methods

Description

write.MCMC is an internal function which creates an MLwiN macro file to fit models using MCMC.

Usage

write.MCMC(
  indata,
  dtafile,
  oldsyntax = FALSE,
  resp,
  levID,
  expl,
  rp,
  D = "Normal",
  nonlinear = c(0, 1),
  categ = NULL,
  notation = NULL,
  nonfp = NULL,
  clre = NULL,
  Meth = 1,
  merr = NULL,
  carcentre = FALSE,
  maxiter = 20,
  convtol = 2,
  seed = 1,
  iterations = 5000,
  burnin = 500,
  scale = 5.8,
  thinning = 1,
  priorParam = "default",
  refresh = 50,
  fixM = 1,
  residM = 1,
  Lev1VarM = 1,
  OtherVarM = 1,
  adaption = 1,
  priorcode = c(gamma = 1),
  rate = 50,
  tol = 10,
  lclo = 0,
  mcmcOptions,
  fact = NULL,
  xc = NULL,
  mm = NULL,
  car = NULL,
  BUGO = NULL,
  mem.init = "default",
  optimat = FALSE,
  modelfile,
  initfile,
  datafile,
  macrofile,
  IGLSfile,
  MCMCfile,
  chainfile,
  MIfile,
  resifile,
  resi.store = FALSE,
  resioptions,
  resichains,
  FACTchainfile,
  resi.store.levs = NULL,
  debugmode = FALSE,
  startval = NULL,
  dami = NULL,
  namemap = sapply(colnames(indata), as.character),
  saveworksheet = NULL,
  rng.version = 10
)

Value

Outputs a modified version of namemap containing newly generated short names.

Arguments

indata: A data.frame object containing the data to be modelled.
dtafile: The file name of the dataset to be imported into MLwiN, which is in Stata format (i.e. with extension .dta).
oldsyntax: Specified as FALSE if new syntax has been used in Formula object, else specified as TRUE (to enable backcompatibility).
resp: A character string (vector) of response variable(s).
levID: A character string (vector) of the specified level ID(s). The ID(s) should be sorted in the descending order of levels (e.g. levID = c('level2', 'level1') where 'level2' is the higher level).
expl: A character string (vector) of explanatory (predictor) variable(s).
rp: A character string (vector) of random part of random variable(s).
D: A character string/vector specifying the type of distribution to be modelled, which can include 'Normal' (the default), 'Binomial', 'Poisson', 'Unordered Multinomial', 'Ordered Multinomial', 'Multivariate Normal', or 'Mixed'. In the case of the latter, 'Mixed' precedes the response types which also need to be listed in D, e.g. c('Mixed', 'Normal', 'Binomial'); these need to be be listed in the same order to which they are referred to in the Formula object (see runMLwiN, Formula.translate, Formula.translate.compat. 'Mixed' combinations can only consist of 'Normal' and 'Binomial' for MCMC estimation.
nonlinear: A character vector specifying linearisation method for IGLS starting values for discrete response models (see Chapter 9 of Rasbash et al 2012, and Goldstein 2011). N = 0 specifies marginal quasi-likelihood linearization (MQL), whilst N = 1 specifies penalised quasi- likelihood linearization (PQL); M = 1 specifies first order approximation, whilst M = 2 specifies second order approximation. nonlinear = c(N = 0, M = 1) by default. First order marginal quasi-likelihood (MQL1) only option for single-level discrete response models.
categ: Specifies categorical variable(s) as a matrix. Each column corresponds to a categorical variable; the first row specifies the name(s) of variable(s); the second row specifies the name(s) of reference group(s), NA(s) if no reference group; the third row states the number of categories for each variable.
notation: Specifies the model subscript notation to be used in the MLwiN equations window. 'class' means no multiple subscripts, whereas 'level' has multiple subscripts.
nonfp: Removes the fixed part of random variable(s). NA if no variable is removed.
clre: A matrix used to estimate some, but not all, of the variances and covariances for a set of coefficients at a particular level. Remove from the random part at level <first row> the covariance matrix element(s) defined by the pair(s) of rows <second row> <third row>. Each row corresponds to a removed entry of the covariance matrix.
Meth: Specifies the maximum likelihood estimation method to be used when generating starting values via (R)IGLS. If Meth = 0 estimation method is set to RIGLS. If Meth = 1 estimation method is set to IGLS (the default setting). If Meth is absent, alternate between IGLS and RIGLS.
merr: A vector which sets-up measurement errors on predictor variables. The first element N defines the number of variables that have measurement errors. Then, for each variable with measurement error, a pair of inputs is required: the first of which is the explanatory variable name as a character string, and the second of which is the variance of the measurement error for this variable. See demo(MCMCGuide14) for an example.
carcentre: If CAR model (i.e. if car is non-NULL), carcentre = TRUE mean-centres all random effects at that level.
maxiter: When generating starting values via (R)IGLS, a numeric value specifying the total number of iterations, from the start, before IGLS estimation halts (if startval = NULL).
convtol: When generating starting values via (R)IGLS, a numeric value specifying the IGLS convergence criterion, as specified in the tol option within estoptions, where startval = NULL) (see runMLwiN). If value of convtol is m, estimation will be deemed to have converged when the relative change in the estimate for any parameter from one iteration to the next is less than 10(-m). Defaults to value of 2 for m if not otherwise specified.
seed: An integer specifying the random seed in MLwiN.
iterations: An integer specifying the number of iterations after burn-in.
burnin: An integer specifying length of the burn-in.
scale: An integer specifying the scale factor of proposed variances; this number will be multiplied by the estimated parameter variance (from IGLS/RIGLS) to give the proposal distribution variance.
thinning: An integer specifying the frequency with which successive values in the Markov chain are stored. By default thinning = 1.
priorParam: A vector specifying the informative priors used, as output from prior2macro.
refresh: An integer specifying how frequently the parameter estimates are refreshed on the screen during iterations; only applies if debugmode = TRUE in estoptions: see runMLwiN.
fixM: Specifies the fixed effect method: 1 for Gibbs Sampling, 2 for univariate MH Sampling and 3 for multivariate MH Sampling.
residM: Specifies the residual method: 1 for Gibbs Sampling, 2 for univariate MH Sampling and 3 for multivariate MH Sampling.
Lev1VarM: Specifies the level 1 variance method: 1 for Gibbs Sampling, 2 for univariate MH Sampling and 3 for multivariate MH Sampling.
OtherVarM: Specifies the variance method for other levels: 1 for Gibbs Sampling, 2 for univariate MH Sampling and 3 for multivariate MH Sampling.
adaption: adaption = 1 indicates adaptation is to be used; 0 otherwise.
priorcode: A vector indicating which default priors are to be used for the variance parameters. It defaults to c(gamma = 1) in which case Gamma priors are used with MLwiN's defaults of Gamma a value (shape) = 0.001 and Gamma b value (scale) = 0.001, although alternative values for shape and scale can be specified in subsequent elements of the vector, e.g. c(gamma = 1, shape = 0.5, scale = 0.2)). Alternatively c(uniform = 1) specifies Uniform priors on the variance scale. To allow for back-compatibility with deprecated syntax used in versions of R2MLwiN prior to 0.8-2, if priorcode is instead specified as an integer, then 1 indicates that Gamma priors are used, whereas 0 indicates that Uniform priors are used. See the section on 'Priors' in the MLwiN help system for more details on the meaning of these priors.
rate: An integer specifying the acceptance rate (as a percentage); this command is ignored if adaption = 0.
tol: An integer specifying tolerance (as a percentage) for the acceptance rate.
lclo: This command toggles on/off the possible forms of complex level 1 variation when using MCMC. lclo = 0 expresses the level 1 variation as a function of the predictors, whereas lclo = 1 expresses the log of the level 1 precision (1/variance) as a function of the predictors.
mcmcOptions: A list of other MCMC options used. See `Details' below.
fact: A list of objects specified for factor analysis. See `Details' below.
xc: Indicates whether model is cross-classified (TRUE) or nested (FALSE). xc = NULL by default (corresponding to FALSE), unless either mm or car are not null, in which case xc = TRUE.
mm: Specifies the structure of a multiple membership model. Can be a list of variable names, a list of vectors, or a matrix (e.g. see df2matrix). In the case of the former, each element of the list corresponds to a level (classification) of the model, in descending order. If a level is not a multiple membership classification, then NA is specified. Otherwise, lists need to be assigned to mmvar and weights, with the former containing columns specifying the classification units, and the latter containing columns specifying the weights. Ignored if EstM = 0, i.e. only applicable to models estimated via MCMC. mm = NULL by default. Supersedes deprecated xclass. E.g. (from demo(MCMCGuide16)) for logearn ~ 1 + age_40 + sex + parttime + (company|1) + (id|1), if company is a multiple membership classification with the variables indicating the classifications in company, company2, company3, company4 and their weights in weight1, weight2, weight3 and weight4 then mm = list(list(mmvar = list('company', 'company2', 'company3', 'company4'), weights = list('weight1', 'weight2', 'weight3', 'weight4')), NA) with the NA, listed last, corresponding to the level 1 identifier (id).
car: A list specifying structure of a conditional autoregressive (CAR) model. Each element of the list corresponds to a level (classification) of the model, in descending order. If a level is not a spatial classification, then NA is specified. Otherwise, lists need to be assigned to carvar and weights, with the former containing columns specifying the spatial classification units, and the latter containing columns specifying the weights. See demo(MCMCGuide17) for examples. car = NULL by default.
BUGO: If non-NULL uses BUGS for MCMC estimation using files specified in modelfile, initfile and datafile.
mem.init: A vector which sets and displays worksheet capacities for the current MLwiN session according to the value(s) specified. By default, the number of levels is nlev+1; worksheet size in thousands of cells is 6000; the number of columns is 2500; the number of explanatory variables is num_vars+10; the number of group labels is 20. nlev is the number of levels specified by levID, and num_vars is approximately the number of explanatory variables calculated initially.
optimat: This option instructs MLwiN to limit the maximum matrix size that can be allocated by the (R)IGLS algorithm. Specify optimat = TRUE if MLwiN gives the following error message 'Overflow allocating smatrix'. This error message arises if one more higher-level units is extremely large (contains more than 800 lower-level units). In this situation runmlwin's default behaviour is to instruct MLwiN to allocate a larger matrix size to the (R)IGLS algorithm than is currently possible. Specifying optimat = TRUE caps the maximum matrix size at 800 lower-level units, circumventing the MLwiN error message, and allowing most MLwiN functionality.
modelfile: A file name where the BUGS model will be saved in .txt format.
initfile: A file name where the BUGS initial values will be saved in .txt format.
datafile: A file name where the BUGS data will be saved in .txt format.
macrofile: A file name where the MLwiN macro file will be saved.
IGLSfile: A file name where the IGLS estimates will be saved.
MCMCfile: A file name where the MCMC estimates will be saved.
chainfile: A file name where the MCMC chains will be saved.
MIfile: A file name where the missing values will be saved.
resifile: A file name where the residual estimates will be saved.
resi.store: A logical value to indicate if residuals are to be stored (TRUE) or not (FALSE).
resioptions: A string vector to specify the various residual options. The 'variance' option calculates the posterior variances instead of the posterior standard errors; the 'standardised' option calculates standardised residuals.
resichains: A file name where the residual chains will be saved.
FACTchainfile: A file name where the factor chains will be saved.
resi.store.levs: An integer vector indicating the levels at which the residual chains are to be stored.
debugmode: A logical value determining whether MLwiN is run in the background or not. The default value is FALSE: i.e. MLwiN is run in the background. If TRUE the MLwiN GUI is opened, and then pauses after the model has been set-up, allowing user to check starting values; pressing 'Resume macro' will then fit the model. Once fit, pressing 'Resume macro' once more will save the outputs to the workdir ready to be read by R2MLwiN. Users can instead opt to 'Abort macro' in which case the outputs are not saved to the workdir. This option currently works for 32 bit version of MLwiN only (automatically switches unless MLwiNPath or options(MLwiNPath) has been set directly to the executable).
startval: A list of numeric vectors specifying the starting values when using MCMC. FP.b corresponds to the estimates for the fixed part; FP.v specifies the variance/covariance estimates for the fixed part; RP.b specifies the variance estimates for the random part; RP.v corresponds to the variance/covariance matrix of the variance estimates for the random part.
dami: This command outputs a complete (i.e. including non-missing responses) response variable y. If dami = c(0, <iter1>, <iter2>,...) then the response variables returned will be the value of y at the iterations quoted (as integers <iter1>, <iter2>, etc.); these can be used for multiple imputation. If dami = 1 the value of y will be the mean estimate from the iterations produced. dami = 2 is as for dami = 1 but with the standard errors of the estimate additionally being stored.
namemap: A mapping of column names to DTA friendly shorter names
saveworksheet: A list of file names (one for each chain) used to store the MLwiN worksheet after the model has been estimated.
rng.version: An integer indicating which random number generator should be used by MLwiN.

Author

Zhang, Z., Charlton, C.M.J., Parker, R.M.A., Leckie, G., and Browne, W.J. (2016) Centre for Multilevel Modelling, University of Bristol.

Details

A list of other MCMC options as used in the argument mcmcOptions:

orth: If orth = 1, orthogonal fixed effect vectors are used; zero otherwise.
hcen: An integer specifying the level where we use hierarchical centering.
smcm: If smcm = 1, structured MCMC is used; zero otherwise.
smvn: If smvn = 1, the structured MVN framework is used; zero otherwise.
paex: A matrix of Nx2; in each row, if the second digit is 1, parameter expansion is used at level <the first digit>.

mcco: This command allows the user to have constrained settings for the lowest level variance matrix in a multivariate Normal model. If value is 0, it estimates distinct variances for each residual error and distinct covariances for each residual error pair. Four other settings are currently available:

`1`	fits stuctured errors with a common correlation paramater and a common variance parameter;
`2`	fits AR1 errors with a common variance parameter;
`3`	fits structured errors with a common correlation parameter and independent variance parameters;
`4`	fits AR1 errors with independent variance parameters.

A list of objects specified for cross-classified and/or multiple membership models, as used in the argument xclass:

class: An integer (vector) of the specified class(es).
N1: This defines a multiple membership across N1 units at level class. N1>1 if there is multiple membership.
weight: If there is multiple membership then the column number weight, which is the length of the dataset, will contain the first set of weights for the multiple membership. Note that there should be N1 weight columns and they should be sequential in the worksheet starting from weight.
id: If the response is multivariate then the column number id must be input and this contains the first set of identifiers for the classification. Note that for a p-variate model each lowest level unit contains p records and the identifiers (sequence numbers) for each response variate need to be extracted into id and following columns. There should be N1 of these identifier columns and they should be sequential starting from id in the multivariate case.
car: car = TRUE indicates the spatial CAR model; FALSE otherwise. car = FALSE if ignored.

A list of objects specified for factor analysis, as used in the argument fact:

nfact: Specifies the number of factors
lev.fact: Specifies the level/classification for the random part of the factor for each factor.
nfactcor: Specifies the number of correlated factors
factcor: a vector specifying the correlated factors: the first element corresponds to the first factor number, the second to the second factor number, the third element corresponds to the starting value for the covariance and the fourth element to whether this covariance is constrained (1) or not (0). If more than one pair of factors is correlated, then repeat this sequence for each pair.
loading: A matrix specifying the starting values for the factor loadings and the starting value of the factor variance. Each row corresponds to a factor.
constr: A matrix specifying indicators of whether the factor loadings and the factor variance are constrained (1) or not (0).

References

Goldstein, H. (2011) Multilevel Statistical Models. 4th Edition. London: John Wiley and Sons.

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2012) A User's Guide to MLwiN Version 2.26. Centre for Multilevel Modelling, University of Bristol.