bayesx.control: Control Parameters for BayesX

Description

Various parameters that control fitting of regression models using bayesx.

Usage

bayesx.control(model.name = "bayesx.estim", family = "gaussian",
  method = "MCMC", verbose = FALSE, dir.rm = TRUE,
  outfile = NULL, replace = FALSE, iterations = 12000L,
  burnin = 2000L, maxint = NULL, step = 10L, predict = TRUE,
  seed = NULL, hyp.prior = NULL, distopt = NULL, reference = NULL,
  zipdistopt = NULL, begin = NULL, level = NULL, eps = 1e-05,
  lowerlim = 0.001, maxit = 400L, maxchange = 1e+06, leftint = NULL,
  lefttrunc = NULL, state = NULL, algorithm = NULL, criterion = NULL,
  proportion = NULL, startmodel = NULL, trace = NULL,
  steps = NULL, CI = NULL, bootstrapsamples = NULL, ...)

Arguments

model.name

character, the base name used for the model output files stored in outfile.

family

character, specifies the response distribution of the model. Options available for all methods ("MCMC", "REML" and "STEP") are: "binomial", "binomialprobit", "gamma", "gaussian", "multinomial", "poisson". For "MCMC" and "REML" only: "cox", "cumprobit" and "multistate". For "REML" only: "binomialcomploglog", "cumlogit", "multinomialcatsp", "multinomialprobit", "seqlogit", "seqprobit".

method

character, which method should be used for estimation, options are "MCMC", "HMCMC" (hierarchical MCMC), "REML" and "STEP".

verbose

logical, should output be printed to the R console while bayesx is running?

dir.rm

logical, should the output files and directory be removed after estimation?

outfile

character, specifies a directory where bayesx should store all output files. All output files are named with model.name as the base name.

replace

logical, if set to TRUE, the files in the output directory specified in argument outfile will be replaced.

iterations

integer, sets the number of iterations for the sampler.

burnin

integer, sets the burn-in period of the sampler.

maxint

integer, the maximum number of intervals allowed. If first or second order random walk priors are specified, in some cases the data will be slightly grouped: the range between the minimal and maximal observed covariate values is divided into (small) intervals, and one parameter is estimated for each interval. The grouping has almost no effect on estimation results as long as the number of intervals is large enough. The maxint option lets the user determine the amount of grouping. For equidistant data, the default maxint = 150, for example, means that no grouping will be done as long as the number of different observations is equal to or below 150. For non-equidistant data, some grouping may be done even if the number of different observations is below 150.
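
The grouping described above can be sketched in base R. This is only an illustration under the assumption of equal-width intervals; the exact grouping rule BayesX applies is not reproduced here, and the names x, breaks and grp are hypothetical.

```r
set.seed(1)

## Non-equidistant covariate with 1000 distinct values.
x <- runif(1000, -3, 3)

## Assumed setting, taken from the default mentioned above.
maxint <- 150L

## Divide the observed range into maxint equal-width intervals (assumption).
breaks <- seq(min(x), max(x), length.out = maxint + 1L)

## Interval index of every observation, clamped to 1, ..., maxint.
grp <- findInterval(x, breaks, all.inside = TRUE)

## Number of intervals actually occupied; one parameter per interval
## would be estimated instead of one per distinct covariate value.
length(unique(grp))
```

With 1000 distinct values and at most 150 intervals, several observations share an interval, which is the grouping the argument controls.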

step

integer, defines the thinning parameter for MCMC simulation. For example, step = 50 means that only every 50th sampled parameter will be stored and used to compute characteristics of the posterior distribution such as means, standard deviations or quantiles. The aim of thinning is to considerably reduce disk storage and the autocorrelation between sampled parameters.
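
How iterations, burnin and step interact can be illustrated with a small base R sketch. The chain below is simulated noise standing in for sampler output, and which draw within each block of step iterations is stored is an assumption made for illustration.

```r
## Default settings from the Usage section.
iterations <- 12000L
burnin <- 2000L
step <- 10L

## Stand-in for raw sampler output: one draw per iteration.
set.seed(1)
chain <- rnorm(iterations)

## Discard the burn-in period, then keep every step-th draw.
keep <- seq(from = burnin + step, to = iterations, by = step)
thinned <- chain[keep]

length(thinned)  # 1000 stored draws
```

Under these settings, (iterations - burnin) / step = 1000 draws remain for computing posterior summaries.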

predict

logical, if predict = TRUE, samples of the deviance D, the effective number of parameters pD and the deviance information criterion DIC of the model are computed, and an expanded data set using all observations, which also contains the data used for estimation, is written to the output directory. If predict = FALSE, only output files of estimated effects are returned. Setting predict = FALSE is therefore useful when dealing with large data sets that might cause memory problems if predict is set to TRUE.

seed

integer, sets the seed of the random number generator in BayesX. The seed is usually controlled from R using the function set.seed.

hyp.prior

numeric, defines the values of the hyperparameters a and b of the inverse gamma prior for the overall variance parameter $\sigma^2$ if the response distribution is Gaussian. Both values must be positive real numbers. The default is hyp.prior = c(1, 0.005).

distopt

character, defines the implemented formulation for the negative binomial model if the response distribution is negative binomial. The two possibilities are to work with a negative binomial likelihood (distopt = "nb") or to work with the Poisson likelihood and the multiplicative random effects (distopt = "poga").

reference

character, option reference is meaningful only if either family = "multinomial" or family = "multinomialprobit" is specified as the response distribution. In this case reference defines the reference category to be chosen. Suppose, for instance, that the response is categorical with the three categories 1, 2 and 3. Then reference = 2 defines the value 2 to be the reference category.

zipdistopt

character, defines the zero inflated distribution for the regression analysis. The two possibilities are to work with a zero inflated Poisson distribution (zipdistopt = "zip") or to work with the zero inflated negative binomial likelihood (zipdistopt = "zinb").

begin

character, option begin is meaningful only if family = "cox" is specified as the response distribution. In this case begin specifies the variable that records when the observation became at risk. This option can be used to handle left truncation and time-varying covariates. If begin is not specified, all observations are assumed to have become at risk at time 0.

level

integer, besides the posterior means and medians, BayesX provides point-wise posterior credible intervals for every effect in the model. In a Bayesian approach based on MCMC simulation techniques, credible intervals are estimated by computing the respective quantiles of the sampled effects. By default, BayesX computes (point-wise) credible intervals for nominal levels of 80% and 95%. The option level[1] redefines the first of these nominal levels (95%). Adding, for instance, level[1] = 99 to the options list computes credible intervals for a nominal level of 99% rather than 95%. Similarly, the option level[2] redefines the second nominal level (80%). Adding, for instance, level[2] = 70 to the options list computes credible intervals for a nominal level of 70% rather than 80%.
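
The quantile-based construction can be sketched in base R. The draws below are simulated stand-ins for the stored samples of a single effect, and equal-tailed intervals are an assumption made for illustration; the names draws and ci are hypothetical.

```r
set.seed(1)

## Stand-in for the stored MCMC samples of one effect.
draws <- rnorm(1000, mean = 2, sd = 0.5)

## Nominal levels in percent, ordered as level[1] and level[2] above.
level <- c(95, 80)

## Equal-tailed interval: cut (100 - level) / 2 percent from each tail.
ci <- sapply(level, function(l) {
  alpha <- (100 - l) / 100
  quantile(draws, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
})
dimnames(ci) <- list(c("lower", "upper"), paste0(level, "%"))

ci
```

Replacing 95 by 99 in level widens the first interval, mirroring what level[1] = 99 requests.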

eps

numeric, defines the termination criterion of the estimation process. If both the relative changes in the regression coefficients and the variance parameters are less than eps, the estimation process is assumed to have converged.
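
A minimal sketch of such a termination check is given below. The Euclidean norm used to measure relative change is an assumption, as the exact measure BayesX uses is not stated here; rel_change and converged are hypothetical helper names.

```r
## Relative change between two successive parameter vectors
## (Euclidean norm; an assumption made for illustration).
rel_change <- function(new, old) {
  sqrt(sum((new - old)^2)) / sqrt(sum(old^2))
}

## Converged only if BOTH the regression coefficients and the
## variance parameters changed by less than eps.
converged <- function(beta_new, beta_old, var_new, var_old, eps = 1e-05) {
  rel_change(beta_new, beta_old) < eps && rel_change(var_new, var_old) < eps
}

converged(c(1.000001, 2), c(1, 2), 0.5, 0.5)  # TRUE
converged(c(1.1, 2), c(1, 2), 0.5, 0.5)       # FALSE
```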

lowerlim

numeric, since small variances are close to the boundary of their parameter space, the usual Fisher scoring algorithm for their determination has to be modified. If the fraction of the penalized part of an effect relative to the total effect is less than lowerlim, the estimation of the corresponding variance is stopped and the estimator is defined to be the current value of the variance (see section 6.2 of the BayesX methodology manual for details).

maxit

integer, defines the maximum number of iterations to be used in estimation. Since the estimation process will not necessarily converge, it may be useful to define an upper bound for the number of iterations. Note that BayesX returns results based on the current values of all parameters even if no convergence could be achieved within maxit iterations, but a warning message will be printed in the output window.

maxchange

numeric, defines the maximum value that is allowed for relative changes in parameters in one iteration to prevent the program from crashing because of numerical problems. Note that BayesX produces results based on the current values of all parameters even if the estimation procedure is stopped due to numerical problems, but an error message will be printed in the output window.

leftint

character, gives the name of the variable that contains the lower (left) boundary $T_{lo}$ of the interval $[T_{lo}, T_{up}]$ for an interval censored observation. For right censored or uncensored observations we have to specify $T_{lo} = T_{up}$. If leftint is missing, all observations are assumed to be right censored or uncensored, depending on the corresponding value of the censoring indicator.

lefttrunc

character, option lefttrunc specifies the name of the variable containing the left truncation time $T_{tr}$. For observations that are not truncated, we have to specify $T_{tr} = 0$. If lefttrunc is missing, all observations are assumed to be not truncated. For multi-state models, variable lefttrunc specifies the left endpoint of the corresponding time interval.
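
The coding conventions for leftint and lefttrunc can be illustrated with a toy data frame. All column names (t_lo, t_up, t_tr) are hypothetical, and the censoring indicator is deliberately left out of this sketch.

```r
## Three hypothetical observations:
##   1: not interval censored, so T_lo = T_up; not truncated, so T_tr = 0
##   2: interval censored in [3.5, 5.0]; not truncated
##   3: not interval censored; left truncated at time 1.5
surv <- data.frame(
  t_lo = c(2.0, 3.5, 4.0),  # variable that would be named in leftint
  t_up = c(2.0, 5.0, 4.0),  # upper (right) boundary of the interval
  t_tr = c(0.0, 0.0, 1.5)   # variable that would be named in lefttrunc
)

## Recover the observation types implied by the coding rules above.
surv$interval_censored <- surv$t_lo < surv$t_up
surv$truncated <- surv$t_tr > 0

surv
```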

state

character, for multi-state models, state specifies the current state variable of the process.

algorithm

character, specifies the selection algorithm. Possible values are "cdescent1" (adaptive algorithms in the methodology manual, see subsection 6.3), "cdescent2" (adaptive algorithms 1 and 2 with backfitting, see remarks 1 and 2 of section 3 in Belitz and Lang (2008)), "cdescent3" (search according to cdescent1 followed by cdescent2 using the selected model in the first step as the start model) and "stepwise" (stepwise algorithm implemented in the gam routine of S-plus, see Chambers and Hastie, 1992). This option will rarely be specified by the user.

criterion

character, specifies the goodness of fit criterion. If criterion = "MSEP" is specified, the data are randomly divided into a training and a validation data set. The training data set is used to estimate the models and the validation data set is used to estimate the mean squared prediction error (MSEP), which serves as the goodness of fit criterion to compare different models. The proportion of data used for the training and validation sample can be specified using option proportion, see below. The default is to use 75% of the data for the training sample.

proportion

numeric, this option may be used in combination with option criterion = "MSEP", see above. In this case the data are randomly divided into a training and a validation sample. proportion defines the fraction (between 0 and 1) of the original data used as training sample.
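
The split and the resulting criterion can be sketched in base R. This is an illustration only: a linear model from lm stands in for the candidate models, and the simulated data and object names are hypothetical.

```r
set.seed(1)
n <- 200
dat <- data.frame(x = runif(n, -3, 3))
dat$y <- 10 + sin(dat$x) + rnorm(n, sd = 0.6)

## Fraction of the data used as training sample (the stated default).
proportion <- 0.75

## Random division into training and validation samples.
train_id <- sample(n, size = round(proportion * n))
train <- dat[train_id, ]
valid <- dat[-train_id, ]

## Stand-in model, estimated on the training sample only.
fit <- lm(y ~ sin(x), data = train)

## Mean squared prediction error on the validation sample.
msep <- mean((valid$y - predict(fit, newdata = valid))^2)
msep
```

Competing models would each be fitted on train and compared by their msep on valid, with smaller values indicating better predictive fit.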

startmodel

character, defines the start model for variable selection. Options are "linear", "empty", "full" and "userdefined".

trace

character, specifies how detailed the output in the output window will be. Options are "trace_on", "trace_half" and "trace_off".

steps

integer, defines the maximum number of iterations. If the selection process has not converged after steps iterations the algorithm terminates and a warning is raised. Setting steps = 0 allows the user to estimate a certain model without any model choice. This option will rarely be specified by the user.

CI

character, specifies whether confidence intervals are computed for linear and nonlinear terms. Options are CI = "none", CI = "MCMCselect" (confidence intervals conditional on the selected model) and CI = "MCMCbootstrap" (unconditional confidence intervals that take model uncertainty into account). Both alternatives are computer intensive, but conditional confidence intervals take much less computing time than unconditional intervals. The advantage of unconditional confidence intervals is that sampling distributions for the degrees of freedom or smoothing parameters are obtained.

bootstrapsamples

integer, defines the number of bootstrap samples used for CI = "MCMCbootstrap".

...

not used

Value

A list with the arguments specified is returned.

References

For methodological and reference details see the BayesX manuals available at: http://www.BayesX.org.

Belitz C, Lang S (2008). Simultaneous selection of variables and smoothing parameters in structured additive regression models. Computational Statistics & Data Analysis, 53, 61--81.

Chambers JM, Hastie TJ (eds.) (1992). Statistical Models in S. Chapman & Hall, London.

Umlauf N, Adler D, Kneib T, Lang S, Zeileis A (2015). Structured Additive Regression Models: An R Interface to BayesX. Journal of Statistical Software, 63(21), 1--46. http://www.jstatsoft.org/v63/i21/

Examples

bayesx.control()

## Not run: 
# set.seed(111)
# n <- 500
# ## regressors
# dat <- data.frame(x = runif(n, -3, 3))
# ## response
# dat$y <- with(dat, 10 + sin(x) + rnorm(n, sd = 0.6))
# 
# ## estimate models with
# ## bayesx MCMC and REML
# b1 <- bayesx(y ~ sx(x), method = "MCMC", data = dat)
# b2 <- bayesx(y ~ sx(x), method = "REML", data = dat)
# 
# ## compare reported output
# summary(b1)
# summary(b2)
# ## End(Not run)