Fit a varying coefficient model to panel data. Assumes a compound symmetry error structure in which the residual errors for a given subject are equally correlated. This is equivalent to assuming that there is a normally distributed random effect per subject.
VCBART_cs(Y_train,subj_id_train, ni_train,X_train,
Z_cont_train = matrix(0, nrow = 1, ncol = 1),
Z_cat_train = matrix(0L, nrow = 1, ncol = 1),
X_test = matrix(0, nrow = 1, ncol = 1),
Z_cont_test = matrix(0, nrow = 1, ncol = 1),
Z_cat_test = matrix(0, nrow = 1, ncol = 1),
unif_cuts = rep(TRUE, times = ncol(Z_cont_train)),
cutpoints_list = NULL,
cat_levels_list = NULL,
edge_mat_list = NULL,
graph_split = rep(FALSE, times = ncol(Z_cat_train)),
sparse = TRUE,
rho = 0.9,
M = 50,
mu0 = NULL, tau = NULL, nu = NULL, lambda = NULL,
nd = 1000, burn = 1000, thin = 1,
save_samples = TRUE, save_trees = TRUE,
verbose = TRUE, print_every = floor( (nd*thin + burn)/10))

A list containing
Mean of the training responses Y_train (needed by predict_VCBART)
Standard deviation of the training responses Y_train (needed by predict_VCBART)
Vector of means of columns of X_train, including the intercept (needed by predict_VCBART).
Vector of standard deviations of the columns of X_train, including the intercept (needed by predict_VCBART).
Vector containing posterior mean of evaluations of regression function E[y|x,z] on training data.
Matrix with length(Y_train) rows and ncol(X_train)+1 columns containing the posterior mean of each coefficient function evaluated at the training data. Each row corresponds to a training set observation and each column corresponds to a coefficient function. Note the first column is for the intercept function.
Matrix with nd rows and length(Y_train) columns. Each row corresponds to a posterior sample of the regression function E[y|x,z] and each column corresponds to a training set observation. Only returned if save_samples == TRUE.
Array of dimension nd x length(Y_train) x (ncol(X_train)+1) containing posterior samples of evaluations of the coefficient functions. The first dimension corresponds to posterior samples/MCMC iterations, the second dimension corresponds to individual training set observations, and the third dimension corresponds to coefficient functions. Only returned if save_samples == TRUE.
Vector containing posterior mean of evaluations of regression function E[y|x,z] on testing data.
Matrix with nrow(X_test) rows and ncol(X_test)+1 columns containing the posterior mean of each coefficient function evaluated at the testing data. Each row corresponds to a testing set observation and each column corresponds to a coefficient function. Note the first column is for the intercept function.
Matrix with nd rows and nrow(X_test) columns. Each row corresponds to a posterior sample of the regression function E[y|x,z] and each column corresponds to a testing set observation. Only returned if save_samples == TRUE.
Array of size nd x nrow(X_test) x (ncol(X_test)+1) containing posterior samples of evaluations of the coefficient functions. The first dimension corresponds to posterior samples/MCMC iterations, the second dimension corresponds to individual testing set observations, and the third dimension corresponds to coefficient functions. Only returned if save_samples == TRUE.
Vector containing ALL samples of the residual standard deviation, including warmup.
Vector containing ALL samples of the auto-correlation parameter rho, including warmup.
Array of size nd x R x (ncol(X_train)+1) that counts the number of times each modifier was used in a decision rule in each posterior sample of each ensemble. Here R is the total number of potential modifiers (i.e. R = ncol(Z_cont_train) + ncol(Z_cat_train)).
If sparse=TRUE, an array of size nd x R x (ncol(X_train)+1) containing samples of the variable splitting probabilities.
A list (of length nd) of lists (of length ncol(X_train)+1) of character vectors (of length M) containing textual representations of the regression trees. The string for the s-th sample of the m-th tree in the j-th ensemble is contained in trees[[s]][[j]][m]. These strings are parsed by predict_VCBART to reconstruct the C++ representations of the sampled trees.
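As a rough illustration of how the posterior sample arrays above can be summarized, the sketch below computes pointwise posterior means and 95% credible intervals for each coefficient function. The names fit and beta_samples are hypothetical placeholders (this page does not list the names of the returned list's components); inspect names() of the returned object to locate the nd x length(Y_train) x (ncol(X_train)+1) array.

## Hypothetical: 'fit' is the list returned by VCBART_cs and 'beta_samples'
## stands in for its nd x n x (ncol(X_train)+1) array of coefficient draws.
beta_samples <- fit[["betas_train"]]   # placeholder name; check names(fit)

## Posterior mean of each coefficient function at each training observation.
beta_mean <- apply(beta_samples, c(2, 3), mean)

## Pointwise 95% credible intervals for each coefficient function.
beta_lower <- apply(beta_samples, c(2, 3), quantile, probs = 0.025)
beta_upper <- apply(beta_samples, c(2, 3), quantile, probs = 0.975)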
Vector of continuous responses for training data.
Vector containing the number of observations per subject in the training data.
Vector of length length(Y_train) that records which subject contributed each observation. Subjects should be numbered sequentially from 1 to length(ni_train).
Matrix of covariates for training observations. Do not include intercept as the first column.
Matrix of continuous modifiers for training data. Note, modifiers must be rescaled to lie in the interval [-1,1]; see the data-preparation sketch following the argument descriptions. Default is a 1x1 matrix, which signals that there are no continuous modifiers in the training data.
Integer matrix of categorical modifiers for training data. Note categorical levels should be 0-indexed. That is, if a categorical modifier has 10 levels, the values should run from 0 to 9. Default is a 1x1 matrix, which signals that there are no categorical modifiers in the training data.
Matrix of covariates for testing observations. Default is a 1x1 matrix, which signals that testing data is not provided.
Matrix of continuous modifiers for testing data. Default is a 1x1 matrix, which signals that testing data is not provided.
Integer matrix of categorical modifiers for testing data. Default is a 1x1 matrix, which signals that testing data is not provided.
Vector of logical values indicating whether cutpoints for each continuous modifier should be drawn from a continuous uniform distribution (TRUE) or a discrete set (FALSE) specified in cutpoints_list. Default is TRUE for each variable in Z_cont_train.
List of length ncol(Z_cont_train) containing a vector of cutpoints for each continuous modifier. By default, this is set to NULL so that cutpoints are drawn uniformly from a continuous distribution.
List of length ncol(Z_cat_train) containing a vector of levels for each categorical modifier. If the j-th categorical modifier contains L levels, cat_levels_list[[j]] should be the vector 0:(L-1). Default is NULL, which corresponds to the case that no categorical modifiers are available.
List of adjacency matrices if any of the categorical modifiers are network-structured. Default is NULL, which corresponds to the case that there are no network-structured categorical modifiers.
Vector of logicals indicating whether each categorical modifier is network-structured. Default is rep(FALSE, times = ncol(Z_cat_train)).
Logical, indicating whether or not to perform variable selection in each tree ensemble based on a sparse Dirichlet prior rather than a uniform prior; see Linero (2018). Default is TRUE.
Initial auto-correlation parameter for compound symmetry error structure. Must be between 0 and 1. Default is 0.9.
Number of trees in each ensemble. Default is 50.
Prior mean for jumps/leaf parameters. Default is 0 for each beta function. If supplied, must be a vector of length 1 + ncol(X_train).
Prior standard deviation for jumps/leaf parameters. Default is 1/sqrt(M) for each beta function. If supplied, must be a vector of length 1 + ncol(X_train).
Degrees of freedom for scaled-inverse chi-square prior on sigma^2. Default is 3.
Scale hyperparameter for scaled-inverse chi-square prior on sigma^2. Default places 90% prior probability that sigma is less than sd(Y_train).
Number of posterior draws to return. Default is 1000.
Number of MCMC iterations to be treated as "warmup" or "burn-in". Default is 1000.
Number of post-warmup MCMC iterations by which to thin. Default is 1.
Logical, indicating whether to return all posterior samples. Default is TRUE. If FALSE, only posterior mean is returned.
Logical, indicating whether or not to save a text-based representation of the tree samples. This representation can be passed to predict_VCBART to make predictions at a later time. Default is TRUE.
Logical, indicating whether to print progress to R console. Default is TRUE.
As the MCMC runs, a message is printed every print_every iterations. Default is floor( (nd*thin + burn)/10) so that only 10 messages are printed.
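The modifiers must be preprocessed before calling VCBART_cs: continuous modifiers rescaled to [-1,1], categorical modifiers re-coded as 0-indexed integers, and observations grouped by subject so that subj_id_train and ni_train are consistent. The sketch below illustrates one way to do this for a hypothetical panel data frame df with a subject identifier id, response y, covariates x1 and x2, a continuous modifier z1, and a categorical modifier z2; the data frame and column names are illustrative only, not part of the package.

## Hypothetical data frame 'df' with columns id, y, x1, x2, z1, z2.
df <- df[order(df$id), ]                        # group observations by subject
subj_id_train <- as.integer(factor(df$id))      # subjects numbered 1, 2, ...
ni_train <- as.vector(table(subj_id_train))     # observations per subject

Y_train <- df$y
X_train <- as.matrix(df[, c("x1", "x2")])       # no intercept column

## Rescale the continuous modifier to lie in [-1, 1].
Z_cont_train <- matrix(2 * (df$z1 - min(df$z1)) / (max(df$z1) - min(df$z1)) - 1,
                       ncol = 1)

## Re-code the categorical modifier as 0-indexed integers.
Z_cat_train <- matrix(as.integer(factor(df$z2)) - 1L, ncol = 1)

fit <- VCBART_cs(Y_train = Y_train, subj_id_train = subj_id_train,
                 ni_train = ni_train, X_train = X_train,
                 Z_cont_train = Z_cont_train, Z_cat_train = Z_cat_train,
                 cat_levels_list = list(0:max(Z_cat_train)),
                 nd = 1000, burn = 1000)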
Given \(p\) covariates \(X_{1}, \ldots, X_{p}\) and \(r\) effect modifiers \(Z_{1}, \ldots, Z_{r}\), the varying coefficient model asserts that
\(E[Y \vert X = x, Z = z] = \beta_0(z) + \beta_1(z) x_{1} + \cdots + \beta_p(z) x_{p}.\)
That is, for any r-vector \(z\), the relationship between \(X\) and \(Y\) is linear. However, the specific relationship is allowed to vary with respect to \(Z\). VCBART approximates the covariate effect functions \(\beta_0(Z), \ldots, \beta_p(Z)\) using ensembles of regression trees. This function assumes that the within-subject errors are equi-correlated (i.e. a compound symmetry error structure).
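One way to express the compound symmetry assumption: writing \(\epsilon_{i1}, \ldots, \epsilon_{i n_i}\) for the residual errors of subject \(i\), take \(Var(\epsilon_{ij}) = \sigma^2\) and \(Cor(\epsilon_{ij}, \epsilon_{ik}) = \rho\) for all \(j \neq k\), where \(\rho\) is the parameter initialized by the rho argument and \(\sigma\) is the residual standard deviation whose samples are returned above. Equivalently, \(\epsilon_{ij} = u_i + e_{ij}\) with independent \(u_i \sim N(0, \rho\sigma^2)\) and \(e_{ij} \sim N(0, (1-\rho)\sigma^2)\), which is the per-subject normally distributed random effect mentioned in the description.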