The tsvs2()/tsvs() function performs Thompson sampling variable selection with NFT BART.
tsvs2(
## data
xftrain, xstrain, times, delta=NULL,
rm.const=TRUE, rm.dupe=TRUE,
##tsvs args
K=20, a.=1, b.=0.5, C=0.5,
rds.file='tsvs2.rds', pdf.file='tsvs2.pdf',
## multi-threading
tc=getOption("mc.cores", 1), ##OpenMP thread count
##MCMC
nskip=1000, ndpost=2000,
nadapt=1000, adaptevery=100,
chvf=NULL, chvs=NULL,
method="spearman", use="pairwise.complete.obs",
pbd=c(0.7, 0.7), pb=c(0.5, 0.5),
stepwpert=c(0.1, 0.1), probchv=c(0.1, 0.1),
minnumbot=c(5, 5),
## BART and HBART prior parameters
ntree=c(10, 2), numcut=100,
xifcuts=NULL, xiscuts=NULL,
power=c(2, 2), base=c(0.95, 0.95),
## f function
fmu=NA, k=5, tau=NA, dist='weibull',
## s function
total.lambda=NA, total.nu=10, mask=0.95,
## survival analysis
##K=100, events=NULL,
## DPM LIO
drawDPM=1L,
alpha=1, alpha.a=1, alpha.b=0.1, alpha.draw=1,
neal.m=2, constrain=1,
m0=0, k0.a=1.5, k0.b=7.5, k0=1, k0.draw=1,
a0=3, b0.a=2, b0.b=1, b0=1, b0.draw=1,
## misc
na.rm=FALSE, probs=c(0.025, 0.975), printevery=100,
transposed=FALSE
)

tsvs(
## data
x.train, times, delta=NULL,
rm.const=TRUE, rm.dupe=TRUE,
##tsvs args
K=20, a.=1, b.=0.5, C=0.5,
rds.file='tsvs.rds', pdf.file='tsvs.pdf',
## multi-threading
tc=getOption("mc.cores", 1), ##OpenMP thread count
##MCMC
nskip=1000, ndpost=2000,
nadapt=1000, adaptevery=100,
chv=NULL,
method="spearman", use="pairwise.complete.obs",
pbd=c(0.7, 0.7), pb=c(0.5, 0.5),
stepwpert=c(0.1, 0.1), probchv=c(0.1, 0.1),
minnumbot=c(5, 5),
## BART and HBART prior parameters
ntree=c(10, 2), numcut=100, xicuts=NULL,
power=c(2, 2), base=c(0.95, 0.95),
## f function
fmu=NA, k=5, tau=NA, dist='weibull',
## s function
total.lambda=NA, total.nu=10, mask=0.95,
## survival analysis
##K=100, events=NULL,
## DPM LIO
drawDPM=1L,
alpha=1, alpha.a=1, alpha.b=0.1, alpha.draw=1,
neal.m=2, constrain=1,
m0=0, k0.a=1.5, k0.b=7.5, k0=1, k0.draw=1,
a0=3, b0.a=2, b0.b=1, b0=1, b0.draw=1,
## misc
na.rm=FALSE, probs=c(0.025, 0.975), printevery=100,
transposed=FALSE
)
n x pf matrix of predictor variables for the training data.
n x ps matrix of predictor variables for the training data.
n x ps matrix of predictor variables for the training data.
nx1 vector of the observed times for the training data.
nx1 vector of the time type for the training data: 0, for right-censoring; 1, for an event; and, 2, for left-censoring.
To remove constant variables or not.
To remove duplicate variables or not.
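The delta coding described above (0 for right-censoring, 1 for an event, 2 for left-censoring) can be built from any status variable with a lookup. The following is a minimal sketch in base R; the `status` labels are hypothetical and are not part of nftbart.

```r
## Hypothetical status labels, for illustration only
status <- c("right", "event", "left", "event")

## Map each label to the documented delta coding:
## 0 = right-censoring, 1 = event, 2 = left-censoring
delta <- unname(c(right = 0, event = 1, left = 2)[status])

table(delta)
```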
The number of Thompson sampling steps to take. Not to be confused with the size of the time grid for survival distribution estimation.
The prior parameter for successes of a Beta distribution.
The prior parameter for failures of a Beta distribution.
The probability cut-off for variable selection.
File name to store RDS object containing Thompson sampling parameters.
File name to store PDF graphic of variables selected.
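As a rough illustration of the Beta prior parameters above, the sketch below computes the mean of a Beta distribution under the default settings and shows one way such a prior might be updated and compared with the cut-off C. The update rule (adding one per success or failure) and the comparison with C are assumptions in the spirit of Thompson sampling, not a description of the internals of tsvs2()/tsvs(); the success and failure counts are hypothetical.

```r
a. <- 1; b. <- 0.5; C <- 0.5     ## defaults from the usage above
prior.mean <- a. / (a. + b.)      ## mean of a Beta(a., b.) distribution

## Assumed bookkeeping: each success adds 1 to a, each failure adds 1 to b
a <- a. + 3                       ## hypothetical: 3 successes
b <- b. + 1                       ## hypothetical: 1 failure
post.mean <- a / (a + b)

set.seed(1)
theta <- rbeta(1, a, b)           ## one Thompson-style draw from Beta(a, b)
selected <- post.mean > C         ## hypothetical use of the cut-off C
```

Under the defaults, the prior mean is 2/3, so a variable starts out slightly favored relative to a cut-off of 0.5.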
Number of OpenMP threads to use.
Number of MCMC iterations to burn-in and discard.
Number of MCMC iterations kept after burn-in.
Number of MCMC iterations for adaptation prior to burn-in.
Adapt MCMC proposal distributions every adaptevery iterations.
Predictor correlation matrix used as a pre-conditioner for MCMC change-of-variable proposals.
Correlation options for change-of-variable proposal pre-conditioner.
Probability of performing a birth/death proposal, otherwise perform a rotate proposal.
Probability of performing a birth proposal given that we choose to perform a birth/death proposal.
Initial width of proposal distribution for perturbing cut-points.
Probability of performing a change-of-variable proposal. Otherwise, only do a perturb proposal.
Minimum number of observations required in leaf (terminal) nodes.
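The default correlation options above, method="spearman" and use="pairwise.complete.obs", are valid argument values for base R's cor(). The sketch below computes such a predictor correlation matrix directly on hypothetical data. Whether tsvs2()/tsvs() builds its pre-conditioner in exactly this way is an assumption.

```r
set.seed(42)
## Hypothetical predictors with one missing value
x <- cbind(x1 = rnorm(50), x2 = rnorm(50), x3 = runif(50))
x[3, "x2"] <- NA

## Spearman rank correlations using pairwise-complete observations
chv <- cor(x, method = "spearman", use = "pairwise.complete.obs")
dim(chv)
```

Pairwise deletion keeps every row that is complete for a given pair of columns, so a single missing value does not discard the whole observation.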
Vector of length two for the number of trees used for the mean model and the number of trees used for the variance model.
Number of cutpoints to use for each predictor variable.
More detailed construction of cut-points can be specified by the xicuts function and provided here.
Power parameter in the tree depth penalizing prior.
Base parameter in the tree depth penalizing prior.
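To give a feel for the base and power parameters above, the sketch below assumes the standard BART tree prior, in which a node at depth d splits with probability base*(1+d)^(-power). That functional form is an assumption here, since this page does not state it.

```r
base <- 0.95; power <- 2           ## defaults from the usage above
depth <- 0:3

## Assumed standard BART form: P(split at depth d) = base * (1 + d)^(-power)
p.split <- base * (1 + depth)^(-power)
round(p.split, 4)
```

Under this form the split probability falls quickly with depth, which is what keeps individual trees shallow.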
Prior parameter for the center of the mean model.
Prior parameter for the mean model.
Desired SD/ntree for f function leaf prior if known.
Distribution to be passed to intercept-only AFT model to center y.train.
A rudimentary estimate of the process standard deviation. Used in calibrating the variance prior.
Shape parameter for the variance prior.
If a proportion is provided, then said quantile of max.i sd(x.i) is used to mask non-stationary departures (with respect to convergence) above this threshold.
Whether to utilize DPM or not.
Initial value of DPM concentration parameter.
Gamma prior parameter setting for DPM concentration parameter where E[alpha]=alpha.a/alpha.b.
See alpha.a above.
Whether to draw alpha or keep it fixed at its initial value.
The number of additional atoms for Neal 2000 DPM algorithm 8.
Whether to perform constrained DPM or unconstrained.
Center of the error distribution: defaults to zero.
First Gamma prior argument for k0.
Second Gamma prior argument for k0.
Initial value of k0.
Whether to fix k0 or draw it from the DPM LIO prior hierarchy: k0~Gamma(k0.a, k0.b), i.e., E[k0]=k0.a/k0.b.
First Gamma prior argument for \(\tau\).
First Gamma prior argument for b0.
Second Gamma prior argument for b0.
Initial value of b0.
Whether to fix b0 or draw it from the DPM LIO prior hierarchy: b0~Gamma(b0.a, b0.b), i.e., E[b0]=b0.a/b0.b.
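The prior means stated above, E[alpha]=alpha.a/alpha.b, E[k0]=k0.a/k0.b and E[b0]=b0.a/b0.b, correspond to a Gamma distribution parameterized by shape and rate. The sketch below evaluates them at the defaults from the usage and checks one by simulation. The data frame and its column names are hypothetical helpers for illustration.

```r
## Default Gamma prior settings taken from the usage above
prior <- data.frame(
  param = c("alpha", "k0", "b0"),
  shape = c(1, 1.5, 2),            ## alpha.a, k0.a, b0.a
  rate  = c(0.1, 7.5, 1))          ## alpha.b, k0.b, b0.b
prior$mean <- prior$shape / prior$rate
prior

## Monte Carlo check of E[k0] = k0.a / k0.b
set.seed(7)
draws <- rgamma(1e5, shape = 1.5, rate = 7.5)
mean(draws)
```

At the defaults, the implied prior means are 10 for alpha, 0.2 for k0 and 2 for b0.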
Value to be passed to the predict function.
Value to be passed to the predict function.
Outputs MCMC algorithm status every printevery iterations.
tsvs handles all of the pre-processing for x.train/x.test (including transposing) for computational efficiency.
Rodney Sparapani: rsparapa@mcw.edu
tsvs2()/tsvs() is the function to perform variable selection.
The tsvs2()/tsvs() function returns a fit object of S3 class type list and also stores it in rds.file while sampling is in progress.
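Because the object is also written to rds.file, it can be inspected from another R session with base R's readRDS(). The sketch below assumes the default file name for tsvs2() and the current working directory. The structure of the stored list is not documented on this page, so the code only prints its top-level components.

```r
rds.file <- "tsvs2.rds"            ## default rds.file for tsvs2()

if (file.exists(rds.file)) {
  progress <- readRDS(rds.file)    ## the stored list
  str(progress, max.level = 1)     ## top-level components only
} else {
  message("No ", rds.file, " found in ", getwd())
}
```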
Sparapani R., Logan B., Maiers M., Laud P., McCulloch R. (2023) Nonparametric Failure Time: Time-to-event Machine Learning with Heteroskedastic Bayesian Additive Regression Trees and Low Information Omnibus Dirichlet Process Mixtures. Biometrics (ahead of print). <doi:10.1111/biom.13857>
Liu Y., Rockova V. (2021) Variable selection via Thompson sampling. Journal of the American Statistical Association. Jun 29:1-8.
tsvs
library(nftbart)
data(lung)
N=length(lung$status)
##lung$status: 1=censored, 2=dead
##delta: 0=censored, 1=dead
delta=lung$status-1
## this study reports time in days rather than weeks or months
times=lung$time
times=times/7 ## weeks
## matrix of covariates
x.train=cbind(lung[ , -(1:3)])
## lung$sex: Male=1 Female=2
# \donttest{
##vars=tsvs2(x.train, x.train, times, delta)
vars=tsvs2(x.train, x.train, times, delta, K=0) ## K=0 just returns 0
# }