train_predict_mix
Predicts the binary response based on high-dimensional binary features
modeled with Bayesian mixture models. The model is trained with Gibbs
sampling. A smaller number of features can be selected based on their
correlations with the response, and the bias due to this selection
procedure can be corrected. The software is written entirely in the R
language.

Usage

train_predict_mix(test, train, k, theta0=0, alpha.shape=0.5, alpha.rate=5,
                  no.alpha=30, common.alpha=FALSE, no.alpha0=100,
                  mc.iters=200, iters.labeltheta=10, iters.theta=20,
                  width.theta=0.1, correction=TRUE, no.theta.adj=30,
                  approxim=TRUE, pred.start=100)
Arguments

test
    the test data.
theta0
    the prior of ``theta'' is uniform over (theta0, 1-theta0).
common.alpha
    separate ``alpha'' and ``alpha0'' are used if common.alpha=FALSE.
    Otherwise ``alpha'' and ``alpha0'' are the same.
iters.labeltheta
    the labels and ``theta'' are updated iters.labeltheta times, and then
    ``alpha'' and ``alpha0'' are updated once.
width.theta
    the width used in updating ``theta''.
no.theta.adj
    the correction is computed using (no.theta.adj)+1 points.
approxim
    if approxim=TRUE, an approximation is used in the computation.
pred.start
    the Markov chain samples after iteration pred.start will be used to
    make the Monte Carlo estimation.

Value

label
    of length equal to the number of training cases.
I1
I2
    the same as ``I1'', but for those cases labeled by ``2''.

References

http://math.usask.ca/~longhai/publication.html
See Also

gendata.mix

Examples
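The description mentions selecting a smaller number of features by their
correlations with the response. A minimal base-R sketch of such screening
(illustrative only; this is not the package's internal implementation, and
all variable names here are made up):

```r
# Illustrative correlation-based feature screening in base R.
# x: binary feature matrix (rows = cases, columns = features)
# y: binary response coded 1/2
set.seed(1)
x <- matrix(rbinom(100 * 50, 1, 0.5), nrow = 100)  # 100 cases, 50 features
y <- rbinom(100, 1, 0.5) + 1

# absolute sample correlation of each feature with the response
cors <- abs(apply(x, 2, cor, y = y))

# indices of the k features most correlated with the response
k <- 5
selected <- order(cors, decreasing = TRUE)[1:k]
```

Screening features on the same data that is later used for fitting is what
introduces the selection bias that correction=TRUE adjusts for.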
#simulating data set from a Bayesian mixture model
data <- gendata.mix(20,20,50,50,101,10,c(0.9,0.1))
#training the model using Gibbs sampling, without correcting for the feature
#selection bias, then predicting the responses of the test cases
predict.uncor <- train_predict_mix(
test=data$test,train=data$train,k=5,
theta0=0,alpha.shape=0.5,alpha.rate=5,no.alpha=5,
common.alpha=FALSE,no.alpha0=100,
mc.iters=30,iters.labeltheta=1,
iters.theta=10,width.theta=0.1,
correction=FALSE,no.theta.adj=5,approxim=TRUE,
pred.start=10)
#As above, but with the feature selection bias corrected
predict.cor <- train_predict_mix(
test=data$test,train=data$train,k=5,
theta0=0,alpha.shape=0.5,alpha.rate=5,no.alpha=5,
common.alpha=FALSE,no.alpha0=100,
mc.iters=30,iters.labeltheta=1,
iters.theta=10,width.theta=0.1,
correction=TRUE,no.theta.adj=5,approxim=TRUE,
pred.start=10)
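To summarize the Monte Carlo predictive probabilities as hard
classifications, one can threshold at 0.5 and compare against the true test
responses. A base-R sketch, assuming a vector ``prob1'' of predictive
probabilities of class ``1'' for the test cases (``prob1'' and ``truth''
are illustrative names, not components of the returned list):

```r
# Sketch: turning predictive probabilities into a misclassification rate.
# prob1: assumed predictive probability of class 1 for each test case
# truth: true responses coded 1/2, as in data generated by gendata.mix
prob1 <- c(0.9, 0.2, 0.6, 0.4)  # illustrative values
truth <- c(1, 2, 2, 2)

pred <- ifelse(prob1 > 0.5, 1, 2)  # threshold at 0.5
error.rate <- mean(pred != truth)
error.rate  # 0.25
```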