maboost(x, ...)

"maboost"(x, y, test.x=NULL, test.y=NULL, breg=c("entrop","l2"),
          type=c("normal","maxmargin","smooth","sparse"), C50tree=FALSE,
          iter=100, nu=1, bag.frac=0.5, random.feature=TRUE, random.cost=TRUE,
          smoothfactor=1, sparsefactor=FALSE, verbose=FALSE, ..., na.action=na.rpart)

"maboost"(formula, data, ..., subset, na.action=na.rpart)
breg="l2"
(default) selects quadratic Bregman divergence and breg="entrop"
uses KL-divergence which results in a adaboost-like algorithm (with a different choice of eta).type="maxmargin"
: it guarantees that the margin of the final hypothesis converges to max-margin (at each round t, it divides eta by t^.5). type="sparse"
: It uses SparseBoost and only works with breg="l2"
. It generates sparse weight vectors by projecting the weight vectors onto R+. It can be used for multiclass but it is kind of meaningless since the multiclass setting uses a weight matrix instead of weight vector and increasing the sparsity of this matrix does not result in the sparsity of the weight vector (which is the sum over col. of the weight matrix). type="smooth"
type="smooth": flag to start smooth boosting. Only works with breg="l2" and for binary classification. Note that for type="smooth", the smoothfactor parameter should also be set accordingly.

Tree complexity is controlled through the extra arguments, which are passed on to rpart.control and C50Control; for C5.0 trees, set the CF and minCases parameters in C50Control properly. For stumps, use maxdepth=1, cp=-1, minsplit=0, xval=0.
maxdepth controls the depth of the rpart trees and cp controls their complexity. For C5.0, CF and minCases control the complexity and size of the tree: the smaller the CF, the less complex the tree, and the larger the minCases, the smaller the C5.0 tree.

When using the formula interface maboost(y~.), data must be in a data frame. The response can have factor or numeric values (preferably factor form). Missing values may be present in the descriptor data whenever na.action is set to any option other than na.pass.

After the model is fit, maboost prints a summary of the function call, the method used for boosting, the number of iterations, the final confusion matrix (observed classification vs. predicted classification; class labels are the same as in the response), the error for the training set, and the testing, training, and kappa estimates of the appropriate number of iterations.
A summary of this information can also be obtained with the command print(x).
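For instance, a minimal sketch of fitting boosted stumps through the formula interface, using the stump settings listed above (an illustration rather than part of the package documentation; the dataset and the remaining parameter values are assumptions):

library(maboost)
data(iris)
## stumps: depth-1 rpart trees, using the control settings suggested above
fit.stump <- maboost(Species~., data=iris, iter=50, breg="l2", type="normal",
                     maxdepth=1, cp=-1, minsplit=0, xval=0)
## prints the call, the boosting method, the number of iterations,
## the confusion matrix and the training error
print(fit.stump)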
Corresponding functions (use help with summary.maboost, predict.maboost, ..., varplot.maboost for additional information on these commands):
summary : function to print a summary of the original function call, method used for boosting, number of iterations, final confusion matrix, accuracy, and kappa statistic (a measure of agreement between the observed classification and predicted classification). summary can be used for training, testing, or validation data.
predict : function to predict the response for any data set (train, test, or validation); see the sketch after this list.
varplot.maboost : plot of variables ordered by the variable importance measure (based on improvement).
update : add more trees to the maboost
object.
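For example, given the object gdis and the test indices teind from the examples below, these helpers can be combined along the following lines (a sketch, not taken from the package documentation):

## summary of the fit: call, method, iterations, confusion matrix, accuracy, kappa
summary(gdis)
## predicted classes for the held-out rows and the resulting confusion matrix
pred.te <- predict(gdis, iris[teind, ], type="class")
table(observed = iris[teind, 5], predicted = pred.te)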
print.maboost, summary.maboost, predict.maboost, update.maboost, varplot.maboost
## fit maboost model
library(maboost)
data(iris)
##drop setosa
iris[iris$Species!="setosa",]->iris
##set up testing and training data (60% for training)
n<-dim(iris)[1]
trind<-sample(1:n,floor(.6*n),FALSE)
teind<-setdiff(1:n,trind)
## recode Species as a two-level factor (drop the unused "setosa" level)
iris[,5]<- as.factor((levels(iris[,5])[2:3])[as.numeric(iris[,5])-1])
##fit a tree with maxdepth=6 (a parameter passed to rpart.control).
gdis<-maboost(Species~.,data=iris[trind,],iter=50,nu=2
,breg="l2", type="sparse",bag.frac=1,random.feature=FALSE
,random.cost=FALSE, C50tree=FALSE, maxdepth=6,verbose=TRUE)
##to see the average number of zeros in the weighting vectors over the 50 rounds of boosting
print(mean(gdis$model$num_zero))
##prediction
pred.gdis <- predict(gdis, iris, type="class")
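## (added sketch, not in the original example) accuracy on the held-out test rows
pred.test <- predict(gdis, iris[teind,], type="class")
print(mean(as.character(pred.test) == as.character(iris[teind, 5])))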
##variable selection
varplot.maboost(gdis)