LBoost: LBoost

Description

Constructs an ensemble of logic regression models using boosting, for classification and for identifying important predictors and predictor interactions.

Usage

LBoost(resp, Xs, anneal.params, nBS = 100, kfold = 5, nperm = 1, 
PI.imp = NULL, pred.imp = FALSE)

Arguments

resp

numeric vector of binary response values.

Xs

matrix or data frame of zeros and ones for all predictor variables.

anneal.params

a list containing the parameters for simulated annealing. See the help file for the function logreg.anneal.control in the LogicReg package. If missing, default annealing parameters are set at start=1, end=-2, and iter=50000.

nBS

number of logic regression trees to be fit in the LBoost model.

kfold

The number of times the data are to be split in constructing the ensemble.

nperm

If measuring predictor importance or interaction importance using the permutation-based measure, nperm is the number of permutations used to determine predictor or interaction importance.

PI.imp

A character string describing which measure of interaction importance will be used. Possible values include "Permutation", "AddRemove", and "Both". Using "Permutation" will provide the permutation based measure of interaction importance, "AddRemove" will provide the add-in/leave-out based measure of interaction importance, and "Both" provides both measures of importance.

pred.imp

logical. If FALSE, predictor importance scores will not be measured.
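
Because resp must be a binary numeric vector and Xs must contain only zeros and ones, it can help to check the inputs before starting a long fit. The helper below is a minimal base R sketch: check_lboost_inputs is a hypothetical name and is not part of any package, and rejecting missing values is an assumption rather than documented behavior of LBoost.

```r
# Hypothetical helper (not part of LogicForest or LogicReg): checks that
# the inputs match the argument descriptions above before calling LBoost().
check_lboost_inputs <- function(resp, Xs) {
  Xs <- as.matrix(Xs)
  # resp: numeric vector of binary (0/1) values
  if (!is.numeric(resp) || anyNA(resp) || !all(resp %in% c(0, 1))) {
    stop("resp must be a numeric vector of zeros and ones")
  }
  # Xs: every entry must be 0 or 1
  if (!is.numeric(Xs) || anyNA(Xs) || !all(Xs %in% c(0, 1))) {
    stop("Xs must contain only zeros and ones")
  }
  # one response value per row of predictors
  if (length(resp) != nrow(Xs)) {
    stop("resp and Xs must describe the same number of observations")
  }
  invisible(TRUE)
}

# Toy data made up for illustration only
set.seed(1)
toy_X <- matrix(rbinom(40, 1, 0.5), nrow = 10)
toy_y <- rbinom(10, 1, 0.5)
check_lboost_inputs(toy_y, toy_X)
```

The function stops with an error on the first violated condition and otherwise returns TRUE invisibly, so it can be placed directly before the call to LBoost.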

Value

CVmod: A list of all logic regression (LR) fits and the associated information in the LBoost model. Each item in the list contains a list of LR fits for a specific kfold data set, a matrix of weights given to each LR fit for that kfold data set, and a matrix of the kfold training data used to construct the list of fits.
CVmisclass: a list including the mean cross-validation misclassification rate for the models and a list of vectors giving the predictions for each of the kfold test data sets.
AddRemove.PIimport: If PI.imp is specified as either "AddRemove" or "Both", this is a vector of add-in/leave-out importance scores for all interactions that occur in the LBoost model. If PI.imp is not specified or is "Permutation", this will state "Not measured".
Perm.PIimport: If PI.imp is specified as either "Permutation" or "Both", this is a vector of permutation-based importance scores for all interactions that occur in the LBoost model. If PI.imp is not specified or is "AddRemove", this will state "Not measured".
Pred.import: If pred.imp is specified as TRUE, a vector of importance scores for all predictors in the data.
Pred.freq: a vector giving the frequency with which each predictor occurs in the individual logic regression trees in the LBoost model.
PI.frequency: a vector giving the frequency with which each interaction occurs in the individual logic regression trees in the LBoost model.
wt.mat: a list containing kfold matrices of observation weights for each tree for the kfold training data sets.
alphas: a list containing kfold vectors of tree specific weights for trees constructed from each of the kfold training data sets.
data: A matrix of the original data used to construct the LBoost model.
PIimp: A character string describing which interaction importance measure was used.
PredImp: logical. If TRUE predictor importance was measured.
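
The components above are returned in a single list, so each one can be extracted with `$`. The sketch below ranks predictors by how often they appear in the trees. It runs on a mock list with made-up values rather than a real fit, and it assumes that Pred.freq is a named numeric vector, which this page does not confirm; top_predictors is a hypothetical helper name.

```r
# Hypothetical helper: return the n most frequently used predictors.
# Assumes fit$Pred.freq is a named numeric vector (an assumption, not
# documented behavior).
top_predictors <- function(fit, n = 3) {
  freq <- sort(fit$Pred.freq, decreasing = TRUE)
  head(freq, n)
}

# Mock object with made-up values, standing in for the output of LBoost()
mock_fit <- list(
  Pred.freq = c(X1 = 2, X2 = 7, X3 = 0, X4 = 5),
  PIimp = "Permutation",
  PredImp = FALSE
)
top_predictors(mock_fit, n = 2)
```

The same pattern would apply to PI.frequency or to the importance vectors when they were measured; checking PIimp and PredImp first indicates which of them hold scores rather than "Not measured".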

References

Wolf, B.J., Hill, E.G., Slate, E.H., Neumann, C.A., Kistner-Griffin, E. (2012). LBoost: A boosting algorithm with applications for epistasis discovery. PLoS One.

Examples

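#Load the packages that provide LBoost, LF.data, and logreg.anneal.control
library(LogicForest)
library(LogicReg)

data(LF.data)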

#Set annealing parameters using the logreg.anneal.control
#function from the LogicReg package
newanneal<-logreg.anneal.control(start=1, end=-2, iter=2000)

#Typically more than 2000 iterations (>25000) would be used for
#the annealing algorithm.  A typical LBoost model also contains at
#least 100 trees.  These parameters were set to allow for faster
#run time

#The data set LF.data contains 50 binary predictors and a binary response Ybin
#Looking at only the Permutation Measure
LBfit.1<-LBoost(resp=LF.data$Ybin, Xs=LF.data[,1:50], nBS=10, kfold=2,
anneal.params=newanneal, nperm=2, PI.imp="Permutation")
print(LBfit.1)

#Looking at only the Add-in/Leave-out importance measure
LBfit.2<-LBoost(resp=LF.data$Ybin, Xs=LF.data[,1:50], nBS=10, kfold=2,
anneal.params=newanneal, PI.imp="AddRemove")
print(LBfit.2)

#Looking at both measures of importance plus predictor importance
LBfit.3<-LBoost(resp=LF.data$Ybin, Xs=LF.data[,1:50], nBS=10, kfold=2,
anneal.params=newanneal, nperm=2, PI.imp="Both", pred.imp=TRUE)
print(LBfit.3)
