iBMA: Iterated Bayesian Model Averaging variable selection for generalized linear models, linear models or survival models.

Description

This function implements the iterated Bayesian Model Averaging method for variable selection. This method works by making repeated calls to a Bayesian model averaging procedure, iterating through the variables in a fixed order. After each call to the Bayesian model averaging procedure only those variables which have posterior probability greater than a specified threshold are retained, those variables whose posterior probabilities do not meet the threshold are replaced with the next set of variables. The order in which the variables are to be considered is usually determined on the basis of the some measure of goodness of fit calculated univariately for each variable.

Usage

iBMA.glm(X, Y, wt = rep(1, nrow(X)), thresProbne0 = 5, maxNvar = 30, 
         nIter = 100, verbose = FALSE, sorted = FALSE, 
         factor.type = TRUE, ...)

iBMA.glm(current_state, nIter = NULL, verbose = NULL, ...) 

iBMA.bicreg(X, Y, wt = rep(1, nrow(X)), thresProbne0 = 5, maxNvar = 30, 
            nIter = 100, verbose = FALSE, sorted = FALSE, ...)
iBMA.bicreg(current_state, nIter = NULL, verbose = NULL, ...) 

iBMA.surv(X, surv.t, cens, wt = rep(1, nrow(X)), thresProbne0 = 5, 
          maxNvar = 30, nIter = 100, verbose = FALSE, sorted = FALSE, 
          factor.type = TRUE, ...)
iBMA.surv(current_state, nIter = NULL, verbose = NULL, ...)

Arguments

Value

An object of either type iBMA.X, or of type iBMA.X.intermediate, where 'X' is either 'glm', 'bicreg' or 'surv'. Objects of type 'iBMA.X.intermediate' consist of a list with components for each parameter passed into iBMA.X as well as the following components:
sortedXa matrix or data.frame containing the sorted variables.
callthe matched call.
initial.orderthe inital ordering of the variables.
nVarthe number of variables.
currentSeta vector specifying the set of variables currently selected.
nextVarthe next variable to be examined
current.probne0the posterior probabilities for inclusion for each of the variables in the current set of variables.
maxProbne0the maximum posterior probability calculated for each variable
nTimesthe number of times each variable has been included in the set of selected variables
currIterthe current iteration number
new.varsthe set of variables that will be added to the current set during the next iteration
first.in.modela vector of numbers giving the iteration number that each variable was first examined in. A value of NA indicates that a variable has not yet been examined.
iter.droppeda vector giving the iteration number in which each variable was dropped from the current set. A value of NA indicates that a variable has not yet been dropeed.
nIterationsthe total number of iterations that were run
selectedthe set of variables that were selected (in terms of the initial ordering of the variables)
bmaan object of type 'bic.X' containing the results of the Bayesian model averaging run on the selected set of variables.

synopsis

iBMA.glm(x, ...) iBMA.bicreg(x, ...) iBMA.surv(x, ...)

Details

These methods can be run in a 'batch' mode by setting nIter to be larger than the number of variables. Alternatively, if nIter is set to be small, the procedure may return before all of the variables have been examined. In this case the returned result of the call will be of class 'iBMA.X.intermediate', and if iBMA.X is called with this result as the input, nIter more iterations will be run. If on any iteration there are no variables that have posterior probability less than the threshold, the variable with the lowest posterior probability is dropped.

References

Yeung, K.Y., Bumgarner, R.E. and Raftery, A.E. (2005). ` Bayesian Model Averaging: Development of an improved multi-class, gene selection and classification tool for microarray data.' Bioinformatics, 21(10), 2394-2402

Examples

Run this code

############ iBMA.glm
library("MASS")
data(birthwt)
 y<- birthwt$lo
 x<- data.frame(birthwt[,-1])
 x$race<- as.factor(x$race)
 x$ht<- (x$ht>=1)+0
 x<- x[,-9]
 x$smoke <- as.factor(x$smoke)
 x$ptl<- as.factor(x$ptl)
 x$ht <- as.factor(x$ht)
 x$ui <- as.factor(x$ui)

### add 41 columns of noise
noise<- matrix(rnorm(41*nrow(x)), ncol=41)
colnames(noise)<- paste('noise', 1:41, sep='')
x<- cbind(x, noise)

iBMA.glm.out<- iBMA.glm(x, y,  glm.family="binomial", factor.type=FALSE, 
                        verbose = TRUE, thresProbne0 = 5 )
summary(iBMA.glm.out)




################## iBMA.surv
library(survival)
data(veteran)

surv.t<- veteran$time
cens<- veteran$status
veteran$time<- NULL
veteran$status<- NULL
lvet<- nrow(veteran)
invlogit<- function(x) exp(x)/(1+exp(x))
# generate random noise, 34 uniform variables and 10 factors each with 4 levels
X<- data.frame(matrix(runif(lvet*34), ncol=34), 
               matrix(c("a","b","c","d","e","f")[(rbinom(10*lvet, 3, .5))+1], 
               ncol = 10))
colnames(X)<- c(paste("u",1:34, sep=""),paste("C",1:10, sep=""))
veteran_plus_noise<- cbind(veteran, X)


test.iBMA.surv<- iBMA.surv(x = veteran_plus_noise, surv.t = surv.t, cens = cens, 
                           thresProbne0 = 5, maxNvar = 30, factor.type = TRUE, 
                           verbose = TRUE, nIter = 100)

test.iBMA.surv
summary(test.iBMA.surv)


############ iBMA.bicreg ... degenerate example
library(MASS)
data(UScrime)
UScrime$M<- log(UScrime$M); UScrime$Ed<- log(UScrime$Ed); 
UScrime$Po1<- log(UScrime$Po1); UScrime$Po2<- log(UScrime$Po2); 
UScrime$LF<- log(UScrime$LF); UScrime$M.F<- log(UScrime$M.F)
UScrime$Pop<- log(UScrime$Pop); UScrime$NW<- log(UScrime$NW); 
UScrime$U1<- log(UScrime$U1); UScrime$U2<- log(UScrime$U2); 
UScrime$GDP<- log(UScrime$GDP); UScrime$Ineq<- log(UScrime$Ineq)
UScrime$Prob<- log(UScrime$Prob); UScrime$Time<- log(UScrime$Time) 
noise<- matrix(rnorm(35*nrow(UScrime)), ncol=35)
colnames(noise)<- paste('noise', 1:35, sep='')
UScrime_plus_noise<- cbind(UScrime, noise)

y<- UScrime_plus_noise$y
UScrime_plus_noise$y<- NULL

# run 2 iterations and examine results
iBMA.bicreg.crime<- iBMA.bicreg( x = UScrime_plus_noise, Y=y, thresProbne0 = 5, 
                                 verbose = TRUE, maxNvar = 30, nIter = 2)
summary(iBMA.bicreg.crime)
orderplot(iBMA.bicreg.crime)

# run from current state until completion
iBMA.bicreg.crime<- iBMA.bicreg( iBMA.bicreg.crime, nIter = 200)
summary(iBMA.bicreg.crime)
orderplot(iBMA.bicreg.crime)

cat("CAUTION: iBMA.bicreg can give degenerate results when")
cat("the number of predictor variables is large

")

Run the code above in your browser using DataLab