Learn R Programming

threeboost (version 1.1)

threeboost: Thresholded EEBoost

Description

Run the thresholded EEBoost procedure.

Usage

threeboost(Y, X, EE.fn, b.init = rep(0, ncol(X)), eps = 0.01, maxit = 1000, itertrack = FALSE, reportinterval = 1, stop.rule = "on.repeat", thresh = 1)

Arguments

Y
Vector of outcomes.
X
Matrix of predictors. Will be automatically scaled using the scale function.
EE.fn
Estimating function taking arguments Y, X, and parameter vector b.
b.init
Initial parameter values. For variable selection, typically start with a vector of zeroes (the default).
eps
Step length. Default is 0.01, value should be relatively small.
maxit
Maximum number of iterations. Default is 1000.
itertrack
Indicates whether or not diagnostic information should be printed out at each iteration. Default is FALSE.
reportinterval
If itertrack is TRUE, how many iterations the algorithm should wait between each diagnostic report.
stop.rule
Rule for stopping the iterations before maxit is reached. Possible values are "on.repeat" and "pct.change". See 'Details' for more information.
thresh
Threshold parameter for ThrEEBoost.

Value

A matrix with maxit rows and ncol(X) columns, with each row containing the parameter vector from an iteration of ThrEEBoost.

Details

threeboost Implements a thresholded version of the EEBoost algorithm described in Wolfson (2011, JASA). EEBoost is a general-purpose method for variable selection which can be applied whenever inference would be based on an estimating equation. The package currently implements variable selection based on the Generalized Estimating Equations, but can also accommodate user-provided estimating functions. Thresholded EEBoost is a generalization which allows multiple variables to enter the model at each boosting step. Thresholded EEBoost with thresholding parameter = 1 is equivalent to EEBoost.

Typically, the boosting procedure is run for maxit iterations, producing maxit models defined by a set of regression coefficients. An additional step (e.g. model scoring, cross-validated estimate of prediction error) is needed to select a final model. However, an alternative is to stop the iterations before maxit is reached. The user can request this feature by setting stop.rule to one of the following options:

  • "on.repeat": Sometimes, ThrEEBoost will alternate between stepping on the same two directions, usually indicating numerical problems. Setting stop.rule="on.oscillate" will terminate the algorithm if this happens.
  • "pct.change": Stop if, for conseuctive iterations, the sum of the magnitudes of the elements of the estimating equation changes by < 1%.

See Also

geeboost for an example of how to call (Thr)EEBoost with a custom estimating function.

Wolfson, J. EEBoost: A general method for prediction and variable selection using estimating equations. Journal of the American Statistical Association, 2011.

Examples

Run this code
library(Matrix)

# Generate some test data - uses 'mvtnorm' package
n <- 30
n.var <- 50
clust.size <- 4
B <- c(rep(2,5),rep(0.2,5),rep(0.05,10),rep(0,n.var-20))
mn.X <- rep(0,n.var)
sd.X <- 0.5
rho.X <- 0.3
cov.sig.X <- sd.X^2*((1-rho.X)*diag(rep(1,10)) + rho.X*matrix(data=1,nrow=10,ncol=10))
sig.X <- as.matrix( Matrix::bdiag(lapply(1:(n.var/10),function(x) { cov.sig.X } ) ) )
sd.Y <- 0.5
rho.Y <- 0.3
indiv.Sig <- sd.Y^2*( (1-rho.Y)*diag(rep(1,4)) + rho.Y*matrix(data=1,nrow=4,ncol=4) )
sig.list <- list(length=n)
for(i in 1:n) { sig.list[[i]] <- indiv.Sig }
Sig <- Matrix::bdiag(sig.list)
indiv.index <- rep(1:n,each=clust.size)
sig.Y <- as.matrix(Sig)
if(require(mvtnorm)) {
X <- mvtnorm::rmvnorm(n*clust.size,mean=mn.X,sigma=sig.X)
mn.Y <- X %*% B
## Correlated continuous outcome
Y <- mvtnorm::rmvnorm(1,mean=mn.Y,sigma=sig.Y)
} else { stop('Need mvtnorm package to generate correlated example data.') }

## Define the Gaussian GEE estimating function with independence working correlation
mu.Lin <- function(eta){eta}
g.Lin <- function(m){m}
v.Lin <- function(eta){rep(1,length(eta))}

 EE.fn.ind <- function(Y,X,b) {
 ee.GEE(Y,X,b,
 mu.Y=mu.Lin,
 g.Y=g.Lin,
 v.Y=v.Lin,
 aux=function(...) { ee.GEE.aux(...,mu.Y=mu.Lin,g.Y=g.Lin,v.Y=v.Lin) },
 id=indiv.index,
 corstr="ind")
}

## These two give the same result
coef.mat <- eeboost(Y,X,EE.fn.ind,maxit=250)
coef.mat2 <- geeboost(Y,X,id=indiv.index,family="gaussian",corstr="ind",maxit=250)$coefmat

par(mfrow=c(1,2))
coef_traceplot(coef.mat)
coef_traceplot(coef.mat2)

Run the code above in your browser using DataLab