
COBRA (version 0.99.4)

COBRA: COBRA

Description

The function COBRA delivers predictions for a testing sample on the basis of a training sample and a collection of basic regression machines. By default, those machines are wrappers to the R packages lars, ridge, tree and randomForest, covering a fairly wide spectrum of contemporary regression methods. However, the most interesting way to use COBRA is to supply whatever regression methods the context suggests (see argument machines). COBRA can natively parallelize the computations (see argument parallel).
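The aggregation rule behind these predictions can be sketched in base R. This is a from-scratch illustration of the rule described in the referenced paper (Biau et al., 2013), not the package's internal code; the helper cobra_aggregate() and its arguments are hypothetical. A calibration point is retained when at least alpha machines predict, at that point, a value within eps of what they predict at the query point; the COBRA prediction is the mean response of the retained points. Here alpha is a count of machines, as with alpha.machines, whereas the paper states it as a proportion of the machines.

```r
# Hypothetical helper (not part of the COBRA package API).
#   calib.preds: matrix (n_calib x M) of machine predictions at calibration points
#   calib.y:     responses of the calibration points
#   query.preds: matrix (n_query x M) of machine predictions at new points
#   eps:         tolerance epsilon
#   alpha:       number of machines that must agree (1 <= alpha <= M)
cobra_aggregate <- function(calib.preds, calib.y, query.preds, eps, alpha) {
  calib.preds <- as.matrix(calib.preds)
  query.preds <- as.matrix(query.preds)
  stopifnot(ncol(calib.preds) == ncol(query.preds),
            nrow(calib.preds) == length(calib.y),
            alpha >= 1, alpha <= ncol(calib.preds))
  apply(query.preds, 1, function(q) {
    # TRUE where machine j's prediction at calibration point i is within eps
    # of machine j's prediction at the query point
    close <- abs(sweep(calib.preds, 2, q)) <= eps
    # Keep calibration points on which at least alpha machines agree
    agree <- rowSums(close) >= alpha
    # Average their responses; NA if none is kept
    # (the package may handle this case differently)
    if (any(agree)) mean(calib.y[agree]) else NA_real_
  })
}

# Toy usage: two machines, four calibration points, two query points
calib.preds <- cbind(c(0, 1, 2, 3), c(0, 1, 2, 3))
calib.y     <- c(10, 20, 30, 40)
query.preds <- rbind(c(1.1, 0.9), c(3, 0))
p2 <- cobra_aggregate(calib.preds, calib.y, query.preds, eps = 0.2, alpha = 2)
p1 <- cobra_aggregate(calib.preds, calib.y, query.preds, eps = 0.2, alpha = 1)
p2  # 20 NA
p1  # 20 25
```

Lowering alpha relaxes the consensus: with alpha = 1 the second query point keeps the two calibration points on which one machine agrees, while with alpha = 2 it keeps none.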

Usage

COBRA(train.design,
      train.responses,
      split,
      test,
      machines,
      machines.names,
      logGrid = FALSE,
      grid = 200,
      alpha.machines,
      parallel = FALSE,
      nb.cpus = 2,
      plots = FALSE,
      savePlots = FALSE,
      logs = FALSE,
      progress = TRUE,
      path = "")

Arguments

train.design
Mandatory. The design matrix for the training sample.
train.responses
Mandatory. The responses vector for the training sample.
split
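Optional. How COBRA should cut the training sample into a part used to build the basic machines and a part used to calibrate the aggregation (see publication).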
test
Mandatory. The design matrix of the testing sample.
machines
Optional. Regression basic machines provided by the user. This should be a matrix whose number of rows is the length of the training sample (ntrain) plus the length of the testing sample (ntest), with the training observations in the first ntrain rows, and with as many columns as machines. Element (i, j) is the prediction of machine j for observation i.
machines.names
Optional. If machines is provided, a vector containing the names of the machines, one per column of machines.
logGrid
Optional. If TRUE, the candidate values of the parameter epsilon are generated on a logarithmic scale. This should be TRUE if the predictions are expected to be of small magnitude.
grid
Optional. Number of points in the discretization grid used to calibrate the parameter epsilon.
alpha.machines
Optional. Forces COBRA to use exactly alpha.machines machines instead of calibrating this number. It must be an integer between 1 and the total number of machines.
parallel
Optional. If TRUE, computations are dispatched over the available CPUs.
nb.cpus
Optional. If parallel is TRUE, the number of CPUs to use. It should not exceed the number of available CPUs.
plots
Optional. If TRUE, diagnostic plots about the calibration of epsilon and alpha (see publication) are generated, and saved according to path when savePlots is TRUE.
savePlots
Optional. If TRUE, plots are saved as .pdf files in the directory given by path; otherwise they are displayed in the R session.
logs
Optional. If TRUE, the quadratic risks over the training sample of all machines and of COBRA are written to the file "risks.txt" in the directory given by path.
progress
Optional. If TRUE, a progress bar and the final quadratic errors are printed.
path
Optional. Directory in which output files are created: the .pdf plots (when plots and savePlots are TRUE) and "risks.txt" (when logs is TRUE).
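The arguments logGrid and grid control the candidate values of epsilon. A minimal base-R sketch of the two spacing schemes follows; the helper epsilon_grid() and the bounds eps.min and eps.max are hypothetical, since the range the package actually scans is not stated on this page. A logarithmic grid places most candidates near eps.min, which is why it suits predictions of small magnitude.

```r
# Hypothetical helper: candidate values of epsilon for calibration.
# The bounds [eps.min, eps.max] are assumptions, not the package's own values.
epsilon_grid <- function(eps.min, eps.max, grid = 200, logGrid = FALSE) {
  stopifnot(eps.min > 0, eps.max > eps.min, grid >= 2)
  if (logGrid) {
    # Evenly spaced on the log scale: dense near small values
    exp(seq(log(eps.min), log(eps.max), length.out = grid))
  } else {
    # Evenly spaced on the original scale
    seq(eps.min, eps.max, length.out = grid)
  }
}

lin  <- epsilon_grid(0.001, 1, grid = 4)                  # 0.001 0.334 0.667 1
logg <- epsilon_grid(0.001, 1, grid = 4, logGrid = TRUE)  # 0.001 0.01 0.1 1
```

In a calibration step, each candidate would then be scored (for instance by quadratic error on held-out data) and the best one retained; a larger grid gives a finer search at a higher computational cost.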

Value

Returns a list with a single element:
  • predict: the vector of predicted values for the testing sample.
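The options logs and progress report quadratic risks. For reference, the empirical quadratic risk of a prediction vector is the mean squared difference to the responses. The helper quadratic_risk() below is hypothetical, and the package may normalize its reported risks differently.

```r
# Hypothetical helper: empirical quadratic risk (mean squared error).
quadratic_risk <- function(pred, y) {
  stopifnot(length(pred) == length(y))
  mean((pred - y)^2)
}

# With the output of COBRA: quadratic_risk(res$predict, test.responses)
r <- quadratic_risk(c(1, 2, 3), c(1, 2, 5))  # (0 + 0 + 4) / 3
```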

Details

For most users, the options grid and split should be left at their default values.
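The split option refers to cutting the training sample in two, as in the referenced paper: one part builds the basic machines and the other calibrates the aggregation. The sketch below makes this concrete with a deterministic cut; treating split as a fraction, and the helper split_training(), are assumptions for illustration only.

```r
# Hypothetical helper: cut indices 1..ntrain into a part used to build the
# machines and a part used to calibrate the aggregation.
# Treating `split` as a fraction in (0, 1) is an assumption of this sketch.
split_training <- function(ntrain, split = 0.5) {
  stopifnot(split > 0, split < 1)
  k <- floor(split * ntrain)
  list(build     = seq_len(k),                         # first k observations
       calibrate = setdiff(seq_len(ntrain), seq_len(k)))  # remaining ones
}

s  <- split_training(400, 0.5)   # 200 points each
s3 <- split_training(400, 0.75)  # 300 to build, 100 to calibrate
```

A random permutation of the indices before cutting is an equally valid choice when the training rows are not already in random order.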

References

http://www.lsta.upmc.fr/doct/guedj/index.html

G. Biau, A. Fischer, B. Guedj and J. D. Malley (2013), COBRA: A Nonlinear Aggregation Strategy. http://arxiv.org/abs/1303.2236 and http://hal.archives-ouvertes.fr/hal-00798579

See Also

COBRA-package

Examples

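library(COBRA)
set.seed(1)      # for reproducibility

n <- 500         # total sample size
d <- 30          # number of covariates
ntrain <- 400    # size of the training sample
X <- replicate(d, 2 * runif(n = n) - 1)   # covariates uniform on [-1, 1]
Y <- X[, 1]^2 + X[, 3]^3 + exp(X[, 10]) + rnorm(n = n, sd = .1)
train.design <- as.matrix(X[1:ntrain, ])
train.responses <- Y[1:ntrain]
test <- as.matrix(X[-(1:ntrain), ])
test.responses <- Y[-(1:ntrain)]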

## using the default machines
if (require(lars) && require(tree) && require(ridge) &&
    require(randomForest)) {
  res <- COBRA(train.design = train.design,
               train.responses = train.responses,
               test = test)

  print(cbind(res$predict, test.responses))
  plot(test.responses, res$predict, xlab = "Responses", ylab = "Predictions",
       pch = 3, col = 2)
  abline(0, 1, lty = 2)   # perfect predictions lie on this line
}

## using own machines
machines.names <- c("Soothsayer", "Dummy")
machines <- matrix(nrow = n, ncol = 2, data = 0)
machines[, 1] <- Y + rnorm(n = n, sd = .1)   ## soothsayer
machines[, 2] <- mean(train.responses)       ## dummy prediction, averaging train.responses

res2 <- COBRA(train.design = train.design,
              train.responses = train.responses,
              test = test,
              machines = machines,
              machines.names = machines.names)

print(cbind(res2$predict, test.responses))
plot(test.responses, res2$predict, xlab = "Responses", ylab = "Predictions",
     pch = 3, col = 2)
abline(0, 1, lty = 2)
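The "own machines" example above relies on an oracle. A more realistic user-supplied machines matrix can be built from any fitted model. The sketch below, in base R only, fits two linear models on the training rows and stores their predictions for all n rows, training rows first. The names fit.lin and fit.sq are hypothetical; the second model deliberately uses the known form of the simulated signal, purely for illustration. Fitting the machines on the whole training sample is an assumption of this sketch, since this page does not describe how COBRA combines user machines with the split argument.

```r
set.seed(1)
n <- 500; d <- 30; ntrain <- 400
X <- replicate(d, 2 * runif(n = n) - 1)
colnames(X) <- paste0("x", seq_len(d))
Y <- X[, 1]^2 + X[, 3]^3 + exp(X[, 10]) + rnorm(n = n, sd = .1)

train    <- data.frame(y = Y[1:ntrain], X[1:ntrain, ])  # training rows only
all.rows <- data.frame(X)                               # training rows, then test rows

# Machine 1: linear model on all covariates
fit.lin <- lm(y ~ ., data = train)
# Machine 2: uses the known form of the simulated signal (illustration only)
fit.sq  <- lm(y ~ x1 + I(x1^2) + x3 + I(x3^3) + x10, data = train)

# One column per machine, one row per observation (n rows in total)
machines <- cbind(Linear = predict(fit.lin, newdata = all.rows),
                  Poly   = predict(fit.sq,  newdata = all.rows))
machines.names <- colnames(machines)

# With the COBRA package loaded:
# res3 <- COBRA(train.design = X[1:ntrain, ], train.responses = Y[1:ntrain],
#               test = X[-(1:ntrain), ], machines = machines,
#               machines.names = machines.names)
```

Any regression method with a predict step can fill a column in the same way, which is what makes user-supplied machines the most flexible way to use COBRA.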
