Compboost wraps the S4 class system exposed by Rcpp to make defining objects, adding objects, training, predicting, and plotting much easier.
As already mentioned, the Compboost class is just a wrapper and is therefore compatible with most of the S4 classes. Together, these classes define the compboost API.
Compboost is an R6Class object.
cboost = Compboost$new(data, target, optimizer = OptimizerCoordinateDescent$new(), loss, learning.rate = 0.05)
cboost$addLogger(logger, use.as.stopper = FALSE, logger.id, ...)
cboost$addBaselearner(features, id, bl.factory, data.source = InMemoryData, data.target = InMemoryData, ...)
cboost$train(iteration = 100, trace = TRUE)
cboost$getCurrentIteration()
cboost$predict(newdata = NULL)
cboost$getInbagRisk()
cboost$getSelectedBaselearner()
cboost$getEstimatedCoef()
cboost$plot(blearner.type = NULL, iters = NULL, from = NULL, to = NULL, length.out = 1000)
cboost$getBaselearnerNames()
cboost$prepareData(newdata)
For Compboost$new():
data [data.frame]
Data used for training.
target [character(1)]
Name of the target variable. The target must be available as a column in data.
optimizer [S4 Optimizer]
Optimizer used for the fitting process, given as an initialized S4 Optimizer class. Default is OptimizerCoordinateDescent.
loss [S4 Loss]
Loss given as an initialized S4 Loss object, which is used to calculate the pseudo residuals and the empirical risk. Note that the loss needs to match the data type of the target variable. See the details for possible choices.
learning.rate [numeric(1)]
Learning rate used to shrink the estimated parameters in each iteration. The learning rate remains constant during training and has to be between 0 and 1.
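To illustrate what the learning rate does, the following base-R sketch implements component-wise gradient boosting with the quadratic loss and one linear base-learner (without intercept) per feature. It is a simplified illustration under these assumptions, not compboost's C++ implementation, and the function name cwb_sketch is hypothetical. In each iteration every base-learner is fitted to the pseudo residuals, the best one is selected, and only the shrunken parameter learning.rate * beta is added to the model.

```r
# Hypothetical sketch (not compboost's implementation): component-wise
# boosting with the quadratic loss 0.5 * (y - f)^2 and one linear
# base-learner without intercept per column of X.
cwb_sketch <- function(X, y, iterations = 100, learning.rate = 0.05) {
  stopifnot(is.matrix(X), learning.rate > 0, learning.rate <= 1)
  offset <- mean(y)                     # constant minimizing the quadratic risk
  f <- rep(offset, length(y))           # current prediction of the ensemble
  coef <- setNames(numeric(ncol(X)), colnames(X))
  selected <- character(iterations)
  for (m in seq_len(iterations)) {
    r <- y - f                          # pseudo residuals of the quadratic loss
    # Least-squares fit of every base-learner to the pseudo residuals
    beta <- colSums(X * r) / colSums(X^2)
    sse <- colSums((r - sweep(X, 2, beta, "*"))^2)
    j <- which.min(sse)                 # select the best-fitting base-learner
    # Only the shrunken parameter is added to the model
    coef[j] <- coef[j] + learning.rate * beta[j]
    f <- f + learning.rate * beta[j] * X[, j]
    selected[m] <- colnames(X)[j]
  }
  list(offset = offset, coef = coef, selected = selected, fitted = f)
}

# Centered features, so base-learners without intercept are reasonable
X <- scale(as.matrix(mtcars[, c("hp", "wt", "disp")]), scale = FALSE)
fit <- cwb_sketch(X, mtcars$mpg, iterations = 500, learning.rate = 0.05)
table(fit$selected)
```

A smaller learning rate makes each update more conservative, so more iterations are needed to reach the same empirical risk.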
For cboost$addLogger():
logger [S4 Logger]
Logger that is registered within the logger list. It must be given as an uninitialized S4 Logger class. See the details for possible choices.
use.as.stopper [logical(1)]
Logical indicating whether the new logger should also be used as a stopper. Default is FALSE.
logger.id [character(1)]
Id of the new logger. This is necessary, e.g., to register multiple risk loggers.
...
Further arguments passed to the constructor of the S4 Logger class specified in logger. For possible arguments, see the details or the help pages of the S4 classes (e.g. ?LoggerIteration).
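The pattern of passing an uninitialized class and forwarding ... to its constructor can be sketched in base R as follows. Everything here is hypothetical: the loggers are plain lists and logger_iteration, logger_time, and add_logger are illustrative names, not compboost functions. The sketch also shows why logger.id is needed: it is the key under which a logger is stored, so several loggers of the same type can coexist.

```r
# Hypothetical sketch (not compboost's implementation): pass a constructor,
# forward ... to it, and store the result under logger.id.
logger_iteration <- function(max_iterations) {
  list(type = "iteration", max_iterations = max_iterations)
}
logger_time <- function(max_time, time_unit) {
  list(type = "time", max_time = max_time, time_unit = time_unit)
}

add_logger <- function(registry, logger, logger.id, use.as.stopper = FALSE, ...) {
  if (logger.id %in% names(registry)) {
    stop("A logger with id '", logger.id, "' is already registered.")
  }
  obj <- logger(...)                    # constructor receives the extra arguments
  obj$use.as.stopper <- use.as.stopper
  registry[[logger.id]] <- obj          # the id is the unique key
  registry
}

loggers <- list()
loggers <- add_logger(loggers, logger_iteration, "iterations",
  use.as.stopper = TRUE, max_iterations = 500)
# max_time must be given even though this logger is not used as a stopper
loggers <- add_logger(loggers, logger_time, "time", max_time = 0, time_unit = "seconds")
names(loggers)
```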
For cboost$addBaselearner():
features [character()]
Vector of column names used as the input data matrix of a single base-learner. Note that not every base-learner supports multiple features (e.g. the spline base-learner).
id [character(1)]
Id of the base-learner. This is necessary since it is possible to define multiple base-learners on the same underlying data.
bl.factory [S4 Factory]
Uninitialized base-learner factory given as an S4 Factory class. See the details for possible choices.
data.source [S4 Data]
Data source object. At the moment, only in-memory data is supported.
data.target [S4 Data]
Data target object. At the moment, only in-memory data is supported.
...
Further arguments passed to the constructor of the S4 Factory class specified in bl.factory. For possible arguments, see the help pages of the S4 classes (e.g. ?BaselearnerPSplineFactory).
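The role of id can be sketched in base R: the same feature may be registered several times, and the id keeps the resulting base-learner names apart. The naming rule paste(features, id, sep = "_") matches the example at the end of this page ("hp" and "spline" give "hp_spline"); joining several features with "_" is an assumption, and bl_name and register_bl are hypothetical helpers, not compboost functions.

```r
# Hypothetical sketch: build unique base-learner names from features and id.
bl_name <- function(features, id) {
  # Assumption: multiple features are joined with "_"
  paste(paste(features, collapse = "_"), id, sep = "_")
}

register_bl <- function(registry, features, id, data) {
  name <- bl_name(features, id)
  if (name %in% names(registry)) stop("Base-learner '", name, "' already exists.")
  missing <- setdiff(features, names(data))
  if (length(missing) > 0) stop("Unknown feature(s): ", paste(missing, collapse = ", "))
  # Store the input data matrix of this base-learner
  registry[[name]] <- as.matrix(data[, features, drop = FALSE])
  registry
}

bls <- list()
bls <- register_bl(bls, "hp", "linear", mtcars)
bls <- register_bl(bls, "hp", "spline", mtcars)   # same data, different id
names(bls)
```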
For cboost$train():
iteration [integer(1)]
Number of iterations at which the algorithm is set. Note: this argument is ignored if this is the first training and an iteration logger has already been specified. In subsequent calls, the algorithm automatically continues training if iteration is set to a value larger than the number of already trained iterations.
trace [integer(1)]
Integer indicating how often a trace should be printed. Specifying trace = 10 prints every 10th iteration. If no trace should be printed, set trace = 0. Default is -1, which means that trace is set to a value such that 40 iterations are printed.
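The trace rule can be sketched as follows. The exact rounding compboost applies for trace = -1 is not documented here, so dividing the number of iterations by 40 and rounding is an assumption; trace_interval and printed_iterations are hypothetical helpers.

```r
# Hypothetical sketch of the trace rule; the rounding for trace = -1 is assumed.
trace_interval <- function(iterations, trace = -1) {
  if (trace == -1) trace <- max(1, round(iterations / 40))
  trace
}

printed_iterations <- function(iterations, trace = -1) {
  trace <- trace_interval(iterations, trace)
  if (trace == 0) return(integer(0))          # trace = 0 disables printing
  which(seq_len(iterations) %% trace == 0)    # every trace-th iteration
}

length(printed_iterations(1000))              # 40 (interval of 25)
printed_iterations(100, trace = 10)           # 10, 20, ..., 100
```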
For cboost$predict():
newdata [data.frame()]
Data to predict on. If NULL, predictions on the training data are returned.
For cboost$plot():
blearner.type [character(1)]
Name of the base-learner whose additive contribution to the response is plotted.
iters [integer()]
Integer vector containing the iterations to illustrate.
from [numeric(1)]
Lower bound for plotting (should be smaller than to).
to [numeric(1)]
Upper bound for plotting (should be greater than from).
length.out [integer(1)]
Number of equidistant points between from and to used for plotting.
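How from, to, and length.out define the plotting grid can be sketched in base R. The contribution here is a hypothetical linear effect beta * x with a made-up coefficient, and falling back to the range of the feature when from or to is NULL is an assumption; contribution_grid is not a compboost function.

```r
# Hypothetical sketch: evaluate a base-learner contribution on an
# equidistant grid between from and to.
contribution_grid <- function(x, beta, from = NULL, to = NULL, length.out = 1000) {
  if (is.null(from)) from <- min(x)   # assumed default: range of the feature
  if (is.null(to)) to <- max(x)
  stopifnot(from < to)
  grid <- seq(from, to, length.out = length.out)
  data.frame(x = grid, contribution = beta * grid)
}

# beta = -0.07 is an illustrative value, not an estimate
g <- contribution_grid(mtcars$hp, beta = -0.07, from = 50, to = 350, length.out = 7)
g$x   # 50 100 150 200 250 300 350
```

The resulting data frame can then be drawn with, e.g., plot(g$x, g$contribution, type = "l").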
Fields:
data [data.frame]
Data used for training the algorithm.
response [vector]
Response given as a vector.
target [character(1)]
Name of the response.
id [character(1)]
Value to identify the data. By default, this is the name of data, but it can be overwritten.
optimizer [S4 Optimizer]
Optimizer used within the fitting process.
loss [S4 Loss]
Loss used to calculate the pseudo residuals and the empirical risk.
learning.rate [numeric(1)]
Learning rate used to shrink the estimated parameters in each iteration.
model [S4 Compboost_internal]
Internal S4 Compboost_internal class on which the main operations are called.
bl.factory.list [S4 FactoryList]
List of all registered factories, represented as an S4 FactoryList class.
positive.category [character(1)]
Name of the positive class in the case of classification.
stop.if.all.stoppers.fulfilled [logical(1)]
Logical indicating whether all stoppers must be fulfilled simultaneously to stop the algorithm, or whether it is sufficient that the first fulfilled stopper breaks the algorithm.
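The effect of stop.if.all.stoppers.fulfilled can be sketched as combining one logical flag per stopper with all() or any(). This is a minimal illustration, not compboost's code; should_stop is a hypothetical helper, and returning FALSE when no stopper is registered is an assumption.

```r
# Hypothetical sketch: combine the flags of all registered stoppers.
should_stop <- function(stopper.flags, stop.if.all.stoppers.fulfilled = FALSE) {
  if (length(stopper.flags) == 0) return(FALSE)   # assumed: no stopper, no stop
  if (stop.if.all.stoppers.fulfilled) all(stopper.flags) else any(stopper.flags)
}

flags <- c(iterations = FALSE, time = TRUE)
should_stop(flags)                                        # TRUE: one stopper suffices
should_stop(flags, stop.if.all.stoppers.fulfilled = TRUE) # FALSE: iteration stopper not reached
```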
Methods:
addLogger
method to add a logger to the algorithm (Note: this is only possible before training).
addBaselearner
method to add a new base-learner factory to the algorithm (Note: this is only possible before training).
getCurrentIteration
method to get the current iteration at which the algorithm is set.
train
method to train the algorithm.
predict
method to predict on a trained object.
getSelectedBaselearner
method to get a character vector of the selected base-learners.
getEstimatedCoef
method to get a list of the estimated coefficients of each selected base-learner.
plot
method to plot the Compboost object.
getBaselearnerNames
method to get names of registered factories.
Details:
Loss: Available choices for the loss are:
LossQuadratic (regression)
LossAbsolute (regression)
LossBinomial (binary classification)
LossCustom (custom)
LossCustomCpp (custom)
For each loss, also take a look at the help pages (e.g. ?LossBinomial) and the C++ documentation for details about the underlying formulas.
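The relation between a loss and its pseudo residuals (the negative gradient with respect to the prediction f) can be sketched in base R. These are standard textbook definitions; compboost's exact scaling may differ, and coding the binary labels as -1 and 1 for the binomial loss is an assumed parameterization. The list and function names are hypothetical.

```r
# Hypothetical sketch: loss functions and their pseudo residuals -dL/df.
loss_quadratic <- list(
  loss = function(y, f) 0.5 * (y - f)^2,
  pseudo_residuals = function(y, f) y - f
)
loss_absolute <- list(
  loss = function(y, f) abs(y - f),
  pseudo_residuals = function(y, f) sign(y - f)
)
# Binomial loss with labels coded as y in {-1, 1} (assumed parameterization)
loss_binomial <- list(
  loss = function(y, f) log(1 + exp(-2 * y * f)),
  pseudo_residuals = function(y, f) 2 * y / (1 + exp(2 * y * f))
)

# Empirical risk: mean loss over all observations
empirical_risk <- function(loss, y, f) mean(loss$loss(y, f))

empirical_risk(loss_quadratic, c(1, 3), c(0, 0))   # 2.5
```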
Logger: Available choices for the logger are:
LoggerIteration: Logs the current iteration. Additional arguments:
max_iterations [integer(1)]
Maximal number of iterations.
LoggerTime: Logs the elapsed time. Additional arguments:
max_time [integer(1)]
Maximal time for the computation.
time_unit [character(1)]
Character specifying the time unit. Possible choices are minutes, seconds, or microseconds.
LoggerInbagRisk: Logs the inbag risk. Additional arguments:
used_loss [S4 Loss]
Loss as initialized S4 Loss object, which is used to calculate the empirical risk. See the details for possible choices.
eps_for_break [numeric(1)]
This argument is used if the logger is also used as a stopper. If the relative improvement of the logged inbag risk falls below this boundary, the stopper breaks the algorithm.
LoggerOobRisk: Logs the out-of-bag risk. Additional arguments:
used_loss [S4 Loss]
Loss as initialized S4 Loss object, which is used to calculate the empirical risk. See the details for possible choices.
eps_for_break [numeric(1)]
This argument is used if the logger is also used as a stopper. If the relative improvement of the logged out-of-bag risk falls below this boundary, the stopper breaks the algorithm.
oob_data [list]
A list of data source objects, one for the source data of each registered factory. These data source objects should contain the out-of-bag data, which is used to calculate the new predictions in each iteration.
oob_response [vector]
Vector containing the response for the out-of-bag data given in oob_data.
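The eps_for_break rule of the risk loggers can be sketched as a comparison of the two most recent logged risk values. This is an illustration under assumptions, not compboost's implementation: whether the comparison is strict, and the helper names relative_improvement and risk_stopper, are assumptions, and the risk trajectory below is made up for demonstration. The same rule applies to an inbag or an out-of-bag risk path.

```r
# Hypothetical sketch: stop once the relative improvement of the logged
# risk falls below eps_for_break.
relative_improvement <- function(risk.path) {
  n <- length(risk.path)
  if (n < 2) return(NA_real_)          # nothing to compare yet
  (risk.path[n - 1] - risk.path[n]) / risk.path[n - 1]
}

risk_stopper <- function(risk.path, eps_for_break) {
  imp <- relative_improvement(risk.path)
  !is.na(imp) && imp < eps_for_break
}

# Made-up out-of-bag risk values, one per iteration
oob.risk <- c(10, 6, 4.5, 4.2, 4.19)
sapply(seq_along(oob.risk), function(m) risk_stopper(oob.risk[1:m], eps_for_break = 0.01))
# FALSE FALSE FALSE FALSE TRUE
```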
Note:
Even if you do not use the logger as a stopper, you have to define arguments such as max_time.
We are aware that this naming style is not consistent with the R6 arguments. Nevertheless, the underscore (_) is used as word separator because these arguments mirror the ones used within C++.
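# NOT RUN {
library(compboost)

# Quadratic loss for the numeric target mpg
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new())
# P-spline base-learner on hp, registered under the name "hp_spline"
cboost$addBaselearner("hp", "spline", BaselearnerPSpline, degree = 3,
  n.knots = 10, penalty = 2, differences = 2)
cboost$train(1000)
# How often each base-learner was selected
table(cboost$getSelectedBaselearner())
cboost$plot("hp_spline")
# }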
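The arguments degree, n.knots, penalty, and differences in the example above can be understood through a standard P-spline construction: a B-spline basis plus a difference penalty on neighbouring coefficients. The following base-R sketch (using the splines package shipped with R) is an assumption about the general technique, not compboost's C++ code; pspline_fit is a hypothetical helper, and placing the knots equidistantly inside the data range is assumed.

```r
# Hypothetical P-spline sketch: penalized least squares on a B-spline basis.
library(splines)

pspline_fit <- function(x, r, degree = 3, n.knots = 10, penalty = 2, differences = 2) {
  # n.knots equidistant interior knots inside the range of x (assumption)
  inner <- seq(min(x), max(x), length.out = n.knots + 2)[-c(1, n.knots + 2)]
  B <- bs(x, knots = inner, degree = degree, intercept = TRUE)
  # Difference penalty matrix of the requested order
  D <- diff(diag(ncol(B)), differences = differences)
  # Solve (B'B + penalty * D'D) beta = B'r
  beta <- solve(crossprod(B) + penalty * crossprod(D), crossprod(B, r))
  list(basis = B, coef = drop(beta), fitted = drop(B %*% beta))
}

x <- mtcars$hp
r <- mtcars$mpg - mean(mtcars$mpg)   # pseudo residuals after a mean offset (quadratic loss)
fit <- pspline_fit(x, r)
ncol(fit$basis)                      # n.knots + degree + 1 = 14
```

Larger values of penalty pull neighbouring coefficients together and give smoother fits.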