This wrapper function automatically initializes the model by adding every
numerical feature of a dataset as a spline base-learner. Categorical features are
dummy encoded and added as linear base-learners without intercept. After
initializing the model, boostSplines also fits it for the number of iterations
given by the user through iterations.
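To illustrate the dummy encoding mentioned above, the following sketch uses base R's model.matrix() on the iris data. It assumes one 0/1 indicator column per factor level and is not taken from the compboost source; the variable name dummies is hypothetical.

```r
# Dummy encode a categorical feature without an intercept column.
# Assumption: one 0/1 indicator column per factor level.
dummies <- model.matrix(~ Species - 1, data = iris)

colnames(dummies)   # one column per level of Species
head(dummies, 3)    # each row has exactly one entry equal to 1
```

Under this assumption, each indicator column could then serve as the single feature of a linear base-learner without intercept.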
boostSplines(data, target, optimizer = OptimizerCoordinateDescent$new(),
loss, learning.rate = 0.05, iterations = 100, trace = -1,
degree = 3, n.knots = 20, penalty = 2, differences = 2,
data.source = InMemoryData, data.target = InMemoryData)
data [data.frame]
A data frame containing the data on which the model should be built.
target [character(1)]
Character indicating the target variable. Note that the loss must match the
data type of the target.
optimizer [S4 Optimizer]
Optimizer used to select features. This should be an initialized S4 Optimizer
object exposed by Rcpp (for instance OptimizerCoordinateDescent$new()).
loss [S4 Loss]
Loss used to calculate the risk and pseudo residuals. This object must be an
initialized S4 Loss object exposed by Rcpp (for instance LossQuadratic$new()).
learning.rate [numeric(1)]
Learning rate used to shrink the parameter update in each step.
iterations [integer(1)]
Number of boosting iterations to train.
trace [integer(1)]
Integer indicating how often a trace should be printed. With trace = 10, every
10th iteration is printed. To print no trace, set trace = 0. The default is -1,
which sets trace to a value such that 40 iterations are printed.
degree [integer(1)]
Polynomial degree of the splines used for modeling. Note that the number of
parameters increases with the degree.
n.knots [integer(1)]
Number of equidistant inner knots. The actual number of knots used also depends
on the polynomial degree.
penalty [numeric(1)]
Penalty term for P-splines. If penalty equals 0, ordinary B-splines are fitted.
The higher the penalty, the smoother the fit.
differences [integer(1)]
Number of differences used for penalization. The higher this value, the more the
function values at neighboring knots are forced to be similar, which results in
a smoother curve.
data.source [S4 Data]
Uninitialized S4 Data object which is used to store the data. At the moment,
only in-memory training is supported.
data.target [S4 Data]
Uninitialized S4 Data object which is used to store the data. At the moment,
only in-memory training is supported.
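The interplay of degree, n.knots, penalty, and differences can be sketched in base R with the splines package. This is a minimal illustration of a P-spline fit under stated assumptions, not compboost's implementation: the knot construction, the simulated data, and the helper names fitPSpline and roughness are hypothetical.

```r
library(splines)  # ships with R

set.seed(1)
x <- sort(runif(200))
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

degree      <- 3
n.knots     <- 20
differences <- 2

# Assumed knot layout: n.knots equidistant inner knots plus (degree + 1)
# knots at and beyond each boundary, all with the same spacing h.
h     <- (max(x) - min(x)) / (n.knots + 1)
knots <- min(x) + h * seq(-degree, n.knots + 1 + degree)

# B-spline design matrix with n.knots + degree + 1 columns (parameters).
B <- splineDesign(knots, x, ord = degree + 1, outer.ok = TRUE)
ncol(B)

# Difference matrix of the chosen order and the resulting penalty matrix.
D <- diff(diag(ncol(B)), differences = differences)
K <- crossprod(D)

# Penalized least squares: minimize ||y - B beta||^2 + penalty * beta' K beta.
fitPSpline <- function(penalty) {
  solve(crossprod(B) + penalty * K, crossprod(B, y))
}

# Roughness measured as the sum of squared differences of the coefficients.
roughness <- function(beta) sum((D %*% beta)^2)

beta.low  <- fitPSpline(0.01)
beta.high <- fitPSpline(100)
c(low = roughness(beta.low), high = roughness(beta.high))
```

In this sketch the number of columns of B grows with both n.knots and degree, and the coefficient roughness shrinks as penalty increases, which matches the qualitative behavior described for the arguments above.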
Usually a model of class Compboost. This model is an R6 object which can be used
for retraining, predicting, plotting, and anything described in ?Compboost.
The returned object is an object of the Compboost class, which can then be used
for further analyses (see ?Compboost for details).
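How the optimizer, loss, learning.rate, and iterations arguments work together can be sketched as a small component-wise boosting loop in base R. This is a simplified illustration, not the compboost code: it assumes quadratic loss, uses linear base-learners without intercept instead of splines for brevity, and works on simulated data with hypothetical variable names.

```r
set.seed(1)
n <- 100
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 2 * X[, "x1"] - X[, "x3"] + rnorm(n, sd = 0.1)

learning.rate <- 0.05
iterations    <- 100

pred     <- rep(mean(y), n)                        # constant initialization
coefs    <- setNames(numeric(ncol(X)), colnames(X))
selected <- character(iterations)

for (m in seq_len(iterations)) {
  # Pseudo residuals of the quadratic loss are the ordinary residuals.
  r <- y - pred

  # Fit every linear base-learner (no intercept) to the pseudo residuals.
  beta <- colSums(X * r) / colSums(X^2)
  sse  <- sapply(seq_len(ncol(X)), function(j) sum((r - X[, j] * beta[j])^2))

  # Coordinate-descent style selection: keep only the best base-learner
  # and add its fit, shrunk by the learning rate.
  best        <- which.min(sse)
  coefs[best] <- coefs[best] + learning.rate * beta[best]
  pred        <- pred + learning.rate * X[, best] * beta[best]
  selected[m] <- colnames(X)[best]
}

table(selected)   # how often each base-learner was selected
coefs             # accumulated, shrunken coefficients
```

Each iteration updates only one base-learner, so features that never reduce the risk the most are never selected, and a smaller learning.rate generally requires more iterations to reach the same fit.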
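# NOT RUN {
library(compboost)

# Fit a spline-based model with quadratic loss on the iris data
mod = boostSplines(data = iris, target = "Sepal.Length", loss = LossQuadratic$new())

mod$getBaselearnerNames()              # names of the base-learners
mod$getEstimatedCoef()                 # estimated coefficients
table(mod$getSelectedBaselearner())    # how often each base-learner was selected
mod$predict()                          # predictions of the fitted model
mod$plot("Sepal.Width_spline")         # plot the Sepal.Width spline base-learner
# }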