Implements the local-scoring algorithm (Hastie and Tibshirani, 1986),
using normal linear GeD splines (i.e., the NGeDS function) to fit the
targets within each backfitting iteration. Higher order fits are computed
by pursuing stage B of GeDS after the local-scoring algorithm has run.
NGeDSgam(
formula,
family = "gaussian",
data,
weights = NULL,
normalize_data = FALSE,
min_iterations,
max_iterations,
phi_gam_exit = 0.99,
q_gam = 2L,
beta = 0.5,
phi = 0.99,
internal_knots = 500L,
q = 2L,
higher_order = TRUE
)
An object of class "GeDSgam" (a named list) with components:
Call to the NGeDSgam function.
A formula object representing the model to be fitted.
A list containing the arguments passed to the NGeDSgam function.
This list includes:
response: data.frame containing the response variable
observations.
predictors: data.frame containing the corresponding
observations of the predictor variables included in the model.
base_learners: Description of the model's base learners ("smooth functions").
family: The statistical family. The possible options are:
binomial(link = "logit", "probit", "cauchit", "log", "cloglog"),
gaussian(link = "identity", "log", "inverse"),
Gamma(link = "inverse", "identity", "log"),
inverse.gaussian(link = "1/mu^2", "inverse", "identity", "log"),
poisson(link = "log", "identity", "sqrt"),
quasi(link = "identity", variance = "constant"),
quasibinomial(link = "logit", "probit", "cloglog", "identity", "inverse", "log", "1/mu^2", "sqrt") and
quasipoisson(link = "log", "identity", "sqrt").
normalize_data: If TRUE, then response and predictors
were standardized before running the local-scoring algorithm.
X_mean: Mean of the predictor variables (only if
normalize_data = TRUE, otherwise this is NULL).
X_sd: Standard deviation of the predictors (only if
normalize_data = TRUE, otherwise this is NULL).
Y_mean: Mean of the response variable (only if
normalize_data = TRUE, otherwise this is NULL).
Y_sd: Standard deviation of the response variable (only if
normalize_data = TRUE, otherwise this is NULL).
A list detailing the final "GeDSgam" model selected after
running the local scoring algorithm. The chosen model minimizes deviance
across all models generated by each local-scoring iteration. This list includes:
model_name: Local-scoring iteration that yielded the "best"
model. When family = "gaussian", this is always iter1, because in that
case the algorithm is tantamount to directly implementing backfitting and
only one local-scoring iteration is conducted.
dev: Deviance of the final model. For family = "gaussian"
this coincides with the Residual Sum of Squares.
Y_hat: Fitted values, including:
- eta: the additive predictor,
- mu: the vector of means,
- z: the adjusted dependent variable.
base_learners: A list containing, for each base-learner, the piecewise
polynomial coefficients of the corresponding linear fit. It also includes the
knots for each order of fit, obtained by computing the averaging knot
location. If the number of internal knots of the final linear fit is less
than \(n-1\) (with \(n\) the order of the fit), the averaging knot location
is not computed.
linear.fit: Final linear fit in B-spline form (see SplineReg).
quadratic.fit: Quadratic fit obtained via Schoenberg
variation diminishing approximation (see SplineReg).
cubic.fit: Cubic fit obtained via Schoenberg variation
diminishing approximation (see SplineReg).
A list containing the predicted values obtained for each of
the fits (linear, quadratic, and cubic). Each of the predictions contains
both the additive predictor eta and the vector of means mu.
A list detailing the internal knots obtained for the fits of different order (linear, quadratic, and cubic).
A description of the model structure to be fitted,
specifying both the dependent and independent variables. Unlike NGeDS
and GGeDS, this formula supports multiple additive (normal) GeD
spline regression components as well as linear components. For example, setting
formula = Y ~ f(X1) + f(X2) + X3 implies using a normal linear GeD
spline as the smoother for X1 and for X2, while for X3 a
linear model would be used.
A character string indicating the response variable distribution
and link function to be used. Default is "gaussian". This should be a
character or a family object.
A data.frame containing the variables referenced in the formula.
An optional vector of "prior weights" to be put on the
observations during the fitting process. It should be NULL or a numeric
vector of the same length as the response variable defined in the formula.
A logical that defines whether the data should be
normalized (standardized) before fitting the baseline linear model, i.e.,
before running the local-scoring algorithm. Normalizing the data involves
scaling the predictor variables to have a mean of 0 and a standard deviation
of 1. This process alters the scale and interpretation of the knots and
coefficients estimated. Default is equal to FALSE.
Optional parameter to manually set a minimum number of local-scoring iterations to be run. If not specified, it defaults to 0L.
Optional parameter to manually set the maximum number
of local-scoring iterations to be run. If not specified, it defaults to 100L.
This setting serves as a fallback when the stopping rule, based on
consecutive deviances and tuned by phi_gam_exit and q_gam,
does not trigger an earlier termination (see Dimitrova et al. (2025)).
Therefore, users can increase/decrease the number of local-scoring iterations,
by increasing/decreasing the value phi_gam_exit and/or q_gam,
or directly specify max_iterations.
Convergence threshold for local-scoring and backfitting.
Both algorithms stop when the relative change in the deviance is below this
threshold. Default is 0.99.
Numeric parameter that fine-tunes the stopping rule of
the local-scoring and backfitting iterations. Default is 2L.
Numeric parameter in the interval \([0,1]\)
tuning the knot placement in stage A of GeDS, for each of the GeD spline
components of the model. Default is equal to 0.5.
See Details in NGeDS.
Numeric parameter in the interval \((0,1)\) specifying the
threshold for the stopping rule (model selector) in stage A of GeDS, for each
of the GeD spline components of the model. Default is equal to 0.99.
See Details in NGeDS.
The maximum number of internal knots that can be added
by the GeDS smoothers at each backfitting iteration, effectively setting the
value of max.intknots in NGeDS at each backfitting
iteration. Default is 500L.
Numeric parameter that fine-tunes the stopping rule of
stage A of GeDS, for each of the GeD spline components of the model.
Default is 2L. See Details in NGeDS.
A logical that defines whether to compute the higher order
fits (quadratic and cubic) after the local-scoring algorithm is run. Default
is TRUE.
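As a rough illustration of what normalize_data = TRUE implies for interpretation, the base R sketch below standardizes the predictors of the airquality data and maps a knot from the standardized scale back to the original one. It does not call NGeDSgam; the names X_mean and X_sd only mirror the components listed above, and the knot value is a hypothetical number chosen for illustration.

```r
# Predictors used in the example at the end of this page.
data(airquality)
X <- na.omit(airquality)[, c("Solar.R", "Wind", "Temp")]

# Column means and standard deviations (analogous to X_mean and X_sd).
X_mean <- colMeans(X)
X_sd   <- apply(X, 2, sd)

# Standardize: each column gets mean 0 and standard deviation 1.
X_std <- scale(X, center = X_mean, scale = X_sd)

# A knot located on the standardized Wind scale (hypothetical value) ...
knot_std <- 0.5
# ... corresponds to this location on the original Wind scale.
knot_orig <- unname(knot_std * X_sd["Wind"] + X_mean["Wind"])
knot_orig
```

Knots and coefficients estimated on standardized data therefore need this kind of back-transformation before they can be read on the original scale.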
The NGeDSgam function employs the local scoring algorithm to fit a
generalized additive model (GAM). This algorithm iteratively fits weighted
additive models by backfitting. Normal linear GeD splines, as well as linear
learners, are supported as function smoothers within the backfitting
algorithm. The local-scoring algorithm ultimately produces a linear fit.
Higher order fits (quadratic and cubic) are then computed by calculating
Schoenberg's variation diminishing spline (VDS) approximation of the linear
fit.
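The VDS idea can be sketched with the splines package that ships with R. In the sketch below the piecewise linear function, the knots and the helper vds() are all made up for illustration; in GeDS the higher order knots come from the averaging knot location rather than being hand-picked, and this is not the package's internal code. The B-spline coefficients of the VDS approximation are simply the values of the approximated function at the Greville abscissae.

```r
library(splines)

# A piecewise linear "fit" with breakpoints at 0.3 and 0.6 (made-up values).
lin_fit <- approxfun(x = c(0, 0.3, 0.6, 1), y = c(0, 1.2, 0.8, 2))

# Quadratic spline space: order n = 3 with two hand-picked internal knots.
n_ord <- 3
knots <- c(rep(0, n_ord), 0.25, 0.7, rep(1, n_ord))
p     <- length(knots) - n_ord   # number of B-splines

# Greville abscissae: averages of n - 1 consecutive knots.
greville <- vapply(
  seq_len(p),
  function(i) mean(knots[(i + 1):(i + n_ord - 1)]),
  numeric(1)
)

# VDS approximation of f evaluated at x (hypothetical helper).
vds <- function(f, x) {
  B <- splineDesign(knots, x, ord = n_ord)
  drop(B %*% f(greville))
}

x_grid   <- seq(0.01, 0.99, length.out = 99)
quad_fit <- vds(lin_fit, x_grid)

# Sanity check: the VDS operator reproduces straight lines exactly.
line_fun <- function(x) 2 + 3 * x
max_err  <- max(abs(vds(line_fun, x_grid) - line_fun(x_grid)))
```

Because the B-splines are non-negative and sum to one, quad_fit stays within the range of the linear fit's values at the Greville abscissae, which is the shape-preserving behaviour that motivates using the VDS approximation.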
On the one hand, NGeDSgam includes all the parameters of
NGeDS, which in this case tune the function smoother fit at each
backfitting iteration. On the other hand, NGeDSgam includes some
additional parameters specific to the local-scoring procedure. The main
ones are described below.
The family chosen determines the link function, adjusted dependent
variable and weights to be used in the local-scoring algorithm. The number of
local-scoring and backfitting iterations is controlled by a
Ratio of Deviances stopping rule similar to the one presented for
NGeDS/GGeDS. In the same way phi and q
tune the stopping rule of NGeDS/GGeDS,
phi_gam_exit and q_gam tune the stopping rule of NGeDSgam.
The user can also manually control the number of local-scoring iterations
through min_iterations and max_iterations.
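To make the role of the family concrete, the base R sketch below runs local scoring for a Poisson response with a single linear smoother, in which case it reduces to iteratively reweighted least squares. The simulated data and the helper rd_stop() are assumptions made for illustration; in particular, rd_stop() encodes one plausible reading of a ratio-of-deviances rule (stop once the deviance at iteration k divided by the deviance q iterations earlier reaches phi) and is not taken from the GeDS source.

```r
set.seed(1)
n <- 200
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + 1.2 * x))
fam <- poisson(link = "log")

eta <- fam$linkfun(y + 0.5)   # crude starting value for the predictor
dev_path <- numeric(0)

for (iter in 1:25) {
  mu  <- fam$linkinv(eta)
  dmu <- fam$mu.eta(eta)               # d mu / d eta
  z   <- eta + (y - mu) / dmu          # adjusted dependent variable
  w   <- dmu^2 / fam$variance(mu)      # working weights
  fit <- lm(z ~ x, weights = w)        # weighted fit of z (the "smoother")
  eta <- unname(fitted(fit))
  dev_path[iter] <- sum(fam$dev.resids(y, fam$linkinv(eta), rep(1, n)))
}

# Hypothetical ratio-of-deviances rule applied to the recorded deviances.
rd_stop <- function(dev, phi = 0.99, q = 2L) {
  k <- length(dev)
  if (k <= q) return(FALSE)
  dev[k] / dev[k - q] >= phi
}
stop_flags <- vapply(
  seq_along(dev_path),
  function(k) rd_stop(dev_path[seq_len(k)]),
  logical(1)
)
first_stop <- which(stop_flags)[1]

# With a linear smoother the result agrees with glm().
irls_coef <- coef(fit)
glm_coef  <- coef(glm(y ~ x, family = fam))
```

Under this reading, raising phi or q postpones the first iteration at which the rule fires, which matches the description of phi_gam_exit and q_gam as tuning parameters for the number of iterations.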
A model term wrapped in offset() is treated as a known (fixed) component
and added directly to the linear predictor when fitting the model. In case
more than one covariate is fixed, the user should sum the corresponding
coordinates of the fixed covariates to produce one common \(N\)-vector of
coordinates. See formula.
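A minimal sketch of preparing a single offset from two fixed components is given below. The data frame and its column names (log_exposure, known_shift, fixed_part) are hypothetical; only the offset() wrapper and the f() notation come from this page, and the model call is left as a comment because its exact behaviour is not verified here.

```r
set.seed(42)
N <- 50
dat <- data.frame(
  Y            = rpois(N, 3),
  X1           = runif(N),
  log_exposure = log(runif(N, 1, 5)),   # first fixed component
  known_shift  = rnorm(N, sd = 0.1)     # second fixed component
)

# Sum the fixed components coordinate-wise into one common N-vector.
dat$fixed_part <- dat$log_exposure + dat$known_shift

# A single offset() term then carries both fixed components.
gam_formula <- Y ~ f(X1) + offset(fixed_part)

# With GeDS loaded, the fit would be requested along these lines:
# NGeDSgam(gam_formula, family = poisson(), data = dat)
```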
Hastie, T. and Tibshirani, R. (1986). Generalized Additive Models.
Statistical Science 1 (3) 297 - 310.
DOI: 10.1214/ss/1177013604
Kaishev, V.K., Dimitrova, D.S., Haberman, S. and Verrall, R.J. (2016).
Geometrically designed, variable knot regression splines.
Computational Statistics, 31, 1079-1105.
DOI: 10.1007/s00180-015-0621-7
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models.
Applied Mathematics and Computation, 436.
DOI: 10.1016/j.amc.2022.127493
Dimitrova, D. S., Kaishev, V. K. and Saenz Guillen, E. L. (2025). GeDS: An R Package for Regression, Generalized Additive Models and Functional Gradient Boosting, based on Geometrically Designed (GeD) Splines. Manuscript submitted for publication.
# Load package
library(GeDS)
data(airquality)
data = na.omit(airquality)
data$Ozone <- data$Ozone^(1/3)
formula = Ozone ~ f(Solar.R) + f(Wind, Temp)
Gmodgam <- NGeDSgam(formula = formula, data = data,
phi = 0.8)
MSE_Gmodgam_linear <- mean((data$Ozone - Gmodgam$predictions$pred_linear)^2)
MSE_Gmodgam_quadratic <- mean((data$Ozone - Gmodgam$predictions$pred_quadratic)^2)
MSE_Gmodgam_cubic <- mean((data$Ozone - Gmodgam$predictions$pred_cubic)^2)
cat("\n", "MEAN SQUARED ERROR", "\n",
"Linear NGeDSgam:", MSE_Gmodgam_linear, "\n",
"Quadratic NGeDSgam:", MSE_Gmodgam_quadratic, "\n",
"Cubic NGeDSgam:", MSE_Gmodgam_cubic, "\n")
## S3 methods for class 'GeDSgam'
# Print
print(Gmodgam); summary(Gmodgam)
# Knots
knots(Gmodgam, n = 2)
knots(Gmodgam, n = 3)
knots(Gmodgam, n = 4)
# Coefficients
coef(Gmodgam, n = 2)
coef(Gmodgam, n = 3)
coef(Gmodgam, n = 4)
# Wald-type confidence intervals
confint(Gmodgam, n = 2)
confint(Gmodgam, n = 3)
confint(Gmodgam, n = 4)
# Deviances
deviance(Gmodgam, n = 2)
deviance(Gmodgam, n = 3)
deviance(Gmodgam, n = 4)