NGeDSgam: Local Scoring Algorithm with GeD Splines in Backfitting

Description

Implements the Local Scoring Algorithm (Hastie and Tibshirani (1986)), applying normal linear GeD splines (i.e., the NGeDS function) to fit the targets within each backfitting iteration. Higher order fits are computed by pursuing stage B of GeDS after the local-scoring algorithm is run.

Usage

NGeDSgam(
  formula,
  family = "gaussian",
  data,
  weights = NULL,
  normalize_data = FALSE,
  min_iterations,
  max_iterations,
  phi_gam_exit = 0.99,
  q_gam = 2L,
  beta = 0.5,
  phi = 0.99,
  internal_knots = 500L,
  q = 2L,
  higher_order = TRUE
)

Value

An object of class "GeDSgam" (a named list) with components:

extcall

Call to the NGeDSgam function.

formula

A formula object representing the model to be fitted.

args

A list containing the arguments passed to the NGeDSgam function. This list includes:

response: data.frame containing the response variable observations.

predictors

data.frame containing the corresponding observations of the predictor variables included in the model.

base_learners

Description of the model's base learners ("smooth functions").

family

The statistical family. The possible options are:

binomial(link = "logit", "probit", "cauchit", "log", "cloglog"),
gaussian(link = "identity", "log", "inverse"),
Gamma(link = "inverse", "identity", "log"),
inverse.gaussian(link = "1/mu^2", "inverse", "identity", "log"),
poisson(link = "log", "identity", "sqrt"),
quasi(link = "identity", variance = "constant"),
quasibinomial(link = "logit", "probit", "cloglog", "identity", "inverse", "log", "1/mu^2", "sqrt") and
quasipoisson(link = "log", "identity", "sqrt").

normalize_data

If TRUE, then response and predictors were standardized before running the local-scoring algorithm.

X_mean

Mean of the predictor variables (only if normalize_data = TRUE, otherwise this is NULL).

X_sd

Standard deviation of the predictors (only if normalize_data = TRUE, otherwise this is NULL).

Y_mean

Mean of the response variable (only if normalize_data = TRUE, otherwise this is NULL).

Y_sd

Standard deviation of the response variable (only if normalize_data = TRUE, otherwise this is NULL).

final_model

A list detailing the final "GeDSgam" model selected after running the local scoring algorithm. The chosen model minimizes deviance across all models generated by each local-scoring iteration. This list includes:

model_name: Local-scoring iteration that yielded the "best" model. Note that when family = "gaussian", it will always correspond to iter1, as only one local-scoring iteration is conducted in this scenario. This occurs because, when family = "gaussian", the algorithm is tantamount to directly implementing backfitting.

dev

Deviance of the final model. For family = "gaussian" this coincides with the Residual Sum of Squares.

Y_hat

Fitted values, including eta (the additive predictor), mu (the vector of means) and z (the adjusted dependent variable).

base_learners

A list containing, for each base learner, the piecewise polynomial coefficients of the corresponding linear fit. It also includes the knots for each order of fit, obtained by computing the averaging knot location. However, if the number of internal knots of the final linear fit is less than \(n-1\), the averaging knot location is not computed.

linear.fit

Final linear fit in B-spline form (see SplineReg).

quadratic.fit

Quadratic fit obtained via Schoenberg variation diminishing approximation (see SplineReg).

cubic.fit

Cubic fit obtained via Schoenberg variation diminishing approximation (see SplineReg).

predictions

A list containing the predicted values obtained for each of the fits (linear, quadratic, and cubic). Each of the predictions contains both the additive predictor eta and the vector of means mu.

internal_knots

A list detailing the internal knots obtained for the fits of different order (linear, quadratic, and cubic).
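
The remark under dev, that for family = "gaussian" the deviance coincides with the Residual Sum of Squares, can be checked in base R. The sketch below is only an illustration: it uses glm() with purely linear terms as a stand-in and does not call NGeDSgam; the object names aq, fit, rss and dev are hypothetical.

```r
# Base-R illustration only; NGeDSgam itself is not called here.
data(airquality)
aq <- na.omit(airquality)

# Gaussian model with identity link (linear terms as a stand-in for smoothers)
fit <- glm(Ozone ~ Solar.R + Wind + Temp, family = gaussian(), data = aq)

rss <- sum((aq$Ozone - fitted(fit))^2)  # residual sum of squares
dev <- deviance(fit)                    # Gaussian deviance

# The two quantities agree up to floating point error
all.equal(dev, rss)
```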

Arguments

formula: A description of the model structure to be fitted, specifying both the dependent and independent variables. Unlike NGeDS and GGeDS, this formula supports multiple additive (normal) GeD spline regression components as well as linear components. For example, setting formula = Y ~ f(X1) + f(X2) + X3 implies using a normal linear GeD spline as the smoother for X1 and for X2, while for X3 a linear model would be used.
family: The response variable distribution and link function to be used, given either as a character string or as a family object. Default is "gaussian".
data: A data.frame containing the variables referenced in the formula.
weights: An optional vector of "prior weights" to be put on the observations during the fitting process. It should be NULL or a numeric vector of the same length as the response variable defined in the formula.
normalize_data: A logical that defines whether the data should be normalized (standardized) before fitting the baseline linear model, i.e., before running the local-scoring algorithm. Normalizing the data involves scaling the predictor variables to have a mean of 0 and a standard deviation of 1. This process alters the scale and interpretation of the knots and coefficients estimated. Default is equal to FALSE.
min_iterations: Optional parameter to manually set a minimum number of local-scoring iterations to be run. If not specified, it defaults to 0L.
max_iterations: Optional parameter to manually set the maximum number of local-scoring iterations to be run. If not specified, it defaults to 100L. This setting serves as a fallback when the stopping rule, based on consecutive deviances and tuned by phi_gam_exit and q_gam, does not trigger an earlier termination (see Dimitrova et al. (2025)). Users can therefore increase or decrease the number of local-scoring iterations either by increasing or decreasing the value of phi_gam_exit and/or q_gam, or by specifying max_iterations directly.
phi_gam_exit: Convergence threshold for local-scoring and backfitting. Both algorithms stop when the relative change in the deviance is below this threshold. Default is 0.99.
q_gam: Numeric parameter that fine-tunes the stopping rule of the local-scoring and backfitting iterations. Default is 2L.
beta: Numeric parameter in the interval \([0,1]\) tuning the knot placement in stage A of GeDS, for each of the GeD spline components of the model. Default is equal to 0.5. See Details in NGeDS.
phi: Numeric parameter in the interval \((0,1)\) specifying the threshold for the stopping rule (model selector) in stage A of GeDS, for each of the GeD spline components of the model. Default is equal to 0.99. See Details in NGeDS.
internal_knots: The maximum number of internal knots that can be added by the GeDS smoothers at each backfitting iteration, effectively setting the value of max.intknots in NGeDS at each backfitting iteration. Default is 500L.
q: Numeric parameter that fine-tunes the stopping rule of stage A of GeDS, for each of the GeD spline components of the model. Default is 2L. See Details in NGeDS.
higher_order: A logical that defines whether to compute the higher order fits (quadratic and cubic) after the local-scoring algorithm is run. Default is TRUE.
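
The effect of normalize_data on the scale of the estimated knots can be illustrated with a short base-R sketch. It assumes standardization with the sample mean and the sample standard deviation (as computed by sd()); whether NGeDSgam uses exactly this convention is an assumption. The knot value 0.5 and the names X_std, knot_std and knot_orig are hypothetical.

```r
data(airquality)
aq <- na.omit(airquality)

X      <- aq[, c("Solar.R", "Wind", "Temp")]
X_mean <- colMeans(X)
X_sd   <- apply(X, 2, sd)

# Standardize: each column ends up with mean 0 and standard deviation 1
X_std <- scale(X, center = X_mean, scale = X_sd)

# A knot at 0.5 on the standardized Wind scale (hypothetical value)
# maps back to the original scale as mean + sd * knot
knot_std  <- 0.5
knot_orig <- X_mean[["Wind"]] + X_sd[["Wind"]] * knot_std
```

Storing X_mean and X_sd alongside the fit, as the returned args list does, is what makes this back-transformation possible.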

Details

The NGeDSgam function employs the local scoring algorithm to fit a generalized additive model (GAM). This algorithm iteratively fits weighted additive models by backfitting. Normal linear GeD splines, as well as linear learners, are supported as function smoothers within the backfitting algorithm. The local-scoring algorithm ultimately produces a linear fit. Higher order fits (quadratic and cubic) are then computed by calculating Schoenberg's variation diminishing spline (VDS) approximation of the linear fit.
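
As a rough illustration of backfitting, the sketch below fits a two-component additive model to simulated data. It is not the package's implementation: smooth.spline() is swapped in for the GeD spline smoother, the convergence check is a simple relative change in the residual sum of squares, and all names (smoother, f1, f2, rss) are hypothetical.

```r
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.1)

# Stand-in smoother (assumption): smooth.spline() replaces the GeD spline fit
smoother <- function(x, r) predict(smooth.spline(x, r, df = 6), x)$y

alpha <- mean(y)
f1 <- rep(0, n)
f2 <- rep(0, n)
rss <- numeric(0)

for (it in 1:20) {
  # Update each component on the partial residuals of the other, then center it
  f1 <- smoother(x1, y - alpha - f2); f1 <- f1 - mean(f1)
  f2 <- smoother(x2, y - alpha - f1); f2 <- f2 - mean(f2)
  rss <- c(rss, sum((y - alpha - f1 - f2)^2))
  # Simple convergence check (assumption, not the package's stopping rule)
  if (it > 1 && abs(rss[it - 1] - rss[it]) < 1e-8 * rss[it]) break
}
```

Centering each component after its update keeps the intercept alpha identifiable.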

On the one hand, NGeDSgam includes all the parameters of NGeDS, which in this case tune the function smoother fit at each backfitting iteration. On the other hand, NGeDSgam includes some additional parameters specific to the local-scoring procedure. We describe the main ones as follows.

The family chosen determines the link function, adjusted dependent variable and weights to be used in the local-scoring algorithm. The number of local-scoring and backfitting iterations is controlled by a Ratio of Deviances stopping rule similar to the one presented for NGeDS/GGeDS. In the same way phi and q tune the stopping rule of NGeDS/GGeDS, phi_gam_exit and q_gam tune the stopping rule of NGeDSgam. The user can also manually control the number of local-scoring iterations through min_iterations and max_iterations.
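
The way a family object supplies the link function, the adjusted dependent variable and the weights can be sketched in base R. The loop below is only an illustration under stated assumptions: a weighted lm() with one linear term stands in for the weighted backfitting step, and the stopping rule is assumed to take the form dev_k / dev_{k - q_gam} >= phi_gam_exit by analogy with the ratio rule of NGeDS/GGeDS. The exact rule used by NGeDSgam may differ.

```r
set.seed(2)
n <- 300
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + 1.5 * x))

fam <- poisson(link = "log")
phi_gam_exit   <- 0.99   # default values quoted in Arguments
q_gam          <- 2L
max_iterations <- 100L

mu  <- y + 0.1           # starting values kept away from zero
eta <- fam$linkfun(mu)
dev <- numeric(0)

for (k in seq_len(max_iterations)) {
  g_prime <- 1 / fam$mu.eta(eta)            # d eta / d mu
  z <- eta + (y - mu) * g_prime             # adjusted dependent variable
  w <- 1 / (g_prime^2 * fam$variance(mu))   # working weights
  fit <- lm(z ~ x, weights = w)             # stand-in for weighted backfitting
  eta <- fitted(fit)
  mu  <- fam$linkinv(eta)
  dev <- c(dev, sum(fam$dev.resids(y, mu, rep(1, n))))
  # Assumed form of the ratio-of-deviances rule
  if (k > q_gam && dev[k] / dev[k - q_gam] >= phi_gam_exit) break
}
```

With a single linear term this loop reduces to iteratively reweighted least squares, so its final deviance can be compared with deviance(glm(y ~ x, family = poisson())).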

A model term wrapped in offset() is treated as a known (fixed) component and added directly to the linear predictor when fitting the model. In case more than one covariate is fixed, the user should sum the corresponding coordinates of the fixed covariates to produce one common \(N\)-vector of coordinates. See formula.
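
Combining several fixed covariates into one offset can be sketched as follows. glm() is used here as a base-R stand-in, since offset() is a standard formula construct; the data and the column names a, b and fixed are hypothetical, and the sketch does not call NGeDSgam.

```r
set.seed(3)
n   <- 100
dat <- data.frame(x = runif(n), a = rnorm(n), b = rnorm(n))
dat$y <- 1 + 2 * dat$x + dat$a + dat$b + rnorm(n, sd = 0.1)

# Sum the fixed covariates into one common N-vector
dat$fixed <- dat$a + dat$b

# The offset enters the linear predictor with its coefficient fixed at 1,
# so only the intercept and the coefficient of x are estimated
fit <- glm(y ~ x + offset(fixed), family = gaussian(), data = dat)
coef(fit)
```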

References

Hastie, T. and Tibshirani, R. (1986). Generalized Additive Models. Statistical Science, 1 (3), 297-310.
DOI: 10.1214/ss/1177013604

Kaishev, V. K., Dimitrova, D. S., Haberman, S. and Verrall, R. J. (2016). Geometrically designed, variable knot regression splines. Computational Statistics, 31, 1079-1105.
DOI: 10.1007/s00180-015-0621-7

Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023). Geometrically designed variable knot splines in generalized (non-)linear models. Applied Mathematics and Computation, 436.
DOI: 10.1016/j.amc.2022.127493

Dimitrova, D. S., Kaishev, V. K. and Saenz Guillen, E. L. (2025). GeDS: An R Package for Regression, Generalized Additive Models and Functional Gradient Boosting, based on Geometrically Designed (GeD) Splines. Manuscript submitted for publication.

Examples


# Load package
library(GeDS) 

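# Prepare data: drop rows with missing values, cube-root transform Ozone
data(airquality)
data <- na.omit(airquality)
data$Ozone <- data$Ozone^(1/3)

# Fit the model; terms wrapped in f() use GeD spline smoothers
formula <- Ozone ~ f(Solar.R) + f(Wind, Temp)
Gmodgam <- NGeDSgam(formula = formula, data = data,
                    phi = 0.8)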
MSE_Gmodgam_linear <- mean((data$Ozone - Gmodgam$predictions$pred_linear)^2)
MSE_Gmodgam_quadratic <- mean((data$Ozone - Gmodgam$predictions$pred_quadratic)^2)
MSE_Gmodgam_cubic <- mean((data$Ozone - Gmodgam$predictions$pred_cubic)^2)

cat("\n", "MEAN SQUARED ERROR", "\n",
"Linear NGeDSgam:", MSE_Gmodgam_linear, "\n",
"Quadratic NGeDSgam:", MSE_Gmodgam_quadratic, "\n",
"Cubic NGeDSgam:", MSE_Gmodgam_cubic, "\n")

## S3 methods for class 'GeDSgam'
# Print 
print(Gmodgam); summary(Gmodgam)
# Knots
knots(Gmodgam, n = 2)
knots(Gmodgam, n = 3)
knots(Gmodgam, n = 4)
# Coefficients
coef(Gmodgam, n = 2)
coef(Gmodgam, n = 3)
coef(Gmodgam, n = 4)
# Wald-type confidence intervals
confint(Gmodgam, n = 2)
confint(Gmodgam, n = 3)
confint(Gmodgam, n = 4)
# Deviances
deviance(Gmodgam, n = 2)
deviance(Gmodgam, n = 3)
deviance(Gmodgam, n = 4)
