Functions that estimate the coefficients of a predictor model involving a spline component and, possibly, a parametric component, by applying (Iteratively Reweighted) Least Squares, (IR)LS, iterations.
SplineReg_LM(X, Y, Z = NULL, offset = rep(0, NROW(Y)),
  weights = rep(1, length(X)), InterKnots, n, extr = range(X), prob = 0.95)

SplineReg_GLM(X, Y, Z, offset = rep(0, nobs), weights = rep(1, length(X)),
  InterKnots, n, extr = range(X), family, mustart, inits = NULL,
  etastart = NULL)
X: a numeric vector containing \(N\) sample values of the covariate chosen to enter the spline regression component of the predictor model.
Y: a vector of size \(N\) containing the observed values of the response variable \(y\).
Z: a design matrix with \(N\) rows containing other covariates selected to enter the parametric component of the predictor model (see formula). If no such covariates are selected, it should be NULL, which is the default in SplineReg_LM.
offset: a vector of size \(N\) that can be used to specify a fixed covariate to be included in the predictor model without estimating its corresponding regression coefficient. If more than one covariate is fixed, the user should sum the corresponding coordinates of the fixed covariates to produce one common \(N\)-vector of coordinates. The argument offset is particularly useful in SplineReg_GLM if the link function used is not the identity.
weights: an optional vector of 'prior weights' to be put on the observations in the fitting process, in case the user requires weighted fitting. It is a vector of 1s by default.
InterKnots: a numeric vector containing the locations of the internal knots needed to compute the B-splines. In GeDS these are the internal knots at the current iteration of stage A.
n: integer value specifying the order of the spline to be evaluated. It should be 2 (linear spline), 3 (quadratic spline) or 4 (cubic spline). Non-integer values will be passed to the function as.integer.
extr: optional numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the sample values of X. By default these are the smallest and largest values of X, respectively.
prob: the confidence level to be used for the confidence bands in the SplineReg_LM fit. See Details below.
family: a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. See family for details of family functions.
mustart: initial values for the vector of means in the IRLS estimation. Must be a vector of length \(N\).
inits: a numeric vector of length length(InterKnots) + n + NCOL(Z) providing initial values for the coefficients, to be used in the IRLS estimation (an alternative to providing the mustart vector).
etastart: initial values for the predictor in the IRLS estimation (an alternative to providing either inits or mustart). Must be a vector of length \(N\).
A list containing:
a vector containing the fitted coefficients of the spline regression component and the parametric component of the predictor model.
a vector of \(N\) predicted mean values of the response variable computed at the sample values of the covariate(s).
a vector containing the ordinary regression residuals if SplineReg_LM is called, or the residuals described in Details if SplineReg_GLM is called.
the deviance of the fitted predictor model, defined as in Dimitrova et al. (2017), which for SplineReg_LM coincides with the Residual Sum of Squares.
a list containing the lower (Low) and upper (Upp) limits of the approximate confidence intervals computed at the sample values of the covariate(s). See Details.
the matrix of B-spline regression functions and the covariates of the parametric part evaluated at the sample values of the covariate(s).
a list containing the x-y coordinates ("Kn" and "Thetas") of the vertices of the Control Polygon; see Dimitrova et al. (2017).
a vector containing the deviances computed at each IRLS step (computed only by SplineReg_GLM).
the result of the function lm if SplineReg_LM is used, or the output of the function IRLSfit (which is similar to the output of glm.fit) if SplineReg_GLM is used.
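The following base-R sketch shows how several of these returned quantities relate in the LS case. It fits a plain lm on a B-spline basis, which stands in for SplineReg_LM and is not its implementation. The fit yields the coefficients, the predicted means, the ordinary residuals and the RSS, and for a Gaussian LS fit the RSS equals the deviance. The data and knots are hypothetical.

```r
library(splines)

set.seed(2)
N <- 100
X <- sort(runif(N, 0, 10))
Y <- sin(X) + rnorm(N, sd = 0.2)
n <- 4L                          # cubic spline
InterKnots <- c(2, 4, 6, 8)      # hypothetical internal knots
knots <- sort(c(rep(range(X), n), InterKnots))
B <- splineDesign(knots, X, ord = n)

# LS fit on the basis; no separate intercept, since the basis already spans constants
fit <- lm(Y ~ B - 1)

theta     <- coef(fit)           # fitted spline coefficients
predicted <- fitted(fit)         # predicted means at the sample values of X
res       <- residuals(fit)      # ordinary residuals Y - predicted
rss       <- sum(res^2)          # residual sum of squares
```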
The functions estimate the coefficients of a predictor model with a spline component (and possibly
a parametric component) for a given, fixed order and vector of
knots of the spline and a specified distribution of the response variable (from the Exponential Family).
The functions SplineReg_LM and SplineReg_GLM are based on LS and IRLS, respectively, and are used in NGeDS and GGeDS, respectively, to estimate the coefficients of the final GeDS fits of stage B, after their knots have been positioned to coincide with the Greville abscissas of the knots of the linear fit from stage A (see Dimitrova et al. 2017). Additional inference-related quantities are also computed (see Value). The function SplineReg_GLM is also used to estimate the coefficients of the linear GeDS fit of stage A within GGeDS, whereas in NGeDS this estimation is performed internally, leading to faster R code.
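The IRLS estimation can be illustrated with a minimal loop in base R. This is a generic IRLS sketch for a GLM on a fixed B-spline basis, not GeDS's IRLSfit. It records the deviance at each step, in the spirit of the per-step deviances returned by SplineReg_GLM. A Poisson response with log link, the starting means Y + 0.1 (which play the role of mustart) and the knots are all illustrative assumptions.

```r
library(splines)

set.seed(3)
N <- 200
X <- sort(runif(N, 0, 10))
Y <- rpois(N, lambda = exp(1 + sin(X)))
n <- 3L
InterKnots <- c(2.5, 5, 7.5)          # hypothetical internal knots
knots <- sort(c(rep(range(X), n), InterKnots))
B <- splineDesign(knots, X, ord = n)

fam <- poisson()
mu  <- Y + 0.1                        # starting means (role of mustart)
eta <- fam$linkfun(mu)
deviances <- numeric(0)

for (iter in 1:25) {
  dmu   <- fam$mu.eta(eta)                 # d mu / d eta
  z     <- eta + (Y - mu) / dmu            # working response
  w     <- dmu^2 / fam$variance(mu)        # working weights
  theta <- lm.wfit(B, z, w)$coefficients   # weighted LS step
  eta   <- drop(B %*% theta)
  mu    <- fam$linkinv(eta)
  dev   <- sum(fam$dev.resids(Y, mu, rep(1, N)))
  deviances <- c(deviances, dev)
  # glm-style relative convergence check on the deviance
  if (iter > 1 && abs(dev - deviances[iter - 1]) / (abs(dev) + 0.1) < 1e-10) break
}
```

Each step is a weighted LS fit on the same basis. This is why an LS routine and an IRLS routine can share the design-matrix construction.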
In addition, SplineReg_LM computes some useful quantities, such as confidence intervals and the Control Polygon (see Section 2 of Kaishev et al. 2016).
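A Control Polygon joins points whose x-coordinates are the Greville abscissas \(\xi_i = (t_{i+1} + \dots + t_{i+n-1})/(n-1)\) of the knot sequence and whose y-coordinates are the B-spline coefficients. The base-R sketch below builds such a polygon. It assumes that the "Kn" slot holds Greville abscissas; the helper greville() is hypothetical and not part of GeDS.

```r
library(splines)

# Hypothetical helper: Greville abscissa of the i-th B-spline of order n,
# xi_i = (t_{i+1} + ... + t_{i+n-1}) / (n - 1)
greville <- function(knots, n) {
  p <- length(knots) - n                      # number of B-splines
  vapply(seq_len(p),
         function(i) mean(knots[(i + 1):(i + n - 1)]),
         numeric(1))
}

set.seed(6)
N <- 100
X <- sort(runif(N, 0, 10))
Y <- cos(X) + rnorm(N, sd = 0.2)
n <- 3L
InterKnots <- c(2.5, 5, 7.5)                  # hypothetical internal knots
knots <- sort(c(rep(range(X), n), InterKnots))
B <- splineDesign(knots, X, ord = n)
theta <- coef(lm(Y ~ B - 1))

# Control polygon vertices (xi_i, theta_i); slot names mirror the Value item
Polygon <- list(Kn = greville(knots, n), Thetas = unname(theta))
```

A check on the abscissas is linear precision: using the Greville abscissas themselves as coefficients reproduces the identity function, \(\sum_i \xi_i B_{i,n}(x) = x\).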
The confidence intervals contained in the output slot NCI are approximate local bands obtained by treating the knots as fixed constants. Hence the columns of the design matrix are seen as covariates, and the standard methodology relying on the se.fit option of predict.lm or predict.glm is used.
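The following base-R sketch builds such fixed-knot bands. The basis columns enter lm as ordinary covariates, and predict.lm with se.fit = TRUE supplies the standard errors. The use of t quantiles at level prob is an assumption borrowed from predict.lm's own interval option, and the data and knots are hypothetical.

```r
library(splines)

set.seed(4)
N <- 100
X <- sort(runif(N, 0, 10))
Y <- sin(X) + rnorm(N, sd = 0.3)
n <- 3L
InterKnots <- c(2.5, 5, 7.5)          # hypothetical internal knots
knots <- sort(c(rep(range(X), n), InterKnots))
B <- splineDesign(knots, X, ord = n)

# Knots treated as fixed: the basis columns act as ordinary covariates
fit  <- lm(Y ~ B - 1)
prob <- 0.95
pr   <- predict(fit, se.fit = TRUE)
tq   <- qt((1 + prob) / 2, df = pr$df)

# Approximate pointwise bands at the sample values of X
NCI <- list(Low = pr$fit - tq * pr$se.fit,
            Upp = pr$fit + tq * pr$se.fit)
```

These bands ignore the variability introduced by estimating the knot locations. That is why the text calls them approximate.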
In the ACI slot, asymptotic confidence intervals are provided, following Kaishev et al. (2006).
As mentioned, SplineReg_GLM is used intensively in stage A of the GeDS algorithm implemented in GGeDS and, to make it as fast as possible, input data validation is minimal. Hence the user is expected to check the input parameters carefully before using SplineReg_GLM.
The "Residuals" in the output of this function are similar to the so-called "working residuals" of the glm function. "Residuals" are the residuals \(r_i\) used in the knot placement procedure, i.e.
$$r_i = (y_i - \hat{\mu}_i) {d \mu_i \over d \eta_i},$$
but, in contrast to the glm "working residuals", they are computed using the final IRLS fitted values \(\hat{\mu}_i\). These "Residuals" are then used to locate the knots of the linear spline fit of stage A.
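The residual formula above can be computed directly from an R family object, whose mu.eta function returns \(d\mu/d\eta\). The sketch below fits a Poisson GLM with log link on a hypothetical B-spline basis using glm.fit, which stands in for the GeDS fit. It then evaluates \(r_i\) at the final fitted values. For comparison it also computes glm's working residuals, which divide by \(d\mu/d\eta\) instead of multiplying.

```r
library(splines)

set.seed(5)
N <- 150
X <- sort(runif(N, 0, 10))
Y <- rpois(N, lambda = exp(1 + sin(X)))
n <- 3L
InterKnots <- c(2.5, 5, 7.5)          # hypothetical internal knots
knots <- sort(c(rep(range(X), n), InterKnots))
B <- splineDesign(knots, X, ord = n)

fam <- poisson()
fit <- glm.fit(B, Y, family = fam)
eta <- drop(B %*% fit$coefficients)   # final linear predictor
mu  <- fam$linkinv(eta)               # final fitted means

# r_i = (y_i - mu_i) * d mu_i / d eta_i, as in the formula above
r <- (Y - mu) * fam$mu.eta(eta)

# glm's working residuals, for comparison: (y_i - mu_i) / (d mu_i / d eta_i)
wr <- (Y - mu) / fam$mu.eta(eta)
```

For the log link, \(d\mu/d\eta = \mu\), so here \(r_i = (y_i - \hat{\mu}_i)\hat{\mu}_i\).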
In SplineReg_GLM confidence intervals are not computed.
Kaishev, V. K., Dimitrova, D. S., Haberman, S. & Verrall, R. J. (2006). Geometrically designed, variable knot regression splines: asymptotics and inference (Statistical Research Paper No. 28). London, UK: Faculty of Actuarial Science & Insurance, City University London. URL: openaccess.city.ac.uk
Kaishev, V. K., Dimitrova, D. S., Haberman, S. & Verrall, R. J. (2016). Geometrically designed, variable knot regression splines. Computational Statistics, 31, 1079-1105. DOI: doi.org/10.1007/s00180-015-0621-7
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. & Verrall, R. J. (2017). Geometrically designed, variable knot splines in Generalized (Non-)Linear Models. Available at openaccess.city.ac.uk