SplineReg: Estimation of the coefficients of a predictor model with spline and possibly parametric components.

Description

Functions that estimate the coefficients of a predictor model involving a spline component and possibly a parametric component, using Least Squares (LS) or Iteratively Re-weighted Least Squares (IRLS) iteration.

Usage

SplineReg_LM(X, Y, Z = NULL, offset = rep(0, NROW(Y)), weights = rep(1,
  length(X)), InterKnots, n, extr = range(X), prob = 0.95)
SplineReg_GLM(X, Y, Z, offset = rep(0, nobs), weights = rep(1, length(X)),
  InterKnots, n, extr = range(X), family, mustart, inits = NULL,
  etastart = NULL)

Arguments

X

a numeric vector containing $N$ sample values of the covariate chosen to enter the spline regression component of the predictor model.

Y

a vector of size $N$ containing the observed values of the response variable $y$.

Z

a design matrix with $N$ rows containing other covariates selected to enter the parametric component of the predictor model (see formula). If no such covariates are selected, it is set to NULL by default.

offset

a vector of size $N$ that can be used to specify a fixed covariate to be included in the predictor model without estimating its corresponding regression coefficient. If more than one covariate is fixed, the user should sum the corresponding coordinates of the fixed covariates to produce one common $N$-vector of coordinates. The argument offset is particularly useful in SplineReg_GLM if the link function used is not the identity.

weights

an optional vector of "prior weights" to be put on the observations in the fitting process in case the user requires weighted fitting. It is a vector of 1s by default.

InterKnots

a numeric vector containing the locations of the internal knots necessary to compute the B-splines. In GeDS these are the internal knots in a current iteration of stage A.

n

integer value specifying the order of the spline to be evaluated. It should be 2 (linear spline), 3 (quadratic spline) or 4 (cubic spline). Non-integer values will be passed to the function as.integer.

extr

optional numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the sample values of X. By default, these are the smallest and largest values of X, respectively.

prob

the confidence level to be used for the confidence bands in the SplineReg_LM fit. See details below.

family

a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. See family for details of family functions.

mustart

initial values for the vector of means in the IRLS estimation. Must be a vector of length $N$.

inits

a numeric vector of length length(InterKnots) + n + NCOL(Z) providing initial values for the coefficients, to be used in the IRLS estimation (alternative to providing the mustart vector).

etastart

initial values for the predictor in the IRLS estimation (alternative to providing either inits or mustart). Must be a vector of length $N$.
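As a rough illustration of how InterKnots, n, extr and Z relate to the number of coefficients, the following sketch builds a B-spline design matrix with splines::splineDesign from base R. It is not the package's internal code: the simulated data, the knot locations and the covariate name z1 are assumptions made for the example.

```r
library(splines)  # ships with base R

set.seed(1)
N <- 50
X <- sort(runif(N))            # covariate entering the spline component
Z <- cbind(z1 = rnorm(N))      # hypothetical parametric covariate
InterKnots <- c(0.3, 0.6)      # illustrative internal knots
n <- 3L                        # order 3, i.e. a quadratic spline
extr <- range(X)

# Full knot vector: each boundary repeated n times around the internal knots
knots <- c(rep(extr[1], n), InterKnots, rep(extr[2], n))

# B-spline part of the design matrix, one column per basis function
B <- splineDesign(knots = knots, x = X, ord = n)

# Appending the parametric covariates gives the full design matrix
basis <- cbind(B, Z)

# Number of coefficients, matching the length required of inits
n_coef <- length(InterKnots) + n + NCOL(Z)
```

The spline part contributes length(InterKnots) + n columns and each column of Z adds one more, which is why inits must have length length(InterKnots) + n + NCOL(Z).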

Value

A list containing:

Theta

a vector containing the fitted coefficients of the spline regression component and the parametric component of the predictor model.

Predicted

a vector of $N$ predicted mean values of the response variable computed at the sample values of the covariate(s).

Residuals

a vector containing the normal regression residuals if SplineReg_LM is called or the residuals described in Details if SplineReg_GLM is called.

RSS

the deviance for the fitted predictor model, defined as in Dimitrova et al. (2017), which for SplineReg_LM coincides with the Residual Sum of Squares.

NCI

a list containing the lower (Low) and upper (Upp) limits of the approximate confidence intervals computed at the sample values of the covariate(s). See Details.

Basis

the matrix of B-spline regression functions and the covariates of the parametric part evaluated at the sample values of the covariate(s).

Polygon

a list containing x-y coordinates ("Kn" and "Thetas") of the vertices of the Control Polygon, see Dimitrova et al. (2017).

deviance

a vector containing deviances computed at each IRLS step (computed only by SplineReg_GLM).

temporary

the result of the function lm if SplineReg_LM is used or the output of the function IRLSfit (which is similar to the output from glm.fit), if SplineReg_GLM is used.

Details

The functions estimate the coefficients of a predictor model with a spline component (and possibly a parametric component) for a given, fixed order and vector of knots of the spline and a specified distribution of the response variable (from the Exponential Family). SplineReg_LM is based on LS and is used in NGeDS; SplineReg_GLM is based on IRLS and is used in GGeDS. Both estimate the coefficients of the final GeDS fits of stage B, after their knots have been positioned to coincide with the Greville abscissas of the knots of the linear fit from stage A (see Dimitrova et al. 2017). Additional inference-related quantities are also computed (see Value). The function SplineReg_GLM is also used to estimate the coefficients of the linear GeDS fit of stage A within GGeDS, whereas in NGeDS this estimation is performed internally, leading to faster R code.
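For fixed knots and order, the LS case reduces to an ordinary linear regression on the B-spline basis. The sketch below shows this with base R only (splines::splineDesign and lm) on simulated data; the object names mirror the Value slots for readability, but this is an assumed illustration and not the code of SplineReg_LM.

```r
library(splines)

set.seed(2)
N <- 100
X <- sort(runif(N))
Y <- sin(2 * pi * X) + rnorm(N, sd = 0.2)   # simulated response
InterKnots <- c(0.25, 0.5, 0.75)            # illustrative fixed knots
n <- 4L                                     # order 4, i.e. a cubic spline
extr <- range(X)

knots <- c(rep(extr[1], n), InterKnots, rep(extr[2], n))
B <- splineDesign(knots, X, ord = n)

# No separate intercept: the B-spline basis already spans the constants
fit <- lm(Y ~ B - 1)

Theta <- unname(coef(fit))         # one coefficient per basis function
Predicted <- unname(fitted(fit))
Residuals <- Y - Predicted
RSS <- sum(Residuals^2)            # equals the deviance of the Gaussian fit
```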

In addition, SplineReg_LM computes some useful quantities, among them confidence intervals and the Control Polygon (see Section 2 of Kaishev et al. 2016).
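The vertices of a control polygon pair each spline coefficient with a Greville abscissa, that is, the average of n - 1 consecutive knots. The sketch below computes such vertices in base R for simulated data. The list names Kn and Thetas follow the Value section, but that the package computes them exactly this way is an assumption.

```r
library(splines)

set.seed(3)
N <- 80
X <- sort(runif(N))
Y <- exp(-3 * X) + rnorm(N, sd = 0.05)   # simulated response
InterKnots <- c(0.2, 0.5)                # illustrative internal knots
n <- 3L
extr <- range(X)

knots <- c(rep(extr[1], n), InterKnots, rep(extr[2], n))
B <- splineDesign(knots, X, ord = n)
Theta <- unname(coef(lm(Y ~ B - 1)))

# Greville abscissas: average of n - 1 consecutive knots per basis function
p <- length(knots) - n
Kn <- sapply(seq_len(p), function(j) mean(knots[(j + 1):(j + n - 1)]))

# Vertices of the control polygon: (Kn[j], Thetas[j])
Polygon <- list(Kn = Kn, Thetas = Theta)
```

A quick sanity check on the abscissas is the linear precision property of B-splines: B %*% Kn reproduces X up to rounding error.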

The confidence intervals contained in the output slot NCI are approximate local bands obtained by treating the knots as fixed constants. Hence the columns of the design matrix are seen as covariates and standard methodology relying on the se.fit option of predict.lm or predict.glm is used. In the ACI slot, asymptotic confidence intervals are provided, following Kaishev et al. (2006).
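One standard way to obtain such pointwise bands from se.fit is sketched below for the LS case, with the knots treated as fixed. The use of Student t quantiles and the simulated data are assumptions of this example; the text above only states that the se.fit option of predict.lm is relied upon.

```r
library(splines)

set.seed(4)
N <- 100
X <- sort(runif(N))
Y <- cos(2 * pi * X) + rnorm(N, sd = 0.3)   # simulated response
InterKnots <- c(0.3, 0.7)                   # illustrative fixed knots
n <- 4L
extr <- range(X)
prob <- 0.95                                # confidence level

knots <- c(rep(extr[1], n), InterKnots, rep(extr[2], n))
B <- splineDesign(knots, X, ord = n)
fit <- lm(Y ~ B - 1)

# Pointwise standard errors of the fitted mean at the sample values
pr <- predict(fit, se.fit = TRUE)

# Two-sided critical value (assumption: t quantile on residual df)
crit <- qt(1 - (1 - prob) / 2, df = pr$df)

NCI <- list(Low = pr$fit - crit * pr$se.fit,
            Upp = pr$fit + crit * pr$se.fit)
```

With these choices the limits coincide with those returned by predict(fit, interval = "confidence", level = prob).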

As mentioned, SplineReg_GLM is used intensively in Stage A of the GeDS algorithm implemented in GGeDS and, to make it as fast as possible, input data validation is mild. The user is therefore expected to check the input parameters carefully before using SplineReg_GLM. The "Residuals" in the output of this function are similar to the so-called "working residuals" in the glm function. "Residuals" are the residuals $r_i$ used in the knot placement procedure, i.e. $$r_i = (y_i - \hat{\mu}_i) \frac{d \mu_i}{d \eta_i},$$ but in contrast to the glm "working residuals", they are computed using the final IRLS fitted $\hat{\mu}_i$. "Residuals" are then used in locating the knots of the linear spline fit of Stage A.

In SplineReg_GLM confidence intervals are not computed.
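To illustrate the IRLS case, including the role of offset, family and starting values, the sketch below fits a Poisson model with a log link on a fixed B-spline basis using glm.fit from base R. It stands in for the package's IRLSfit, which the Value section describes only as similar to glm.fit, so it should be read as an approximation. The exposure variable and all data are simulated assumptions.

```r
library(splines)

set.seed(5)
N <- 200
X <- sort(runif(N))
exposure <- runif(N, 0.5, 2)          # hypothetical fixed covariate
offset <- log(exposure)               # enters the predictor with coefficient 1
Y <- rpois(N, exposure * exp(1 + sin(2 * pi * X)))

InterKnots <- c(0.25, 0.5, 0.75)      # illustrative fixed knots
n <- 3L
extr <- range(X)
knots <- c(rep(extr[1], n), InterKnots, rep(extr[2], n))
B <- splineDesign(knots, X, ord = n)

fam <- poisson(link = "log")

# IRLS on the fixed basis; mustart gives starting means of length N
# (glm.fit also accepts start and etastart as alternatives)
fit <- glm.fit(x = B, y = Y, offset = offset, family = fam,
               mustart = Y + 0.1, intercept = FALSE)

Theta <- unname(fit$coefficients)
eta <- drop(B %*% Theta) + offset     # predictor including the offset
Predicted <- fam$linkinv(eta)         # fitted means
```

Because the link is not the identity, the offset acts on the predictor scale: here it multiplies the fitted mean by exposure rather than adding to it.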

References

Kaishev, V.K., Dimitrova, D.S., Haberman, S., & Verrall, R.J. (2006). Geometrically designed, variable knot regression splines: asymptotics and inference (Statistical Research Paper No. 28). London, UK: Faculty of Actuarial Science & Insurance, City University London. URL: openaccess.city.ac.uk

Kaishev, V.K., Dimitrova, D.S., Haberman, S., & Verrall, R.J. (2016). Geometrically designed, variable knot regression splines. Computational Statistics, 31, 1079-1105. DOI: doi.org/10.1007/s00180-015-0621-7

Dimitrova, D.S., Kaishev, V.K., Lattuada, A., & Verrall, R.J. (2017). Geometrically designed, variable knot splines in Generalized (Non-)Linear Models. Available at openaccess.city.ac.uk