Fits an additive partial linear model to data using an approximate Bayesian inference technique based on penalized regression splines and Laplace approximations. Smooth additive terms are specified as a linear combination of a large number of cubic B-splines. To counterbalance the roughness of the fit, a discrete penalty on neighboring spline coefficients is imposed in the spirit of Eilers and Marx (1996). The error of the model is assumed to be Gaussian with zero mean and finite variance. The optimal amount of smoothing is determined by a grid-based exploration of the posterior penalty space when the number of smooth terms is small to moderate. When the dimension of the penalty space is large, the optimal smoothing parameter is chosen to be the value that maximizes the (log-)posterior of the penalty vector.
amlps(formula, data, K = 30, penorder = 2, cred.int = 0.95)

An object of class amlps containing several components from
the fit. Details can be found in amlps.object. Details on
the output printed by amlps can be found in
print.amlps. Fitted smooth terms can be visualized with the
plot.amlps routine.
A formula object where the ~ operator separates the response
from the covariates of the linear part z1, z2, ... and the smooth
terms. A smooth term is specified with the notation sm(.).
For instance, the formula y ~ z1+sm(x1)+sm(x2) specifies an
additive model of the form E(y)=b0+b1z1+f1(x1)+f2(x2), where
b0, b1 are the regression coefficients of the linear part and
f1(.) and f2(.) are smooth functions of the continuous
covariates x1 and x2, respectively.
Optional. A data frame to match the variable names provided in formula.
A positive integer specifying the number of cubic B-spline
functions in the basis used to model the smooth terms. Default is
K = 30 and allowed values are 15 <= K <= 60. The same basis
dimension is used for each smooth term in the model. Also, the
computational cost to fit the model increases with K.
The order of the finite-difference penalty applied to the coefficients of contiguous B-splines (see the sketch after this argument list). Can be either 2 for a second-order penalty (the default) or 3 for a third-order penalty.
The level of the pointwise credible interval to be computed for the coefficients in the linear part of the model.
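As an illustration of the finite-difference penalty named in the penalty order argument above (a minimal sketch under standard P-spline conventions, not the internal code of amlps), a penalty matrix acting on K contiguous B-spline coefficients can be built as:

K <- 30
penorder <- 2
D <- diff(diag(K), differences = penorder) # finite-difference operator of order penorder
P <- crossprod(D)                          # penalty matrix t(D) %*% D on contiguous coefficients

The roughness penalty on a vector of spline coefficients theta then takes the usual quadratic form lambda * t(theta) %*% P %*% theta, with lambda the smoothing parameter.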
Oswaldo Gressani oswaldo_gressani@hotmail.fr.
The B-spline basis used to approximate a smooth additive component
is computed with the function cubicbs. The lower (upper)
bound of the B-spline basis is taken to be the minimum (maximum) value of
the covariate associated to the smooth. For identifiability
purposes, the B-spline matrices (computed over the observed covariates)
are centered. The centering consists is subtracting from each column of the
B-spline matrix, the corresponding column average of another B-spline matrix
computed on a fine grid of equidistant values in the domain of the smooth
term. A hierarchical Gamma prior is imposed on the roughness penalty vector
and Jeffreys' prior is imposed on the precision of the error. A
Newton-Raphson algorithm is used to compute the mode of the
(log-)posterior of the penalty vector. The algorithm uses
analytically derived versions of the gradient and Hessian. When the number
of smooth terms in the model is smaller than or equal to 4, a grid-based
strategy is used for posterior exploration of the penalty space. Above that
threshold, the optimal amount of smoothness is determined by the maximum
a posteriori value of the penalty vector. This strategy keeps the
computational burden of fitting the model relatively low while preserving
good statistical performance.
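To make the centering step concrete, the sketch below builds a cubic B-spline matrix at the observed covariate values and subtracts, column by column, the column means of the same basis evaluated on a fine equidistant grid. It is an illustration under standard P-spline conventions and does not reproduce the internals of amlps or cubicbs.

library(splines)

bbase <- function(x, lower, upper, K) {
  ndx <- K - 3                                     # number of knot intervals
  dx <- (upper - lower) / ndx
  knots <- seq(lower - 3 * dx, upper + 3 * dx, by = dx)
  splineDesign(knots, x, ord = 4, outer.ok = TRUE) # n x K cubic B-spline matrix
}

x <- runif(200)
B <- bbase(x, min(x), max(x), K = 30)              # basis at observed covariate values
xfine <- seq(min(x), max(x), length.out = 1000)
Bfine <- bbase(xfine, min(x), max(x), K = 30)      # basis on a fine equidistant grid
Bc <- sweep(B, 2, colMeans(Bfine))                 # centered B-spline matrix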
Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2): 89-121.
Fan, Y. and Li, Q. (2003). A kernel-based method for estimating additive partially linear models. Statistica Sinica, 13(3): 739-762.
Gressani, O. and Lambert, P. (2018). Fast Bayesian inference using Laplace approximations in a flexible promotion time cure model based on P-splines. Computational Statistics & Data Analysis, 124: 151-167.
Opsomer, J. D. and Ruppert, D. (1999). A root-n consistent backfitting estimator for semiparametric additive modeling. Journal of Computational and Graphical Statistics, 8(4): 715-732.
cubicbs, amlps.object,
print.amlps, plot.amlps
### Classic simulated data example (with simgamdata)
set.seed(17)
sim.data <- simgamdata(setting = 2, n = 200, dist = "gaussian", scale = 0.4)
data <- sim.data$data # Simulated data frame
# Fit model
fit <- amlps(y ~ z1 + z2 + sm(x1) + sm(x2), data = data, K = 15)
fit
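# Visualize the fitted smooth terms (see plot.amlps); this follow-up
# assumes the default arguments of the plot method are sufficient
plot(fit)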