smoothSpline: Fit a spline to smooth the relationship between a `response` and an `x` in a `data.frame`, optionally computing growth rates using derivatives.

Description

Uses smooth.spline to fit a natural cubic smoothing spline or JOPS to fit a P-spline to all the values of response stored in data.

The amount of smoothing is controlled by tuning parameters that are related to the penalty. For a natural cubic smoothing spline, these are df or lambda; for a P-spline, it is lambda, and npspline.segments also influences the smoothness of the fit. The smoothing.method argument provides for direct and logarithmic smoothing. The method of Huang (2001) for correcting the fitted spline for estimation bias at the end-points will be applied when fitting using a natural cubic smoothing spline if correctBoundaries is TRUE.
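As a rough illustration of how df affects the smooth, the following sketch calls stats::smooth.spline directly on simulated data. It does not call smoothSpline itself, and the data and object names are invented for illustration. The last lines mimic logarithmic smoothing by hand, assuming a strictly positive response.

```r
# Simulated, strictly positive growth-like data (illustrative only)
set.seed(1)
x <- 1:30
y <- 100 / (1 + exp(-(x - 15) / 4)) * exp(rnorm(30, sd = 0.05))

# Lower df gives a smoother curve; higher df follows the data more closely
fit.smooth <- smooth.spline(x, y, df = 4)
fit.rough  <- smooth.spline(x, y, df = 12)

# Residual sums of squares: the smoother fit leaves larger residuals
rss.smooth <- sum((y - predict(fit.smooth, x)$y)^2)
rss.rough  <- sum((y - predict(fit.rough, x)$y)^2)

# Logarithmic smoothing sketched by hand: smooth log(y), then exponentiate
fit.log <- smooth.spline(x, log(y), df = 4)
sy.log  <- exp(predict(fit.log, x)$y)
```

Comparing rss.smooth with rss.rough gives a quick numerical check that the lower df setting smooths more heavily.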

The derivatives of the fitted spline can also be obtained, and the Absolute and Relative Growth Rates (AGR and RGR) computed using them, provided correctBoundaries is FALSE. Otherwise, growth rates can be obtained by difference using byIndv4Times_GRsDiff.
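A minimal sketch of derivative-based growth rates, using stats::smooth.spline and its predict method on simulated data. It assumes that the AGR is taken as the first derivative of the fitted spline and the RGR as that derivative divided by the fitted value; the variable names are illustrative and this is not the internal code of smoothSpline.

```r
# Simulated, strictly positive growth-like data (illustrative only)
set.seed(2)
x <- 1:30
y <- 100 / (1 + exp(-(x - 15) / 4)) * exp(rnorm(30, sd = 0.05))

fit <- smooth.spline(x, y, df = 5)

sy  <- predict(fit, x)$y             # fitted (smoothed) values
AGR <- predict(fit, x, deriv = 1)$y  # first derivative of the fitted spline
RGR <- AGR / sy                      # first derivative divided by fitted value

head(data.frame(x, sy, AGR, RGR))
```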

The handling of missing values in the observations is controlled via na.x.action and na.y.action. If there are not at least four distinct, nonmissing x-values, a warning is issued and all smoothed values and derivatives are set to NA.
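The following base R sketch shows one possible reading of the trimx option and of the four-distinct-values check. The vectors and object names are invented, the check is applied here to the pairs used for fitting, and the logic is an assumption about the behaviour described above rather than the code used by smoothSpline.

```r
# Illustrative vectors: y is missing at both ends and once in the middle
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(NA, 2.1, 3.9, NA, 8.2, 9.8, 12.1, NA)

# Pairs that would be used for fitting: both x and y nonmissing
used <- !is.na(x) & !is.na(y)

# Assumed reading of "trimx": keep nonmissing x lying between the x of the
# first and last nonmissing y, so the interior gap at x = 4 is retained
x.range <- range(x[used])
x.trimx <- x[!is.na(x) & x >= x.range[1] & x <= x.range[2]]

# At least four distinct x-values are needed before a fit is attempted
enough <- length(unique(x[used])) >= 4
```

Here x.trimx drops only the two end-points with missing y, and enough is TRUE because five distinct x-values remain.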

The function probeSmooths can be used to investigate the effect of the smoothing parameters (smoothing.method and df or lambda) on the smooth that results.

Usage

smoothSpline(data, response, response.smoothed = NULL, x, 
             smoothing.method = "direct", 
             spline.type = "NCSS", df = NULL, lambda = NULL, 
             npspline.segments = NULL, correctBoundaries = FALSE, 
             rates = NULL, suffices.rates = NULL, sep.rates = ".", 
             extra.derivs = NULL, suffices.extra.derivs=NULL, 
             na.x.action = "exclude", na.y.action = "trimx", ...)

Value

A list with two components named predictions and fit.spline.

The predictions component is a data.frame containing x and the fitted smooth. The names of the columns will be the value of x and the value of response.smoothed. The number of rows in the data.frame will be equal to the number of pairs that have neither a missing x nor a missing response, and the order of x will be the same as the order in data. If deriv is not NULL, columns containing the values of the derivative(s) will be added to the data.frame; the name of each of these columns will be the value of response.smoothed with .dvf appended, where f is the order of the derivative, or the value of response.smoothed with the corresponding element of suffices.deriv appended. If RGR is not NULL, the RGR is calculated as the ratio of the value of the first derivative of the fitted spline to the fitted value for the spline.

The fit.spline component is a list with components

x:: the distinct x values in increasing order;
y:: the fitted values, with boundary values possibly corrected, and corresponding to x;
lev:: leverages, the diagonal values of the smoother matrix (NCSS only);
lambda:: the value of lambda (corresponding to spar for NCSS - see smooth.spline);
df:: the effective degrees of freedom;
npspline.segments:: the number of equally spaced segments used when spline.type is PS;
uncorrected.fit:: the object returned by smooth.spline when spline.type is NCSS or by JOPS::psNormal when it is PS.
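For the NCSS case, the relationship between the leverages and the effective degrees of freedom can be checked on an object returned by stats::smooth.spline, which is the kind of object stored in uncorrected.fit. The sketch below uses simulated data with illustrative names and does not call smoothSpline.

```r
# Simulated data (illustrative only)
set.seed(3)
x <- seq(0, 10, length.out = 40)
y <- sin(x) + rnorm(40, sd = 0.2)

fit <- smooth.spline(x, y, df = 6)

fit$df        # effective degrees of freedom, close to the requested 6
sum(fit$lev)  # sum of the leverages, i.e. the trace of the smoother matrix
fit$lambda    # penalty value that was chosen to achieve the requested df
```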

Arguments

data: A data.frame containing the column to be smoothed.
response: A character giving the name of the column in data that is to be smoothed.
response.smoothed: A character specifying the name of the column containing the values of the smoothed response variable, corresponding to response. If response.smoothed is NULL, then response.smoothed is set to the value of response with the prefix s added.
x: A character giving the name of the column in data that contains the values of the predictor variable.
smoothing.method: A character giving the smoothing method to use. The two possibilities are (i) "direct", for directly smoothing the observed response, and (ii) "logarithmic", for smoothing the log-transformed response and then back-transforming by taking the exponential of the fitted values.
spline.type: A character giving the type of spline to use. Currently, the possibilities are (i) "NCSS", for natural cubic smoothing splines, and (ii) "PS", for P-splines.
df: A numeric specifying, for natural cubic smoothing splines (NCSS), the desired equivalent number of degrees of freedom of the smooth (trace of the smoother matrix). Lower values result in more smoothing. If df = NULL, the amount of smoothing can be controlled by setting lambda. If both df and lambda are NULL, smoothing is controlled by the default arguments for smooth.spline, and any that you supply via the ellipsis (...) argument.
lambda: A numeric specifying the positive penalty to apply. The amount of smoothing decreases as lambda decreases.
npspline.segments: A numeric specifying, for P-splines (PS), the number of equally spaced segments between min(x) and max(x), excluding missing values, to use in constructing the B-spline basis for the spline fitting. If npspline.segments is NULL, npspline.segments is set to the maximum of 10 and ceiling((nrow(data)-1)/2), i.e. there will be at least 10 segments and, for more than 22 x values, there will be about half as many segments as there are x values. The amount of smoothing decreases as npspline.segments increases.
correctBoundaries: A logical indicating whether the fitted spline values are to have the method of Huang (2001) applied to them to correct for estimation bias at the end-points. Note that spline.type must be NCSS and lambda and deriv must be NULL for correctBoundaries to be set to TRUE.
rates: A character giving the growth rates that are to be calculated using derivatives. It should be a combination of one or more of "AGR", "PGR" and "RGR". If NULL, then growth rates are not computed.
suffices.rates: A character giving the characters to be appended to the names of the responses to provide the names of the columns containing the calculated growth rates. The order of the suffices in suffices.rates should correspond to the order of the elements of rates. If NULL, the values of rates are used.
sep.rates: A character giving the character(s) to be used to separate the suffices.rates value from a response value in constructing the name for a new rate. For no separator, set to "".
extra.derivs: A numeric specifying one or more orders of derivatives that are required, in addition to any required for calculating the growth rates. When rates.method is derivatives, these can be derivatives other than the first. Otherwise, any derivatives can be specified.
suffices.extra.derivs: A character giving the characters to be appended to response.smoothed to construct the names of the derivatives. If NULL and the derivatives are to be retained, then .dv followed by the order of the derivative is appended to response.smoothed.

na.x.action: A character string that specifies the action to be taken when values of x are NA. The possible values are fail, exclude or omit. For exclude and omit, predictions and derivatives will only be obtained for nonmissing values of x. The difference between these two options is that, for exclude, the returned data.frame will have as many rows as data, with the missing values incorporated.
na.y.action: A character string that specifies the action to be taken when values of y, or the response, are NA. The possible values are fail, exclude, omit, allx, trimx, ltrimx or rtrimx. For all options, except fail, missing values in y will be removed before smoothing. For exclude and omit, predictions and derivatives will be obtained only for nonmissing values of x that do not have missing y values. Again, the difference between these two is that only for exclude will the missing values be incorporated into the returned data.frame. For allx, predictions and derivatives will be obtained for all nonmissing x. For trimx, they will be obtained for all nonmissing x between the first and last nonmissing y values, after ordering by x; for ltrimx and rtrimx, only the lower or the upper missing y values, respectively, are trimmed.
...: allows for arguments to be passed to smooth.spline.
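The default for npspline.segments described above can be worked through in a few lines of R. The helper default.segments is a hypothetical name introduced only for this illustration; it simply evaluates the stated formula.

```r
# Default stated above: the larger of 10 and ceiling((nrow(data) - 1) / 2)
# (default.segments is a hypothetical helper, not part of the package)
default.segments <- function(n.rows) max(10, ceiling((n.rows - 1) / 2))

# Number of segments for data sets with 8, 21, 22, 41 and 101 rows
segs <- sapply(c(8, 21, 22, 41, 101), default.segments)
segs
# 10 10 11 20 50
```

With 21 or fewer rows, the minimum of 10 segments applies; beyond that, the number of segments grows at about half the number of rows.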

Author

Chris Brien

References

Eilers, P.H.C. and Marx, B.D. (2021) Practical smoothing: the joys of P-splines. Cambridge University Press, Cambridge.

Huang, C. (2001) Boundary corrected cubic smoothing splines. Journal of Statistical Computation and Simulation, 70, 107-121.

Examples


# Load the example data, which provides longi.dat
data(exampleData)

# NCSS with df = 4; AGR and RGR are computed from the derivatives
fit <- smoothSpline(longi.dat, response="PSA", response.smoothed = "sPSA", 
                    x="xDAP", df = 4,
                    rates = c("AGR", "RGR"))

# NCSS with the AGR column suffixed AGRdv, plus the second derivative
# with the suffix Acc
fit <- smoothSpline(longi.dat, response="PSA", response.smoothed = "sPSA", 
                    x="xDAP", df = 4,
                    rates = "AGR", suffices.rates = "AGRdv", 
                    extra.derivs =  2, suffices.extra.derivs = "Acc")

# P-spline with lambda = 0.1 and 10 segments, same rate and derivative
fit <- smoothSpline(longi.dat, response="PSA", response.smoothed = "sPSA", 
                    x="xDAP", 
                    spline.type = "PS", lambda = 0.1, npspline.segments = 10, 
                    rates = "AGR", suffices.rates = "AGRdv", 
                    extra.derivs =  2, suffices.extra.derivs = "Acc")

# NCSS with df = 4 and the AGR only
fit <- smoothSpline(longi.dat, response="PSA", response.smoothed = "sPSA", 
                    x="xDAP", df = 4,
                    rates = "AGR", suffices.rates = "AGRdv")
