splitSplines: Adds the fits, and optionally growth rates computed from derivatives, after fitting splines to a response for an individual stored in a `data.frame` in long format

Description

Uses fitSpline to fit a spline to a subset of the values of response and stores the fitted values in data. The subsets are those values with the same combinations of levels of the factors listed in individuals. The degree of smoothing is controlled by the tuning parameters df and lambda, related to the penalty, and by npspline.segments. The smoothing.method provides for direct and logarithmic smoothing.
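
To illustrate what smoothing each individual separately means, the following is a minimal sketch in base R. It is not the package's implementation: it calls smooth.spline directly rather than fitSpline, and the data frame toy, its columns id, day and size, and the helper smooth_one are all hypothetical names invented for the illustration.

```r
# Toy long-format data: two individuals, ten time points each (made-up values).
set.seed(1)
toy <- data.frame(
  id  = factor(rep(c("A", "B"), each = 10)),
  day = rep(1:10, times = 2)
)
toy$size <- 5 * exp(0.2 * toy$day) + rnorm(nrow(toy), sd = 0.5)

# Hypothetical helper: fit a natural cubic smoothing spline to the rows of
# one individual and return those rows with the fitted values added.
smooth_one <- function(d, df = 4) {
  fit <- smooth.spline(d$day, d$size, df = df)
  d$size.smooth <- predict(fit, d$day)$y
  d
}

# Split by individual, smooth each subset, and recombine.
pieces <- lapply(split(toy, toy$id), smooth_one)
toy.smoothed <- do.call(rbind, pieces)
rownames(toy.smoothed) <- NULL
```

Each individual gets its own spline, so the amount of smoothing (here df = 4) applies per individual rather than to the pooled data.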

The derivatives of the fitted spline can also be obtained, and the Absolute and Relative Growth Rates (AGR and RGR) computed using them, provided correctBoundaries is FALSE. Otherwise, growth rates can be obtained by difference using splitContGRdiff.

The handling of missing values in the observations is controlled via na.x.action and na.y.action. If there are not at least four distinct, nonmissing x-values, a warning is issued and all smoothed values and derivatives are set to NA.
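
The rule about four distinct, nonmissing x-values can be sketched as follows. This is an assumption-laden illustration in base R, not the package's code: the helper smooth_or_na is hypothetical, it uses smooth.spline directly, and it assumes that y has no missing values.

```r
# Hypothetical helper: returns fitted values, or all NA (with a warning)
# when there are fewer than four distinct, nonmissing x-values.
# Assumes y contains no missing values.
smooth_or_na <- function(x, y, df = 4) {
  ok <- !is.na(x)
  if (length(unique(x[ok])) < 4) {
    warning("fewer than four distinct, nonmissing x-values; returning NA")
    return(rep(NA_real_, length(x)))
  }
  fit <- smooth.spline(x[ok], y[ok], df = df)
  fitted <- rep(NA_real_, length(x))
  fitted[ok] <- predict(fit, x[ok])$y
  fitted
}

# Only three distinct, nonmissing x-values: every smoothed value is NA.
x.few <- c(1, 2, 2, NA, 3)
y.few <- c(1.0, 2.1, 1.9, 3.0, 2.8)
res.few <- suppressWarnings(smooth_or_na(x.few, y.few))

# Six distinct x-values: a spline is fitted.
x.ok <- 1:6
y.ok <- c(1.0, 1.8, 3.1, 3.9, 5.2, 6.0)
res.ok <- smooth_or_na(x.ok, y.ok)
```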

The function probeSmoothing can be used to investigate the effect of the smoothing parameters (smoothing.method, df or lambda) on the smooth that results.

Note: this function is soft deprecated and may be removed in future versions.
Use byIndv4Times_SplinesGRs.

Usage

splitSplines(data, response, response.smoothed = NULL, x, 
             individuals = "Snapshot.ID.Tag", INDICES = NULL,
             smoothing.method = "direct", smoothing.segments = NULL, 
             spline.type = "NCSS", df=NULL, lambda = NULL, 
             npspline.segments = NULL, 
             correctBoundaries = FALSE, 
             deriv = NULL, suffices.deriv = NULL, extra.rate = NULL, 
             sep = ".", 
             na.x.action="exclude", na.y.action = "exclude", ...)

Value

A data.frame containing data to which has been added a column with the fitted smooth, the name of the column being response.smoothed. If deriv is not NULL, columns containing the values of the derivative(s) will be added to data; the name of each of these columns will be the value of response.smoothed with .dvf appended, where f is the order of the derivative, or the value of response.smoothed with the corresponding element of suffices.deriv appended. If the RGR is requested through extra.rate, it is calculated as the ratio of the value of the first derivative of the fitted spline to the fitted value for the spline. Any pre-existing smoothed and derivative columns in data will be replaced.

The ordering of the data.frame for the x values will be preserved as far as is possible; the main difficulty is with the handling of missing values by the function merge. Thus, if missing values in x are retained, they will occur at the bottom of each subset of individuals and the order will be problematic when there are missing values in y and na.y.action is set to omit.

Arguments

data: A data.frame containing the column to be smoothed.
response: A character giving the name of the column in data that is to be smoothed.
response.smoothed: A character specifying the name of the column containing the values of the smoothed response variable, corresponding to response. If response.smoothed is NULL, then response.smoothed is set to the response to which .smooth is added.
x: A character giving the name of the column in data that contains the values of the predictor variable.
individuals: A character giving the name(s) of the factor(s) that define the subsets of response that correspond to the response values for an individual (e.g. plant, pot, cart, plot or unit) that are to be smoothed separately. If the columns corresponding to individuals are not factor(s) then they will be coerced to factor(s). The subsets are formed using split.
INDICES: A pseudonym for individuals.
smoothing.method: A character giving the smoothing method to use. The two possibilities are (i) "direct", for directly smoothing the observed response, and (ii) "logarithmic", for smoothing the log-transformed response and then back-transforming by taking the exponential of the fitted values.
smoothing.segments: A named list, each of whose components is a numeric pair specifying the first and last values of an x-interval whose data is to be subjected as an entity to smoothing using splines. The separate smooths will be combined to form a whole smooth for each individual. If smoothing.segments is NULL, the data is not segmented for smoothing.
spline.type: A character giving the type of spline to use. Currently, the possibilities are (i) "NCSS", for natural cubic smoothing splines, and (ii) "PS", for P-splines.
df: A numeric specifying, for natural cubic smoothing splines (NCSS), the desired equivalent number of degrees of freedom of the smooth (trace of the smoother matrix). Lower values result in more smoothing. If df = NULL, the amount of smoothing can be controlled by setting lambda. If both df and lambda are NULL, smoothing is controlled by the default arguments for smooth.spline, and any that you supply via the ellipsis (...) argument.
lambda: A numeric specifying the positive penalty to apply. The amount of smoothing decreases as lambda decreases.
npspline.segments: A numeric specifying, for P-splines (PS), the number of equally spaced segments between min(x) and max(x), excluding missing values, to use in constructing the B-spline basis for the spline fitting. If npspline.segments is NULL, npspline.segments is set to the maximum of 10 and ceiling((nrow(data)-1)/2) i.e. there will be at least 10 segments and, for more than 22 x values, there will be half as many segments as there are x values. The amount of smoothing decreases as npspline.segments increases. When the data has been segmented for smoothing (smoothing.segments is not NULL), an npspline.segments value can be supplied for each segment.
correctBoundaries: A logical indicating whether the fitted spline values are to have the method of Huang (2001) applied to them to correct for estimation bias at the end-points. Note that spline.type must be NCSS and lambda and deriv must be NULL for correctBoundaries to be set to TRUE.
deriv: A numeric specifying one or more orders of derivatives that are required.
suffices.deriv: A character giving the characters to be appended to the names of the derivatives. If NULL and the derivative is to be retained then smooth.dv is appended.
extra.rate: A named character nominating a single growth rate (AGR or RGR) to be computed using the first derivative, which one being dependent on the smoothing.method. The name of this element will be used as a suffix to be appended to the response when naming the resulting growth rate (see Examples). If unnamed, AGR or RGR will be used, as appropriate. Note that, for the smoothing.method set to direct, the first derivative is the AGR and so extra.rate must be set to RGR, which is computed as the AGR / smoothed response. For the smoothing.method set to logarithmic, the first derivative is the RGR and so extra.rate must be set to AGR, which is computed as the RGR * smoothed response. Make sure that deriv includes one so that the first derivative is available for calculating the extra.rate.
sep: A character giving the separator to use when the levels of individuals are combined. This is needed to avoid using a character that occurs in a factor to delimit levels when the levels of individuals are combined to identify subsets.
na.x.action: A character string that specifies the action to be taken when values of x are NA. The possible values are fail, exclude or omit. For exclude and omit, predictions and derivatives will only be obtained for nonmissing values of x. The difference between these two codes is that for exclude the returned data.frame will have as many rows as data, the missing values having been incorporated.
na.y.action: A character string that specifies the action to be taken when values of y, or the response, are NA. The possible values are fail, exclude, omit, allx, trimx, ltrimx or rtrimx. For all options, except fail, missing values in y will be removed before smoothing. For exclude and omit, predictions and derivatives will be obtained only for nonmissing values of x that do not have missing y values. Again, the difference between these two is that only for exclude will the missing values be incorporated into the returned data.frame. For allx, predictions and derivatives will be obtained for all nonmissing x. For trimx, they will be obtained for all nonmissing x between the first and last nonmissing y values that have been ordered for x; for ltrimx and rtrimx either the lower or upper missing y values, respectively, are trimmed.
...: allows for arguments to be passed to smooth.spline.
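
The relationship between smoothing.method, the first derivative and extra.rate described above can be sketched in base R. This is only an illustration under stated assumptions: it uses smooth.spline and predict directly rather than splitSplines, the data are simulated, and all object names (day, size, fit.direct and so on) are hypothetical.

```r
# Simulated growth data for a single individual (made-up values).
set.seed(2)
day  <- 1:12
size <- 5 * exp(0.15 * day) + rnorm(length(day), sd = 0.3)

# Direct smoothing: the first derivative of the fitted spline is the AGR,
# and the RGR is the AGR divided by the smoothed response.
fit.direct    <- smooth.spline(day, size, df = 5)
smooth.direct <- predict(fit.direct, day)$y
AGR.direct    <- predict(fit.direct, day, deriv = 1)$y
RGR.direct    <- AGR.direct / smooth.direct

# Logarithmic smoothing: the spline is fitted to log(size) and the fit is
# back-transformed with exp(). The first derivative on the log scale is the
# RGR, and the AGR is the RGR multiplied by the smoothed response.
fit.log    <- smooth.spline(day, log(size), df = 5)
smooth.log <- exp(predict(fit.log, day)$y)
RGR.log    <- predict(fit.log, day, deriv = 1)$y
AGR.log    <- RGR.log * smooth.log
```

In both cases only the first derivative comes from the spline; the other growth rate is derived from it and the smoothed response, which is why deriv needs to include one when extra.rate is used.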

Author

Chris Brien

References

Eilers, P.H.C. and Marx, B.D. (2021) Practical smoothing: the joys of P-splines. Cambridge University Press, Cambridge.

Huang, C. (2001) Boundary corrected cubic smoothing splines. Journal of Statistical Computation and Simulation, 70, 107-121.

Examples

data(exampleData)
#smoothing with growth rates calculated using derivatives
longi.dat <- splitSplines(longi.dat, response="PSA", x="xDAP", 
                          individuals = "Snapshot.ID.Tag", 
                          df = 4, deriv=1, suffices.deriv="AGRdv", 
                          extra.rate = c(RGRdv = "RGR"))
#Use P-splines
longi.dat <- splitSplines(longi.dat, response="PSA", x="xDAP", 
                          individuals = "Snapshot.ID.Tag", 
                          spline.type = "PS", lambda = 0.1, npspline.segments = 10, 
                          deriv=1, suffices.deriv="AGRdv", 
                          extra.rate = c(RGRdv = "RGR"))
#with segmented smoothing
longi.dat <- splitSplines(longi.dat, response="PSA", x="xDAP", 
                          individuals = "Snapshot.ID.Tag", 
                          smoothing.segments = list(c(28,34), c(35,42)), df = 5)