byIndv4Times_SplinesGRs: For a response in a `data.frame` in long format, computes, for a single set of smoothing parameters, smooths of the response, possibly along with growth rates calculated from the smooths.

Description

Uses smoothSpline to fit a spline to the values of response for each individual and stores the fitted values in data. The degree of smoothing is controlled by the tuning parameters df and lambda, related to the penalty, and by npspline.segments. The smoothing.method provides for direct and logarithmic smoothing.

The Absolute and Relative Growth Rates (AGR and RGR) can be computed either from the first derivatives of the splines or by differencing the smooths. If the first derivative is used to obtain growth rates, correctBoundaries must be FALSE. Derivatives other than the first derivative can also be produced. The function byIndv4Times_GRsDiff is used to obtain growth rates by differencing.
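Growth rates by differencing can be illustrated with a minimal base-R sketch. The helper gr_by_diff below is hypothetical and is not part of the package; it assumes that spanning ntimes2span values of times corresponds to a lag of ntimes2span - 1, and that the RGR is the usual difference of logarithms divided by the time difference. How byIndv4Times_GRsDiff aligns the resulting rates with the rows of data is not shown here.

```r
# Hypothetical helper (not part of growthPheno): growth rates by differencing.
# Assumes a span of ntimes2span time values means a lag of ntimes2span - 1.
gr_by_diff <- function(y, times, ntimes2span = 2) {
  lag <- ntimes2span - 1
  dt  <- diff(times, lag = lag)             # differences between spanned times
  list(AGR = diff(y, lag = lag) / dt,       # absolute growth rate
       RGR = diff(log(y), lag = lag) / dt)  # assumed log-difference definition
}

times  <- c(28, 30, 32, 35)   # made-up times
smooth <- c(10, 14, 20, 32)   # made-up smoothed responses

rates <- gr_by_diff(smooth, times)
rates$AGR   # (14-10)/2, (20-14)/2, (32-20)/3, i.e. 2 3 4
```

With unequally spaced times, dividing by the actual time differences, rather than assuming a unit step, is what keeps the rates comparable between intervals.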

The handling of missing values in the observations is controlled via na.x.action and na.y.action. If there are not at least four distinct, nonmissing x-values, a warning is issued and all smoothed values and derivatives are set to NA.
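The following base-R sketch illustrates the two rules involved, under stated assumptions: the check for at least four distinct, nonmissing x-values, and the set of x-values at which predictions would be obtained when na.y.action is trimx. The vectors are made up, and the code only mimics the documented behaviour; it is not the package's implementation.

```r
# Made-up data: x are the times, y the response, both with missing values.
x <- c(1, 2, 3, NA, 5, 6, 7, 8)
y <- c(NA, 4, 5, 6, NA, 8, 9, NA)

# Are there at least four distinct, nonmissing x-values?
enough_x <- length(unique(x[!is.na(x)])) >= 4

# Observations with both x and y present would be used in fitting.
obs <- !is.na(x) & !is.na(y)

# trimx: keep nonmissing x lying between the first and last nonmissing y.
rng <- range(x[obs])
predict_at <- !is.na(x) & x >= rng[1] & x <= rng[2]
x[predict_at]   # 2 3 5 6 7
```

Here x = 5 is kept even though its y is missing, because it lies inside the observed range, whereas x = 1 and x = 8 are trimmed.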

The function probeSmoothing can be used to investigate the effect of the smoothing parameters (smoothing.method, df or lambda) on the resulting smooth.

Usage

byIndv4Times_SplinesGRs(data, response, response.smoothed = NULL, 
                        individuals = "Snapshot.ID.Tag", times, 
                        smoothing.method = "direct", smoothing.segments = NULL, 
                        spline.type = "NCSS", df=NULL, lambda = NULL, 
                        npspline.segments = NULL, 
                        correctBoundaries = FALSE, 
                        rates.method = "differences", 
                        which.rates = c("AGR","RGR"), 
                        suffices.rates = NULL, sep.rates = ".", 
                        avail.times.diffs = FALSE, ntimes2span = 2, 
                        extra.derivs = NULL, suffices.extra.derivs=NULL, 
                        sep.levels = ".", 
                        na.x.action="exclude", na.y.action = "trimx", ...)

Value

A data.frame containing data to which has been added a column with the fitted smooth, the name of the column being the value of response.smoothed. If rates.method is not none, columns for the growth rates listed in which.rates will be added to data; the name of each of these columns will be the value of response.smoothed with the corresponding element of which.rates (or of suffices.rates, if it is not NULL) appended.

When rates.method is derivatives and smoothing.method is direct, the AGR is obtained from the first derivative of the spline for each value of times and the RGR is calculated as the AGR divided by the value of the response.smoothed for the corresponding time. When rates.method is derivatives and smoothing.method is logarithmic, the RGR is obtained from the first derivative of the spline and the AGR is calculated as the RGR multiplied by the corresponding value of the response.smoothed.
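The two derivative-based calculations can be sketched directly with stats::smooth.spline and its predict method. This is only an illustration on simulated data for a single individual, with an arbitrarily chosen df = 4; the variable names are invented for the example and the code is not the package's implementation.

```r
# Simulated, roughly exponential growth for one individual (illustrative only).
set.seed(1)
times <- 1:12
y <- 5 * exp(0.2 * times) * exp(rnorm(12, sd = 0.05))

# Direct smoothing: AGR is the first derivative; RGR = AGR / smooth.
fit    <- smooth.spline(times, y, df = 4)
smooth <- predict(fit, times)$y
AGR    <- predict(fit, times, deriv = 1)$y
RGR    <- AGR / smooth

# Logarithmic smoothing: fit log(y) and back-transform the fitted values;
# RGR is the first derivative of the log-scale spline; AGR = RGR * smooth.
lfit    <- smooth.spline(times, log(y), df = 4)
lsmooth <- exp(predict(lfit, times)$y)
lRGR    <- predict(lfit, times, deriv = 1)$y
lAGR    <- lRGR * lsmooth
```

The logarithmic route follows from the chain rule: the derivative of log(s(t)) is s'(t)/s(t), which is the RGR, so multiplying by the back-transformed smooth recovers the AGR.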

If extra.derivs is not NULL, the values for the nominated derivatives will also be added to data; the name of each of these columns will be the value of response.smoothed with .dvf appended, where f is the order of the derivative, or the value of response.smoothed with the corresponding element of suffices.extra.derivs appended.

Any pre-existing smoothed and growth rate columns in data will be replaced. The ordering of the data.frame for the times values will be preserved as far as is possible; the main difficulty is with the handling of missing values by the function merge. Thus, if missing values in times are retained, they will occur at the bottom of each subset of individuals and the order will be problematic when there are missing values in y and na.y.action is set to omit.

Arguments

data: A data.frame containing the column to be smoothed.
response: A character giving the name of the column in data that is to be smoothed.
response.smoothed: A character specifying the name of the column containing the values of the smoothed response variable, corresponding to response. If response.smoothed is NULL, then response.smoothed is set to the response to which is added the prefix s.
individuals: A character giving the name(s) of the factor(s) that define the subsets of response that correspond to the response values for an individual (e.g. plant, pot, cart, plot or unit) that are to be smoothed separately. If the columns corresponding to individuals are not factor(s) then they will be coerced to factor(s). The subsets are formed using split.
times: A character giving the name of the column in data containing the times at which the data was collected, either as a numeric, factor, or character. It will be used as the values of the predictor variable to be supplied to smooth.spline and in calculating growth rates. If a factor or character, the values should be numerics stored as characters.
smoothing.method: A character giving the smoothing method to use. The two possibilities are (i) "direct", for directly smoothing the observed response, and (ii) "logarithmic", for smoothing the log-transformed response and then back-transforming by taking the exponential of the fitted values.
smoothing.segments: A named list, each of whose components is a numeric pair specifying the first and last values of a times-interval whose data is to be smoothed as a single entity using splines. The separate smooths will be combined to form a whole smooth for each individual. If rates.method is differences and ntimes2span is 2, the smoothed growth rates will be computed over the set of segments; otherwise, they will be computed within segments. If smoothing.segments is NULL, the data is not segmented for smoothing.
spline.type: A character giving the type of spline to use. Currently, the possibilities are (i) "NCSS", for natural cubic smoothing splines, and (ii) "PS", for P-splines.
df: A numeric specifying, for natural cubic smoothing splines (NCSS), the desired equivalent number of degrees of freedom of the smooth (trace of the smoother matrix). Lower values result in more smoothing. If df = NULL, the amount of smoothing can be controlled by setting lambda. If both df and lambda are NULL, smoothing is controlled by the default arguments for smooth.spline, and any that you supply via the ellipsis (...) argument.
lambda: A numeric specifying the positive penalty to apply. The amount of smoothing decreases as lambda decreases.
npspline.segments: A numeric specifying, for P-splines (PS), the number of equally spaced segments between min(times) and max(times), excluding missing values, to use in constructing the B-spline basis for the spline fitting. If npspline.segments is NULL, npspline.segments is set to the maximum of 10 and ceiling((nrow(data)-1)/2) i.e. there will be at least 10 segments and, for more than 22 times values, there will be half as many segments as there are times values. The amount of smoothing decreases as npspline.segments increases. When the data has been segmented for smoothing (smoothing.segments is not NULL), an npspline.segments value can be supplied for each segment.
correctBoundaries: A logical indicating whether the fitted spline values are to have the method of Huang (2001) applied to them to correct for estimation bias at the end-points. Note that spline.type must be NCSS and lambda and deriv must be NULL for correctBoundaries to be set to TRUE.
rates.method: A character specifying the method to use in calculating the growth rates. The possibilities are none, differences and derivatives.
which.rates: A character giving the growth rates that are to be calculated. It should be a combination of one or more of "AGR", "PGR" and "RGR".
suffices.rates: A character giving the characters to be appended to the names of the responses to provide the names of the columns containing the calculated growth rates. The order of the suffices in suffices.rates should correspond to the order of the elements of which.rates. If NULL, the values of which.rates are used.
sep.rates: A character giving the character(s) to be used to separate the suffices.rates value from a response value in constructing the name for a new rate. For no separator, set to "".
avail.times.diffs: A logical indicating whether there is an appropriate column of times differences that can be used as the denominator in computing the growth rates. If TRUE, it will be assumed that the name of the column is the value of times with .diffs appended. If FALSE, a column, whose name will be the value of times with .diffs appended, will be formed and saved in the result, overwriting any existing column with the constructed name in data. It will be calculated using the values of times in data.
ntimes2span: A numeric giving the number of values in times to span in calculating growth rates by differencing. Each growth rate is calculated as the difference in the values of one of the responses for pairs of times values that are spanned by ntimes2span times values divided by the difference between this pair of times values. For ntimes2span set to 2, a growth rate is the difference between consecutive pairs of values of one of the responses divided by the difference between consecutive pairs of times values.
extra.derivs: A numeric specifying one or more orders of derivatives that are required, in addition to any required for calculating the growth rates. When rates.method is derivatives, these can be derivatives other than the first. Otherwise, any derivatives can be specified.
suffices.extra.derivs: A character giving the characters to be appended to response.smoothed to construct the names of the derivatives. If NULL and the derivatives are to be retained, then .dv followed by the order of the derivative is appended to response.smoothed.

sep.levels: A character giving the separator to use when the levels of individuals are combined. This is needed to avoid using a character that occurs in a factor to delimit levels when the levels of individuals are combined to identify subsets.
na.x.action: A character string that specifies the action to be taken when values of x, or the times, are NA. The possible values are fail, exclude or omit. For exclude and omit, predictions and derivatives will only be obtained for nonmissing values of x. The difference between the two is that, for exclude, the rows with missing values are incorporated into the returned data.frame, so that it has as many rows as data.
na.y.action: A character string that specifies the action to be taken when values of y, or the response, are NA. The possible values are fail, exclude, omit, allx, trimx, ltrimx or rtrimx. For all options, except fail, missing values in y will be removed before smoothing. For exclude and omit, predictions and derivatives will be obtained only for nonmissing values of x that do not have missing y values. Again, the difference between these two is that only for exclude will the missing values be incorporated into the returned data.frame. For allx, predictions and derivatives will be obtained for all nonmissing x. For trimx, they will be obtained for all nonmissing x between the first and last nonmissing y values that have been ordered for x; for ltrimx and rtrimx either the lower or upper missing y values, respectively, are trimmed.
...: allows for arguments to be passed to smooth.spline.
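
As an illustration of the default described for npspline.segments, the rule max(10, ceiling((n - 1)/2)) can be evaluated for a few values of n. The helper name below is hypothetical, and n stands in for the nrow(data) quoted in the argument description.

```r
# Hypothetical helper reproducing the documented default for npspline.segments.
default_npspline_segments <- function(n) max(10, ceiling((n - 1) / 2))

n_values <- c(8, 21, 22, 43, 101)
segments <- sapply(n_values, default_npspline_segments)
segments   # 10 10 11 21 50
```

For small data sets the floor of 10 segments applies; for larger ones the number of segments is roughly half the number of values.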

Author

Chris Brien

References

Eilers, P.H.C and Marx, B.D. (2021) Practical smoothing: the joys of P-splines. Cambridge University Press, Cambridge.

Huang, C. (2001) Boundary corrected cubic smoothing splines. Journal of Statistical Computation and Simulation, 70, 107-121.

Examples

data(exampleData)
#smoothing with growth rates calculated using derivatives
longi.dat <- byIndv4Times_SplinesGRs(data = longi.dat, 
                                     response="PSA", response.smoothed = "sPSA", 
                                     times="DAP", 
                                     df = 4, rates.method = "deriv", 
                                     suffices.rates = c("AGRdv", "RGRdv"))
#Use P-splines
longi.dat <- byIndv4Times_SplinesGRs(data = longi.dat, 
                                     response="PSA", response.smoothed = "sPSA", 
                                     individuals = "Snapshot.ID.Tag", times="DAP", 
                                     spline.type = "PS", lambda = 0.1, 
                                     npspline.segments = 10, 
                                     rates.method = "deriv", 
                                     suffices.rates = c("AGRdv", "RGRdv"))
#with segmented smoothing and no growth rates
longi.dat <- byIndv4Times_SplinesGRs(data = longi.dat, 
                                     response="PSA", response.smoothed = "sPSA", 
                                     individuals = "Snapshot.ID.Tag", times="DAP", 
                                     smoothing.segments = list(c(28,34), c(35,42)), 
                                     df = 5, rates.method = "none")
