For a response in a data.frame in long format, computes, for a single set of smoothing parameters, smooths of the response, possibly along with growth rates calculated from the smooths.
Uses smoothSpline to fit a spline to the values of response for each individual and stores the fitted values in data. The degree of smoothing is controlled by the tuning parameters df and lambda, which are related to the penalty, and by npspline.segments. The smoothing.method argument provides for direct and logarithmic smoothing.
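The effect of df on the degree of smoothing can be illustrated with a minimal sketch that calls stats::smooth.spline directly rather than this package's functions. The data and the object names (times, resp, fit_smooth, fit_rough) are hypothetical and chosen only for illustration.

```r
# Hypothetical noisy response for one individual (not from the package's data)
set.seed(42)
times <- 1:30
resp  <- 10 + 3 * sin(times / 4) + rnorm(30, sd = 0.4)

# Two natural cubic smoothing splines that differ only in df
fit_smooth <- smooth.spline(times, resp, df = 4)    # lower df: more smoothing
fit_rough  <- smooth.spline(times, resp, df = 10)   # higher df: less smoothing

# Residual sum of squares: the less-smoothed fit follows the data more closely
rss <- function(fit) sum((resp - predict(fit, times)$y)^2)
rss_smooth <- rss(fit_smooth)
rss_rough  <- rss(fit_rough)

# The penalty chosen to match each df: a smaller lambda goes with a larger df
lambdas <- c(df4 = fit_smooth$lambda, df10 = fit_rough$lambda)
```

Comparing rss_smooth with rss_rough, and the two elements of lambdas, shows the trade-off between closeness of fit and smoothness that df and lambda control.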
The Absolute and Relative Growth Rates (AGR and RGR) can be computed either using the first derivatives of the splines or by differencing the smooths. If using the first derivative to obtain growth rates, correctBoundaries must be FALSE. Derivatives other than the first derivative can also be produced. The function byIndv4Times_GRsDiff is used to obtain growth rates by differencing.
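As a rough illustration of obtaining growth rates by differencing, the sketch below uses base R only; it is not the code of byIndv4Times_GRsDiff. The helper rates_by_diff and the data are hypothetical. Two assumptions are made: the RGR is taken to be the difference of the logged smooth divided by the time difference, and each rate is stored against the later time of the pair.

```r
# Hypothetical smoothed values for one individual at unequally spaced times
times  <- c(1, 2, 3, 5, 6)
smooth <- c(2, 4, 7, 15, 20)

# Growth rates by differencing over a span of ntimes2span times values.
# Assumption: each rate is stored against the later time of the pair.
rates_by_diff <- function(times, smooth, ntimes2span = 2) {
  lag <- ntimes2span - 1
  pad <- rep(NA_real_, lag)          # no rate for the first lag times
  dt  <- diff(times, lag = lag)      # time differences (the denominator)
  data.frame(
    times = times,
    AGR = c(pad, diff(smooth, lag = lag) / dt),
    # Assumption: RGR by differencing uses the logged smooth
    RGR = c(pad, diff(log(smooth), lag = lag) / dt)
  )
}

rates2 <- rates_by_diff(times, smooth, ntimes2span = 2)  # consecutive times
rates3 <- rates_by_diff(times, smooth, ntimes2span = 3)  # spans three times
```

With ntimes2span = 2 the rates come from consecutive pairs of times; a larger span averages the growth over a longer interval at the cost of more missing values at the start.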
The handling of missing values in the observations is controlled via na.x.action and na.y.action. If there are not at least four distinct, nonmissing x-values, a warning is issued and all smoothed values and derivatives are set to NA.
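The requirement for four distinct x-values can be sketched with a small guard around stats::smooth.spline. The wrapper safe_smooth is hypothetical and is not how the package implements the check; in particular, dropping rows with a missing y before counting the distinct x-values is an assumption of this sketch.

```r
# Hypothetical wrapper: returns NA for every x when there are too few
# distinct, nonmissing x-values to fit a smoothing spline
safe_smooth <- function(x, y, ...) {
  ok <- !is.na(x) & !is.na(y)   # assumption: rows with missing y are dropped too
  out <- rep(NA_real_, length(x))
  if (length(unique(x[ok])) < 4) {
    warning("fewer than four distinct nonmissing x-values; returning NA")
    return(out)
  }
  fit <- smooth.spline(x[ok], y[ok], ...)
  out[ok] <- predict(fit, x[ok])$y   # fitted values only where data exist
  out
}

# Enough data: fitted values are returned
fitted_ok <- safe_smooth(1:10, (1:10)^2, df = 4)

# Only three distinct nonmissing x-values: a warning and all NA
fitted_na <- suppressWarnings(safe_smooth(c(1, 2, 3, NA, 3), 1:5))
```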
The function probeSmooths can be used to investigate the effect of the smoothing parameters (smoothing.method, df or lambda) on the smooth that results.
byIndv4Times_SplinesGRs(data, response, response.smoothed = NULL,
                        individuals = "Snapshot.ID.Tag", times,
                        smoothing.method = "direct", smoothing.segments = NULL,
                        spline.type = "NCSS", df = NULL, lambda = NULL,
                        npspline.segments = NULL,
                        correctBoundaries = FALSE,
                        rates.method = "differences",
                        which.rates = c("AGR","RGR"),
                        suffices.rates = NULL, sep.rates = ".",
                        avail.times.diffs = FALSE, ntimes2span = 2,
                        extra.derivs = NULL, suffices.extra.derivs = NULL,
                        sep.levels = ".",
                        na.x.action = "exclude", na.y.action = "trimx", ...)
A data.frame containing data to which has been added a column with the fitted smooth, the name of the column being the value of response.smoothed. If rates.method is not none, columns for the growth rates listed in which.rates will be added to data; the name of each of these columns will be the value of response.smoothed with the corresponding element of which.rates appended.
When rates.method is derivatives and smoothing.method is direct, the AGR is obtained from the first derivative of the spline for each value of times and the RGR is calculated as the AGR divided by the value of response.smoothed for the corresponding time.
When rates.method is derivatives and smoothing.method is logarithmic, the RGR is obtained from the first derivative of the spline and the AGR is calculated as the RGR multiplied by the corresponding value of response.smoothed.
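The two derivative-based calculations can be sketched with stats::smooth.spline and its predict method, which accepts a deriv argument. This is an illustration under stated assumptions, not the package's code: the data are simulated and the object names are hypothetical.

```r
# Hypothetical, strictly positive response for one individual
set.seed(2)
times <- 1:20
resp  <- 5 * exp(0.12 * times) * exp(rnorm(20, sd = 0.05))

# Direct smoothing: the first derivative of the spline is the AGR
fit_direct <- smooth.spline(times, resp, df = 5)
s_direct   <- predict(fit_direct, times)$y             # smoothed response
agr_direct <- predict(fit_direct, times, deriv = 1)$y  # first derivative
rgr_direct <- agr_direct / s_direct                    # RGR = AGR / smooth

# Logarithmic smoothing: the spline is fitted to log(resp), so its
# first derivative is the RGR
fit_log <- smooth.spline(times, log(resp), df = 5)
s_log   <- exp(predict(fit_log, times)$y)              # back-transformed smooth
rgr_log <- predict(fit_log, times, deriv = 1)$y        # first derivative
agr_log <- rgr_log * s_log                             # AGR = RGR * smooth
```

In both cases the second rate follows from the identity RGR = AGR / smooth, applied to whichever rate the derivative supplies.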
If extra.derivs is not NULL, the values for the nominated derivatives will also be added to data; the name of each of these columns will be the value of response.smoothed with .dvf appended, where f is the order of the derivative, or the value of response.smoothed with the corresponding element of suffices.extra.derivs appended.
Any pre-existing smoothed and growth rate columns in data will be replaced. The ordering of the data.frame for the times values will be preserved as far as is possible; the main difficulty is with the handling of missing values by the function merge. Thus, if missing values in times are retained, they will occur at the bottom of each subset of individuals and the order will be problematic when there are missing values in y and na.y.action is set to omit.
data: A data.frame containing the column to be smoothed.
response: A character giving the name of the column in data that is to be smoothed.
response.smoothed: A character specifying the name of the column containing the values of the smoothed response variable, corresponding to response. If response.smoothed is NULL, then response.smoothed is set to the value of response with the prefix s added.
individuals: A character giving the name(s) of the factor(s) that define the subsets of response that correspond to the response values for an individual (e.g. plant, pot, cart, plot or unit) that are to be smoothed separately. If the columns corresponding to individuals are not factor(s) then they will be coerced to factor(s). The subsets are formed using split.
times: A character giving the name of the column in data containing the times at which the data was collected, either as a numeric, factor, or character. It will be used as the values of the predictor variable to be supplied to smooth.spline and in calculating growth rates. If it is a factor or character, the values should be numerics stored as characters.
smoothing.method: A character giving the smoothing method to use. The two possibilities are (i) "direct", for directly smoothing the observed response, and (ii) "logarithmic", for smoothing the log-transformed response and then back-transforming by taking the exponential of the fitted values.
smoothing.segments: A named list, each of whose components is a numeric pair specifying the first and last values of a times-interval whose data is to be subjected as an entity to smoothing using splines. The separate smooths will be combined to form a whole smooth for each individual. If get.rates is TRUE, rates.method is differences and ntimes2span is 2, the smoothed growth rates will be computed over the set of segments; otherwise, they will be computed within segments. If smoothing.segments is NULL, the data is not segmented for smoothing.
spline.type: A character giving the type of spline to use. Currently, the possibilities are (i) "NCSS", for natural cubic smoothing splines, and (ii) "PS", for P-splines.
df: A numeric specifying, for natural cubic smoothing splines (NCSS), the desired equivalent number of degrees of freedom of the smooth (trace of the smoother matrix). Lower values result in more smoothing. If df = NULL, the amount of smoothing can be controlled by setting lambda. If both df and lambda are NULL, smoothing is controlled by the default arguments for smooth.spline, and any that you supply via the ellipsis (...) argument.
lambda: A numeric specifying the positive penalty to apply. The amount of smoothing decreases as lambda decreases.
npspline.segments: A numeric specifying, for P-splines (PS), the number of equally spaced segments between min(times) and max(times), excluding missing values, to use in constructing the B-spline basis for the spline fitting. If npspline.segments is NULL, npspline.segments is set to the maximum of 10 and ceiling((nrow(data)-1)/2), i.e. there will be at least 10 segments and, for more than 22 times values, there will be half as many segments as there are times values. The amount of smoothing decreases as npspline.segments increases. When the data has been segmented for smoothing (smoothing.segments is not NULL), an npspline.segments value can be supplied for each segment.
correctBoundaries: A logical indicating whether the fitted spline values are to have the method of Huang (2001) applied to them to correct for estimation bias at the end-points. Note that spline.type must be NCSS and lambda and deriv must be NULL for correctBoundaries to be set to TRUE.
rates.method: A character specifying the method to use in calculating the growth rates. The possibilities are none, differences and derivatives.
which.rates: A character giving the growth rates that are to be calculated. It should be a combination of one or more of "AGR", "PGR" and "RGR".
suffices.rates: A character giving the characters to be appended to the names of the responses to provide the names of the columns containing the calculated growth rates. The order of the suffices in suffices.rates should correspond to the order of the elements of which.rates. If NULL, the values of which.rates are used.
sep.rates: A character giving the character(s) to be used to separate the suffices.rates value from a response value in constructing the name for a new rate. For no separator, set to "".
avail.times.diffs: A logical indicating whether there is an appropriate column of times differences that can be used as the denominator in computing the growth rates. If TRUE, it will be assumed that the name of the column is the value of times with .diffs appended. If FALSE, a column, whose name will be the value of times with .diffs appended, will be formed and saved in the result, overwriting any existing column with the constructed name in data. It will be calculated using the values of times in data.
ntimes2span: A numeric giving the number of values in times to span in calculating growth rates by differencing. Each growth rate is calculated as the difference in the values of one of the responses for pairs of times values that are spanned by ntimes2span times values divided by the difference between this pair of times values. For ntimes2span set to 2, a growth rate is the difference between consecutive pairs of values of one of the responses divided by the difference between consecutive pairs of times values.
extra.derivs: A numeric specifying one or more orders of derivatives that are required, in addition to any required for calculating the growth rates. When rates.method is derivatives, these can be derivatives other than the first. Otherwise, any derivatives can be specified.
suffices.extra.derivs: A character giving the characters to be appended to response.smoothed to construct the names of the derivatives. If NULL and the derivatives are to be retained, then .dv followed by the order of the derivative is appended to response.smoothed.
sep.levels: A character giving the separator to use when the levels of individuals are combined. This is needed to avoid using a character that occurs in a factor to delimit levels when the levels of individuals are combined to identify subsets.
na.x.action: A character string that specifies the action to be taken when values of x, or the times, are NA. The possible values are fail, exclude or omit. For exclude and omit, predictions and derivatives will only be obtained for nonmissing values of x. The difference between these two codes is that, for exclude, the returned data.frame will have as many rows as data, the missing values having been incorporated.
na.y.action: A character string that specifies the action to be taken when values of y, or the response, are NA. The possible values are fail, exclude, omit, allx, trimx, ltrimx or rtrimx. For all options, except fail, missing values in y will be removed before smoothing. For exclude and omit, predictions and derivatives will be obtained only for nonmissing values of x that do not have missing y values. Again, the difference between these two is that, only for exclude will the missing values be incorporated into the returned data.frame. For allx, predictions and derivatives will be obtained for all nonmissing x. For trimx, they will be obtained for all nonmissing x between the first and last nonmissing y values that have been ordered for x; for ltrimx and utrimx either the lower or upper missing y values, respectively, are trimmed.
...: Allows for arguments to be passed to smooth.spline.
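The difference between the omit, allx and trimx settings of na.y.action can be pictured as a choice of the x-values at which predictions are obtained. The sketch below is one reading of the descriptions above, written in base R; the helper x_for_prediction is hypothetical and is not a function of the package.

```r
# Hypothetical data: y is missing at the start, in the middle and at the end
x <- 1:8
y <- c(NA, 2, 3, NA, 5, 6, NA, NA)

# Hypothetical helper: the x-values at which predictions would be obtained
x_for_prediction <- function(x, y, na.y.action = c("omit", "allx", "trimx")) {
  na.y.action <- match.arg(na.y.action)
  x_ok  <- sort(x[!is.na(x)])            # all nonmissing x, in order
  x_obs <- x[!is.na(x) & !is.na(y)]      # x with an observed y
  switch(na.y.action,
         omit  = sort(x_obs),            # only x with nonmissing y
         allx  = x_ok,                   # every nonmissing x
         trimx = x_ok[x_ok >= min(x_obs) & x_ok <= max(x_obs)])
}

x_omit  <- x_for_prediction(x, y, "omit")
x_allx  <- x_for_prediction(x, y, "allx")
x_trimx <- x_for_prediction(x, y, "trimx")
```

Under this reading, trimx keeps the interior x-value with a missing y (so the smooth is interpolated there) but drops the leading and trailing x-values, where a prediction would be an extrapolation.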
Chris Brien
Eilers, P.H.C. and Marx, B.D. (2021) Practical smoothing: the joys of P-splines. Cambridge University Press, Cambridge.
Huang, C. (2001) Boundary corrected cubic smoothing splines. Journal of Statistical Computation and Simulation, 70, 107-121.
smoothSpline, probeSmooths, byIndv4Times_GRsDiff, smooth.spline, predict.smooth.spline, split
data(exampleData)
#smoothing with growth rates calculated using derivatives
longi.dat <- byIndv4Times_SplinesGRs(data = longi.dat,
                                     response = "PSA", response.smoothed = "sPSA",
                                     times = "DAP",
                                     df = 4, rates.method = "deriv",
                                     suffices.rates = c("AGRdv", "RGRdv"))
#Use P-splines
longi.dat <- byIndv4Times_SplinesGRs(data = longi.dat,
                                     response = "PSA", response.smoothed = "sPSA",
                                     individuals = "Snapshot.ID.Tag", times = "DAP",
                                     spline.type = "PS", lambda = 0.1,
                                     npspline.segments = 10,
                                     rates.method = "deriv",
                                     suffices.rates = c("AGRdv", "RGRdv"))
#with segmented smoothing and no growth rates
longi.dat <- byIndv4Times_SplinesGRs(data = longi.dat,
                                     response = "PSA", response.smoothed = "sPSA",
                                     individuals = "Snapshot.ID.Tag", times = "DAP",
                                     smoothing.segments = list(c(28,34), c(35,42)),
                                     df = 5, rates.method = "none")