response and an x in a data.frame, optionally computing growth rates using derivatives.
Uses smooth.spline to fit a natural cubic smoothing spline or JOPS to fit a P-spline to all the values of response stored in data.
The amount of smoothing is controlled by tuning parameters related to the penalty: df or lambda for a natural cubic smoothing spline, and lambda for a P-spline. For a P-spline, npspline.segments also influences the smoothness of the fit.
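As a rough illustration of how df and lambda act on a natural cubic smoothing spline, the sketch below calls base R's smooth.spline directly rather than smoothSpline. The simulated data and object names are assumptions made for the example, and smoothSpline may pass these settings on differently.

```r
# Illustrative data only: a noisy logistic growth curve (not from the package).
set.seed(1)
x <- 1:40
y <- 100 / (1 + exp(-(x - 20) / 4)) + rnorm(length(x), sd = 3)

# Control the smoothing through the equivalent degrees of freedom ...
fit_df4  <- smooth.spline(x, y, df = 4)
fit_df10 <- smooth.spline(x, y, df = 10)

# ... or directly through the penalty.
fit_lam <- smooth.spline(x, y, lambda = 0.01)

# Fewer degrees of freedom correspond to a larger penalty (a smoother curve).
c(df4 = fit_df4$lambda, df10 = fit_df10$lambda)

# The achieved degrees of freedom are close to those requested.
c(requested = 4, achieved = fit_df4$df)
```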
The smoothing.method argument provides for direct and logarithmic smoothing. The method of Huang (2001) for correcting the fitted spline for estimation bias at the end-points is applied when fitting a natural cubic smoothing spline if correctBoundaries is TRUE.
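The difference between direct and logarithmic smoothing can be sketched with base R's smooth.spline. This is a minimal illustration under assumed, simulated data. It is not the package's internal code.

```r
# Illustrative data only: positive, roughly exponential growth.
set.seed(2)
x <- 1:30
y <- exp(0.15 * x) * exp(rnorm(length(x), sd = 0.1))

# "direct": smooth the observed response.
direct <- smooth.spline(x, y, df = 5)$y

# "logarithmic": smooth log(y), then back-transform with exp().
logarithmic <- exp(smooth.spline(x, log(y), df = 5)$y)

# Back-transformed fitted values are always positive.
range(logarithmic)
```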
The derivatives of the fitted spline can also be obtained, and the Absolute and Relative Growth Rates (AGR and RGR) computed from them, provided correctBoundaries is FALSE. Otherwise, growth rates can be obtained by difference using byIndv4Times_GRsDiff.
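The derivative-based growth rates can be sketched with base R's smooth.spline and predict: the AGR is taken as the first derivative of the fitted spline, and the RGR as that derivative divided by the fitted value. The data and variable names below are assumptions for illustration only.

```r
# Illustrative data only: noisy exponential growth.
set.seed(3)
x <- 1:30
y <- 5 * exp(0.1 * x) + rnorm(length(x), sd = 0.5)

fit <- smooth.spline(x, y, df = 5)

smoothed <- predict(fit, x)$y              # fitted values
agr      <- predict(fit, x, deriv = 1)$y   # first derivative, used as the AGR
rgr      <- agr / smoothed                 # derivative / fitted value, used as the RGR

head(data.frame(x, smoothed, agr, rgr))
```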
The handling of missing values in the observations is controlled via na.x.action and na.y.action. If there are not at least four distinct, nonmissing x-values, a warning is issued and all smoothed values and derivatives are set to NA.
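The four-distinct-values condition can be checked before fitting. The helper below, enough_x, is hypothetical and not part of the package. It only mirrors the rule described above.

```r
# Hypothetical helper (not a package function): TRUE when x has at least
# min.distinct distinct, nonmissing values.
enough_x <- function(x, min.distinct = 4) {
  length(unique(x[!is.na(x)])) >= min.distinct
}

enough_x(c(1, 2, 2, NA, 3))   # only three distinct nonmissing values
enough_x(c(1, 2, 3, 4, NA))   # four distinct nonmissing values
```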
The function probeSmooths can be used to investigate the effect of the smoothing parameters (smoothing.method and df or lambda) on the smooth that results.
smoothSpline(data, response, response.smoothed = NULL, x,
             smoothing.method = "direct",
             spline.type = "NCSS", df = NULL, lambda = NULL,
             npspline.segments = NULL, correctBoundaries = FALSE,
             rates = NULL, suffices.rates = NULL, sep.rates = ".",
             extra.derivs = NULL, suffices.extra.derivs = NULL,
             na.x.action = "exclude", na.y.action = "trimx", ...)
A list with two components named predictions and fit.spline.
The predictions component is a data.frame containing x and the fitted smooth. The names of the columns will be the value of x and the value of response.smoothed. The number of rows in the data.frame will be equal to the number of pairs that have neither a missing x nor a missing response, and the order of x will be the same as the order in data. If derivatives are requested, columns containing the values of the derivative(s) will be added to the data.frame; the name of each of these columns will be the value of response.smoothed with .dvf appended, where f is the order of the derivative, or the value of response.smoothed with the corresponding element of suffices.extra.derivs appended. If rates includes "RGR", the RGR is calculated as the ratio of the value of the first derivative of the fitted spline to the fitted value for the spline.
The fit.spline component is a list with components
x: the distinct x values in increasing order;
y: the fitted values, with boundary values possibly corrected, and corresponding to x;
lev: leverages, the diagonal values of the smoother matrix (NCSS only);
lambda: the value of lambda (corresponding to spar for NCSS - see smooth.spline);
df: the effective degrees of freedom;
npspline.segments: the number of equally spaced segments used when spline.type is set to PS;
uncorrected.fit: the object returned by smooth.spline when spline.type is set to NCSS, or by JOPS::psNormal for PS.
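Several of these components have the same names as components of the object returned by base R's smooth.spline. The sketch below inspects that base object on simulated data (an assumption for illustration) to show what x, lev, lambda and df look like for an NCSS fit. It does not call smoothSpline itself.

```r
# Illustrative data only, with each x value observed twice.
set.seed(4)
x <- rep(1:20, each = 2)
y <- sqrt(x) + rnorm(length(x), sd = 0.1)

fit <- smooth.spline(x, y, df = 4)

length(fit$x)   # number of distinct x values, in increasing order
fit$lambda      # penalty actually used
fit$df          # effective degrees of freedom
sum(fit$lev)    # trace of the smoother matrix, which matches fit$df
```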
data: A data.frame containing the column to be smoothed.
response: A character giving the name of the column in data that is to be smoothed.
response.smoothed: A character specifying the name of the column containing the values of the smoothed response variable, corresponding to response. If response.smoothed is NULL, then response.smoothed is set to the value of response with the prefix s added.
x: A character giving the name of the column in data that contains the values of the predictor variable.
smoothing.method: A character giving the smoothing method to use. The two possibilities are (i) "direct", for directly smoothing the observed response, and (ii) "logarithmic", for smoothing the log-transformed response and then back-transforming by taking the exponential of the fitted values.
spline.type: A character giving the type of spline to use. Currently, the possibilities are (i) "NCSS", for natural cubic smoothing splines, and (ii) "PS", for P-splines.
df: A numeric specifying, for natural cubic smoothing splines (NCSS), the desired equivalent number of degrees of freedom of the smooth (trace of the smoother matrix). Lower values result in more smoothing. If df = NULL, the amount of smoothing can be controlled by setting lambda. If both df and lambda are NULL, smoothing is controlled by the default arguments for smooth.spline, and any that you supply via the ellipsis (...) argument.
lambda: A numeric specifying the positive penalty to apply. The amount of smoothing decreases as lambda decreases.
npspline.segments: A numeric specifying, for P-splines (PS), the number of equally spaced segments between min(x) and max(x), excluding missing values, to use in constructing the B-spline basis for the spline fitting. If npspline.segments is NULL, npspline.segments is set to the maximum of 10 and ceiling((nrow(data)-1)/2), i.e. there will be at least 10 segments and, for more than 21 x values, there will be about half as many segments as there are x values. The amount of smoothing decreases as npspline.segments increases.
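The default rule for npspline.segments can be worked through for a few sample sizes. The function name default_segments is an assumption introduced for this illustration; only the formula comes from the description above.

```r
# Hypothetical helper applying the documented default:
# max(10, ceiling((n - 1) / 2)), where n plays the role of nrow(data).
default_segments <- function(n) max(10, ceiling((n - 1) / 2))

n <- c(10, 21, 22, 40)
segments <- sapply(n, default_segments)
data.frame(n, segments)   # segments: 10, 10, 11, 20
```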
correctBoundaries: A logical indicating whether the fitted spline values are to have the method of Huang (2001) applied to them to correct for estimation bias at the end-points. Note that spline.type must be NCSS, lambda must be NULL and no derivatives may be requested for correctBoundaries to be set to TRUE.
rates: A character giving the growth rates that are to be calculated using derivatives. It should be a combination of one or more of "AGR", "PGR" and "RGR". If NULL, then growth rates are not computed.
suffices.rates: A character giving the characters to be appended to the names of the responses to provide the names of the columns containing the calculated growth rates. The order of the suffices in suffices.rates should correspond to the order of the elements of rates. If NULL, the values of rates are used.
sep.rates: A character giving the character(s) to be used to separate the suffices.rates value from a response value in constructing the name for a new rate. For no separator, set to "".
extra.derivs: A numeric specifying one or more orders of derivatives that are required, in addition to any required for calculating the growth rates. When growth rates are being calculated from derivatives (rates is not NULL), these can be derivatives other than the first. Otherwise, any derivatives can be specified.
suffices.extra.derivs: A character giving the characters to be appended to response.smoothed to construct the names of the derivatives. If NULL and the derivatives are to be retained, then .dv followed by the order of the derivative is appended to response.smoothed.
na.x.action: A character string that specifies the action to be taken when values of x are NA. The possible values are fail, exclude or omit. For exclude and omit, predictions and derivatives will only be obtained for nonmissing values of x. The difference between these two codes is that, for exclude, the returned data.frame will have as many rows as data, with the missing values incorporated.
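The exclude and omit codes resemble base R's na.exclude and na.omit. As an analogy only (this uses lm, not smoothSpline, and the small data set is made up), the sketch below shows how na.exclude pads the results back to the original number of rows while na.omit drops the incomplete rows.

```r
# Made-up data with one missing x value in row 3.
d <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2))

fit_omit    <- lm(y ~ x, data = d, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = d, na.action = na.exclude)

length(fitted(fit_omit))      # 5: the incomplete row is dropped
length(fitted(fit_exclude))   # 6: one value per row of d
fitted(fit_exclude)[3]        # NA for the row with missing x
```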
na.y.action: A character string that specifies the action to be taken when values of y, or the response, are NA. The possible values are fail, exclude, omit, allx, trimx, ltrimx or rtrimx. For all options, except fail, missing values in y will be removed before smoothing. For exclude and omit, predictions and derivatives will be obtained only for nonmissing values of x that do not have missing y values. Again, the difference between these two is that only for exclude will the missing values be incorporated into the returned data.frame. For allx, predictions and derivatives will be obtained for all nonmissing x. For trimx, they will be obtained for all nonmissing x between the first and last nonmissing y values that have been ordered for x; for ltrimx and rtrimx, either the lower or upper missing y values, respectively, are trimmed.
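The trimming options for na.y.action can be pictured with a small sketch. The helper trim_x and its side argument are hypothetical and written only to mirror the description above, so the package's own handling may differ in detail.

```r
# Hypothetical helper (not a package function): return the nonmissing x values
# at which predictions would be kept under each trimming rule.
trim_x <- function(x, y, side = c("both", "lower", "upper")) {
  side <- match.arg(side)
  ok  <- !is.na(x)
  obs <- x[ok & !is.na(y)]                       # x values with an observed y
  lo  <- if (side != "upper") min(obs) else -Inf # trim leading missing y
  hi  <- if (side != "lower") max(obs) else Inf  # trim trailing missing y
  sort(x[ok & x >= lo & x <= hi])
}

x <- 1:8
y <- c(NA, 2, 3, NA, 5, 6, NA, NA)

trim_x(x, y)            # like trimx:  2 3 4 5 6
trim_x(x, y, "lower")   # like ltrimx: 2 3 4 5 6 7 8
trim_x(x, y, "upper")   # like rtrimx: 1 2 3 4 5 6
```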
...: allows for arguments to be passed to smooth.spline.
Chris Brien
Eilers, P.H.C. and Marx, B.D. (2021) Practical smoothing: the joys of P-splines. Cambridge University Press, Cambridge.
Huang, C. (2001) Boundary corrected cubic smoothing splines. Journal of Statistical Computation and Simulation, 70, 107-121.
byIndv4Times_SplinesGRs, probeSmooths, byIndv4Times_GRsDiff, smooth.spline, predict.smooth.spline, JOPS.
library(growthPheno)  # package providing smoothSpline and exampleData
data(exampleData)
# NCSS with df = 4, computing the AGR and RGR from derivatives
fit <- smoothSpline(longi.dat, response = "PSA", response.smoothed = "sPSA",
                    x = "xDAP", df = 4,
                    rates = c("AGR", "RGR"))
# NCSS with a custom AGR suffix and the second derivative as an extra
fit <- smoothSpline(longi.dat, response = "PSA", response.smoothed = "sPSA",
                    x = "xDAP", df = 4,
                    rates = "AGR", suffices.rates = "AGRdv",
                    extra.derivs = 2, suffices.extra.derivs = "Acc")
# P-spline with lambda = 0.1 and 10 segments
fit <- smoothSpline(longi.dat, response = "PSA", response.smoothed = "sPSA",
                    x = "xDAP",
                    spline.type = "PS", lambda = 0.1, npspline.segments = 10,
                    rates = "AGR", suffices.rates = "AGRdv",
                    extra.derivs = 2, suffices.extra.derivs = "Acc")
# NCSS with df = 4 and a custom AGR suffix only
fit <- smoothSpline(longi.dat, response = "PSA", response.smoothed = "sPSA",
                    x = "xDAP", df = 4,
                    rates = "AGR", suffices.rates = "AGRdv")