Performs iterative bias reduction using kernel, thin plate splines, Duchon splines or low rank splines. Missing values are not allowed. This function is not intended to be used directly.
ibr.fit(x, y, criterion="gcv", df=1.5, Kmin=1, Kmax=1e+06, smoother="k",
kernel="g", rank=NULL, control.par=list(), cv.options=list())
Returns a list including:
Vector of coefficients.
Vector of residuals.
Vector of fitted values.
The number of iterations used.
The initial effective degrees of freedom of the pilot (or base) smoother.
The effective degrees of freedom of the iterated bias reduction smoother at the iter iterations.
Vector of bandwidths, one for each explanatory variable.
The matched call.
A list containing several components:
p: the number of explanatory variables.
m: the order of the splines (if relevant).
s: the power of weights.
scaled: boolean which is TRUE when explanatory variables are scaled.
mean: mean of explanatory variables if scaled=TRUE.
sd: standard deviation of explanatory variables if scaled=TRUE.
critmethod: the method chosen for the criteria, for example strict.
rank: the rank of low rank splines if relevant.
criterion: the chosen criterion.
smoother: the chosen smoother.
kernel: the chosen kernel.
smoothobject: the smoothobject returned by smoothCon.
exhaustive: a boolean which indicates if an exhaustive search was chosen.
Value of the chosen criterion at the given iteration. NA is returned when aggregation of criteria is chosen (see component criterion of list control.par). If the number of iterations iter is given by the user, NULL is returned.
Numeric vector giving the optimal number of iterations selected by each of the chosen criteria.
Either a list containing all the criteria evaluated on the grid Kmin:Kmax (along with the effective degrees of freedom of the smoother and the sigma squared on this grid) if an exhaustive search is chosen (see the value of function iterchoiceAe or iterchoiceS1e), or all the values of the criteria at the given optimal iteration if a non-exhaustive search is chosen (see also the exhaustive component of list control.par).
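The quantities returned above (number of iterations, initial and final effective degrees of freedom, criteria on the grid) can be illustrated with a minimal base-R sketch. It is not the code used by the package: it assumes a Nadaraya-Watson pilot smoother S with a Gaussian kernel, assumes that the smoother after k iterations is I - (I - S)^k, and computes GCV as the mean squared residual divided by (1 - df/n)^2. The bandwidth h, the grid 1:50 and all variable names are illustrative choices.

```r
# Minimal base-R sketch of iterative bias reduction (illustrative only;
# not the implementation used by ibr.fit).
set.seed(1)
n <- 100
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Pilot smoother: row-normalised Gaussian kernel weights, deliberately
# oversmoothed (h is a hypothetical bandwidth).
h <- 0.5
W <- dnorm(outer(x, x, "-") / h)
S <- W / rowSums(W)

Id <- diag(n)
Kgrid <- 1:50                      # plays the role of Kmin:Kmax
R <- Id                            # accumulates (I - S)^k
gcv <- df_k <- numeric(length(Kgrid))
for (k in Kgrid) {
  R <- R %*% (Id - S)
  Sk <- Id - R                     # assumed smoother after k iterations
  df_k[k] <- sum(diag(Sk))         # effective degrees of freedom (trace)
  gcv[k] <- mean((y - Sk %*% y)^2) / (1 - df_k[k] / n)^2
}

iter <- Kgrid[which.min(gcv)]      # exhaustive search on the grid
initial_df <- df_k[1]              # df of the pilot smoother
final_df <- df_k[iter]             # df after iter iterations
```

In this sketch the effective degrees of freedom grow with the number of iterations, which is why a very smooth pilot (small initial df) is a natural starting point.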
x: A numeric matrix of explanatory variables, with n rows and p columns.
y: A numeric vector of length n containing the variable to be explained.
criterion: A character vector. If the number of iterations (iter) is missing or NULL, the number of iterations is chosen using either one criterion (the first coordinate of criterion) or several (see component criterion of argument list control.par). The criteria available are GCV (default, "gcv"), AIC ("aic"), corrected AIC ("aicc"), BIC ("bic"), gMDL ("gmdl"), map ("map") or rmse ("rmse"). The last two are designed for cross-validation.
df: A numeric vector of either length 1 or length equal to the number of columns of x. If smoother="k", it indicates the desired effective degrees of freedom (trace) of the smoothing matrix for each variable or for the initial smoother (see control.par$dftotal); df is repeated when the length of vector df is 1. If smoother="tps" or smoother="ds", the minimum df of splines is multiplied by df. This argument is ignored if bandwidth is supplied (non-NULL).
Kmin: The minimum number of bias correction iterations in the search grid used by the model selection procedure to select the optimal number of iterations.
Kmax: The maximum number of bias correction iterations in the search grid used by the model selection procedure to select the optimal number of iterations.
smoother: Character string which selects the smoother: thin plate splines ("tps"), Duchon splines ("ds") (see Duchon, 1977) or kernel ("k").
kernel: Character string which selects the kernel: Gaussian ("g"), Epanechnikov ("e"), uniform ("u") or quartic ("q"). The default (Gaussian kernel) is strongly advised.
rank: Numeric value that controls the rank of low rank splines (denoted as k in the mgcv package; see also choose.k for further details, or gam for another smoothing approach with a reduced rank smoother).
control.par: A named list that controls optional parameters. The components are bandwidth (default NULL), iter (default NULL), really.big (default FALSE), dftobwitmax (default 1000), exhaustive (default FALSE), m (default NULL), s (default NULL), dftotal (default FALSE), accuracy (default 0.01), ddlmaxi (default 2n/3), fraction (default c(100, 200, 500, 1000, 5000, 10^4, 5e+04, 1e+05, 5e+05, 1e+06)), scale (default FALSE), criterion (default "strict") and aggregfun (default 10^(floor(log10(x[2]))+2)).
bandwidth: a vector of either length 1 or length equal to the number of columns of x. If smoother="k", it indicates the bandwidth used for each variable; bandwidth is repeated when the length of vector bandwidth is 1. If smoother="tps", it indicates the amount of penalty (coefficient lambda). The default (missing) indicates, for smoother="k", that the bandwidth for each variable is chosen such that each univariate kernel smoother (for each explanatory variable) has df effective degrees of freedom, and, for smoother="tps" or smoother="ds", that lambda is chosen such that the df of the smoothing matrix is df times the minimum df.
iter: the number of iterations. If NULL or missing, an optimal number of iterations is chosen from the search grid (integers from Kmin to Kmax) to minimize the criterion.
really.big: a boolean. If TRUE, it overrides the limitation at 500 observations. Expect long computation times if TRUE.
dftobwitmax: when the bandwidth is chosen by specifying the effective degrees of freedom (see df), a search is done by uniroot. This argument specifies the maximum number of iterations passed to the uniroot function.
exhaustive: boolean. If TRUE, an exhaustive search of the optimal number of iterations on the grid Kmin:Kmax is performed. All criteria for all iterations in the same class (class one: GCV, AIC, corrected AIC, BIC, gMDL; class two: MAP, RMSE) are returned in allcrit. If FALSE, the minimum of the criterion is searched using optimize between Kmin and Kmax.
m: the order of derivatives for the penalty (for thin plate splines it is the order). This integer m must verify (2m+2s)/d > 1, where d is the number of explanatory variables. The default (for smoother="tps") is to choose the order m as the first integer such that 2m/d > 1. The default (for smoother="ds") is to choose m=2 (pseudo-cubic splines).
s: the power of the weighting function. For thin plate splines s is equal to 0. This real number must be strictly smaller than d/2 (where d is the number of explanatory variables) and must verify (2m+2s)/d > 1. To get pseudo-cubic splines (the default), choose m=2 and s=(d-1)/2 (see Duchon, 1977).
dftotal: a boolean. When FALSE (the default), the argument df is the objective df for each univariate kernel, calculated for each explanatory variable. When TRUE, df is the objective df for the overall (product) kernel, that is, the base smoother.
accuracy: tolerance when searching for the bandwidths which lead to a chosen overall initial df.
dfmaxi: the maximum effective degrees of freedom allowed for the iterated bias reduction smoother.
fraction: the subdivision of the interval [Kmin, Kmax] used if a non-exhaustive search is performed (see also iterchoiceA or iterchoiceS1).
scale: boolean. If TRUE, x is scaled (using scale); default to FALSE.
criterion: character string. Possible choices are strict, aggregation or recalc; default to strict. strict selects the number of iterations according to the first coordinate of argument criterion. aggregation selects the number of iterations by applying the function control.par$aggregfun to the numbers of iterations selected by all the criteria chosen in argument criterion. recalc selects the number of iterations by first calculating the optimal number for the second coordinate of argument criterion, then applying the function control.par$aggregfun (to add some number to it), which gives a new Kmax, and then doing the optimal selection between Kmin and this new Kmax using the first coordinate of argument criterion.
aggregfun: function to be applied when control.par$criterion is either recalc or aggregation.
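The link between df, bandwidth and dftobwitmax described above can be sketched in base R. This is an illustration under stated assumptions, not the package's search: it assumes a univariate Nadaraya-Watson smoother with a Gaussian kernel, takes the effective degrees of freedom as the trace of its smoothing matrix, and solves for the bandwidth with uniroot on the log scale. The helper kernel_df, the search interval and the tolerance are hypothetical choices.

```r
# Base-R sketch: bandwidth giving a target effective degree of freedom.
# kernel_df() is a hypothetical helper, not a function of the ibr package.
kernel_df <- function(h, x) {
  W <- dnorm(outer(x, x, "-") / h)
  sum(diag(W / rowSums(W)))        # trace of the smoothing matrix
}

set.seed(42)
x <- runif(80)
target_df <- 1.5                   # same value as the default of df

# Search on the log scale so that the bandwidth stays positive.
root <- uniroot(function(lh) kernel_df(exp(lh), x) - target_df,
                interval = log(c(1e-3, 1e3) * diff(range(x))),
                tol = 1e-8,
                maxiter = 1000)    # plays the role of dftobwitmax
bandwidth <- exp(root$root)
```

The trace decreases from about n (tiny bandwidth, near interpolation) to 1 (huge bandwidth, constant fit), so the root is bracketed by the interval used here.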
cv.options: A named list which controls how cross-validation is done, with components bwchange, ntest, ntrain, Kfold, type, seed, method and npermut. bwchange is a boolean (default FALSE) which indicates whether the bandwidths have to be recomputed each time. ntest is the number of observations in the test set and ntrain is the number of observations in the training set; only one of these is needed, the other can be NULL or missing. Kfold is a boolean or an integer. If Kfold is TRUE then the number of folds is deduced from ntest (or ntrain). type is a character string among random, timeseries, consecutive and interleaved, and gives the type of segments. seed controls the seed of the random generator. method is either "inmemory" or "outmemory"; "inmemory" performs some calculations outside the loop, saving computational time but increasing the required memory. npermut is the number of random draws. If cv.options is list(), then component ntest is set to floor(nrow(x)/10), type is random, npermut is 20 and method is "inmemory", and the other components are NULL.
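As a rough illustration of the default cross-validation settings (type random, ntest equal to floor(nrow(x)/10), npermut equal to 20), the following base-R sketch draws random test and training index sets. It only mimics the description above and is not taken from the package; the sample size n and the object names are hypothetical.

```r
# Base-R sketch of random test/training splits with the default sizes
# described above (illustrative only).
n <- 95                            # hypothetical number of rows of x
ntest <- floor(n / 10)             # default size of the test set
npermut <- 20                      # default number of random draws
set.seed(123)                      # plays the role of the seed component

# One column per random draw: indices of the held-out observations.
test_idx <- replicate(npermut, sample.int(n, ntest))
# Matching training sets: every observation not held out in that draw.
train_idx <- apply(test_idx, 2, function(idx) setdiff(seq_len(n), idx))
```

Criteria such as rmse would then be computed on the held-out rows of each draw and averaged over the npermut draws.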
Pierre-Andre Cornillon, Nicolas Hengartner and Eric Matzner-Lober.
Cornillon, P.-A.; Hengartner, N.; Jegou, N. and Matzner-Lober, E. (2012) Iterative bias reduction: a comparative study. Statistics and Computing, 23, 777-791.
Cornillon, P.-A.; Hengartner, N. and Matzner-Lober, E. (2013) Recursive bias estimation for multivariate regression smoothers. ESAIM: Probability and Statistics, 18, 483-502.
Cornillon, P.-A.; Hengartner, N. and Matzner-Lober, E. (2017) Iterative Bias Reduction Multivariate Smoothing in R: The ibr Package. Journal of Statistical Software, 77, 1-26.
Wood, S.N. (2003) Thin plate regression splines. J. R. Statist. Soc. B, 65, 95-114.
ibr, predict.ibr, summary.ibr, gam