This function can be used to test the equality of the \(M\) critical points estimated from the respective level-specific curves.
localtest(
formula,
data = data,
na.action = "na.omit",
der,
smooth = "kernel",
weights = NULL,
nboot = 500,
h0 = -1,
h = -1,
nh = 30,
kernel = "epanech",
p = 3,
kbin = 100,
rankl = NULL,
ranku = NULL,
seed = NULL,
cluster = TRUE,
ncores = NULL,
ci.level = 0.95,
...
)
The estimate of \(d\) is returned together with its confidence interval at the specified level of confidence (95% by default). The decision of the local test (accepted or rejected) is also shown: the null hypothesis is rejected if zero is not contained in the interval.
An object of class formula: a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.
An optional data frame, matrix or list required by the formula. If not found in data, the variables are taken from environment(formula), typically the environment from which localtest is called.
A function which indicates what should happen when the data contain 'NA's. The default is 'na.omit'.
Number which determines the target of the inference process. By default der is NULL. If it is 0, the testing procedure is applied to the estimate. If it is 1 or 2, it is applied to the first or second derivative, respectively.
Type of smoother used: smooth = "kernel" for local polynomial kernel smoothers and smooth = "splines" for splines using the mgcv package.
Prior weights on the data.
Number of bootstrap repeats.
The kernel bandwidth smoothing parameter for the global effect (see the references for details on the estimation). Large values of the bandwidth lead to smoother estimates; smaller values lead to undersmoothed estimates. By default, cross-validation is used to obtain the bandwidth.
The kernel bandwidth smoothing parameter for the partial effects.
Integer number of equally-spaced bandwidths on which h is discretised, to speed up computation.
A character string specifying the desired kernel. Defaults to kernel = "epanech", where the Epanechnikov density function kernel will be used. The triangular and Gaussian density function kernels are also available, with "triang" and "gaussian", respectively.
Degree of the polynomial to be used. Its value must be the order of the derivative + 1. The default value is 3 because the function returns the estimate and its first and second derivatives.
Number of binning nodes over which the function is to be estimated.
Number or vector specifying the minimum value for the interval at which to search the x value which maximizes the estimate, first or second derivative (for each level). The default is the minimum data value.
Number or vector specifying the maximum value for the interval at which to search the x value which maximizes the estimate, first or second derivative (for each level). The default is the maximum data value.
Seed to be used in the bootstrap procedure.
A logical value. If TRUE (default), the bootstrap procedure is parallelized (only for smooth = "splines"). Note that there are cases (e.g., a low number of bootstrap repetitions) in which R performs better with serial computation. R takes time to distribute tasks across the processors, and it also needs time to bind the results together afterwards. Therefore, if the time for distributing and gathering the pieces is greater than the time needed for single-thread computing, parallelizing is not worthwhile.
An integer value specifying the number of cores to be used in the parallelized procedure. If NULL (default), the number of cores to be used is equal to the number of cores of the machine minus 1.
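As a rough illustration of the default described above, a core count of "machine cores minus 1" could be derived as follows. This is only a sketch: the use of parallel::detectCores() and the fallback to a single core are assumptions made here, not a description of the package internals.

```r
# Assumption: the machine's core count is obtained with parallel::detectCores().
# detectCores() can return NA on some platforms, so fall back to one core.
n_detected <- parallel::detectCores()

ncores <- if (is.na(n_detected)) {
  1L
} else {
  max(1L, n_detected - 1L)  # leave one core free, but never go below one
}

ncores
```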
Level of bootstrap confidence interval. Defaults to 0.95 (corresponding to 95%). Note that the function accepts a vector of levels.
Other options.
Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.
localtest can be used to test the equality of the \(M\) critical points estimated from the respective level-specific curves. Note that, even if the curves and/or their derivatives are different, it is possible for these points to be equal.
For instance, taking the maxima of the first derivatives into account, interest lies in testing the following null hypothesis
$$H_0: x_{01} = \ldots = x_{0M}$$
versus the general alternative
$$H_1: x_{0i} \ne x_{0j} \quad {\rm{for}} \quad {\rm{some}} \quad i, j \in \{ 1, \ldots, M\}.$$
The above hypothesis is true if \(d=x_{0j}-x_{0k}=0\) where $$(j,k) = \arg\max_{(l,m):\, 1 \leq l < m \leq M} |x_{0l}-x_{0m}|,$$
otherwise \(H_0\) is false. It is important to highlight that, in practice, the true \(x_{0j}\) are not known, and consequently neither is \(d\), so an estimate \(\hat d = \hat x_{0j}-\hat x_{0k}\) is used, where, in general, \(\hat x_{0l}\) are the estimates of \(x_{0l}\) based on the estimated curves \(\hat m_l\) with \(l = 1, \ldots , M\).
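The choice of the pair \((j,k)\) can be sketched in a few lines of R. The vector x0_hat below holds made-up critical points for three hypothetical factor levels; it is not output from localtest, and the code is only an illustration of the definition of \(\hat d\), not the package's implementation.

```r
# Hypothetical estimated critical points for M = 3 factor levels (made-up values)
x0_hat <- c(level1 = 2.1, level2 = 2.4, level3 = 1.8)

# All index pairs (l, m) with 1 <= l < m <= M, one pair per column
pairs <- utils::combn(length(x0_hat), 2)

# Absolute difference between the critical points of each pair
abs_diff <- abs(x0_hat[pairs[1, ]] - x0_hat[pairs[2, ]])

# (j, k) is the pair with the largest absolute difference
best <- which.max(abs_diff)
j <- pairs[1, best]
k <- pairs[2, best]

# Estimate of d for that pair
d_hat <- unname(x0_hat[j] - x0_hat[k])
```

Under \(H_0\) all critical points coincide, so the largest pairwise difference, and therefore \(d\), is zero.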
Needless to say, since \(\hat d\) is only an estimate of the true \(d\), the sampling uncertainty of these estimates needs to be acknowledged. Hence, a confidence interval \((a,b)\) is created for \(d\) for a specific level of confidence (95%). Based on this, the null hypothesis is rejected if zero is not contained in the interval.
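The decision rule can be illustrated with a small sketch. The bootstrap replicates below are simulated from a normal distribution purely for illustration, and the percentile interval is an assumption made here; the references describe how the package actually builds the interval.

```r
set.seed(130853)

# Hypothetical point estimate of d (made-up value)
d_hat <- 0.6

# Stand-in for bootstrap replicates of d_hat; in practice these would come
# from re-estimating the critical points on resampled data
d_boot <- rnorm(500, mean = d_hat, sd = 0.2)

# Percentile interval at the chosen confidence level (assumed interval type)
ci.level <- 0.95
alpha <- 1 - ci.level
ci <- quantile(d_boot, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)

# Reject H0 when zero lies outside the interval (a, b)
reject <- ci[1] > 0 || ci[2] < 0
```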
Note that if this hypothesis is rejected (and the factor has more than two levels), one option is to use the maxp.diff function in order to obtain the differences between each pair of factor levels.
Note that the models fitted by the localtest function are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a predictor specified symbolically by model. The possible terms consist of a variable name or a variable name and a factor name separated by the : operator. Such a term is interpreted as the interaction of the continuous variable and the factor. However, if smooth = "splines", the formula is based on the function formula.gam of the mgcv package.
Sestelo, M. (2013). Development and computational implementation of estimation and inference methods in flexible regression models. Applications in Biology, Engineering and Environment. PhD Thesis, Department of Statistics and O.R. University of Vigo.
Sestelo, M., Villanueva, N.M., Meira-Machado, L., Roca-Pardinas, J. (2017). npregfast: An R Package for Nonparametric Estimation and Inference in Life Sciences. Journal of Statistical Software, 82(12), 1-27.
library(npregfast)
data(barnacle)
localtest(DW ~ RC : F, data = barnacle, der = 1, seed = 130853, nboot = 100)
# localtest(height ~ s(age, by = sex), data = children, seed = 130853,
# der = 1, smooth = "splines")