fkdengpd(x, phiu = TRUE, pvector = NULL,
add.jitter = FALSE, factor = 0.1, amount = NULL,
std.err = TRUE, method = "BFGS",
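control = list(maxit = 10000), finitelik = TRUE, ...)

Arguments:
x: vector of sample data
phiu: logical, whether the tail fraction is specified by the bulk model (TRUE) or estimated as the sample proportion above the threshold (FALSE)
pvector: vector of initial values of parameters (lambda, u, sigmau, xi) or NULL
add.jitter: logical, whether to jitter the data to remove ties
factor, amount: jitter options (see jitter)
std.err: logical, whether to calculate standard errors
method: optimisation method (see optim)
control: optimisation control list (see optim)
finitelik: logical, whether the negative log-likelihood should return a finite value for invalid parameters
...: optional inputs passed to optim

fkdengpd returns a list with the following elements:
call: optim call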
x: (jittered) data vector x
kerncentres: actual kernel centres used, i.e. the (jittered) data x
init: initial parameter values used (pvector)
optim: complete optim output
mle: vector of MLE of parameters
cov: variance-covariance matrix of MLE parameters
se: standard error of MLE parameters
nllh: minimum negative cross-validation log-likelihood
allparams: vector of MLE of model parameters, including phiu
allse: vector of standard error of all parameters, including phiu
n: total sample size
lambda: MLE of bandwidth
u: threshold
sigmau: MLE of GPD scale
xi: MLE of GPD shape
phiu: MLE of tail fraction
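The mle, nllh, cov and se elements relate to the optim output in the usual way for maximum likelihood fits. The sketch below illustrates this for the bandwidth alone, assuming a Gaussian kernel and a bulk-only leave-one-out cross-validation likelihood (no GPD tail), so it is not the full kdengpd likelihood. The helper cv_nllh is hypothetical and not part of evmix.

```r
# Hypothetical helper (not part of evmix): leave-one-out cross-validation
# negative log-likelihood of a Gaussian KDE with bandwidth lambda.
# Bulk only: the GPD tail of the kdengpd model is NOT included here.
cv_nllh = function(lambda, x, finitelik = TRUE) {
  invalid = if (finitelik) 1e6 else Inf   # zero likelihood -> exp(-1e6)
  if (!is.finite(lambda) || lambda <= 0) return(invalid)
  n = length(x)
  # n x n matrix of kernel contributions K_lambda(x_i - x_j)
  k = matrix(dnorm(outer(x, x, "-"), sd = lambda), nrow = n)
  diag(k) = 0                       # leave x_i out of its own density
  dens = rowSums(k) / (n - 1)       # leave-one-out density at each x_i
  nllh = -sum(log(dens))
  if (!is.finite(nllh)) nllh = invalid
  nllh
}

set.seed(1)
x = rnorm(200)

fit = optim(par = bw.nrd0(x), fn = cv_nllh, x = x, method = "BFGS",
            control = list(maxit = 10000), hessian = TRUE)

std.err = TRUE
# inverse Hessian gives the variance-covariance; NULL if it is singular
vcov_hat = tryCatch(solve(fit$hessian), error = function(e) NULL)
if (is.null(vcov_hat) && std.err) stop("Hessian is of reduced rank; try std.err = FALSE")

result = list(
  mle  = fit$par,     # bandwidth estimate (lambda)
  nllh = fit$value,   # minimum negative cross-validation log-likelihood
  cov  = vcov_hat,
  se   = if (is.null(vcov_hat)) NA else sqrt(diag(vcov_hat))
)
```

Starting from bw.nrd0 is only a convenient assumption for this sketch; any positive initial bandwidth would do.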
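The output list has some duplicate entries and repeats some of the inputs, both to provide items similar to those from fpot and to make it as usable as possible.

Two practical issues arise when estimating the kernel bandwidth by maximum likelihood:

1) Heavily rounded data contain many tied values. A tied point still has a kernel centred exactly on it after it is left out, so ties push the cross-validation likelihood towards small bandwidths and the estimated bandwidth can be far too small. An option to jitter the data (x only) has been included in the fitting inputs, using the jitter function, to remove the ties. The default jitter options are specified in the usage above, but the user can override them. Notice the default scaling factor=0.1, which is a tenth of the default value in the jitter function itself.
A warning message is given if the data appear to be rounded (i.e. more than 5% of the data are tied). If the estimated bandwidth is too small, then data rounding is the likely culprit. Only use jittering when the MLE of the bandwidth is far too small.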
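A simple version of the rounding check and the jittering step can be sketched in base R. The helper prop_tied is hypothetical, and it assumes "tied" means a repeated value; evmix may define its check differently.

```r
# Hypothetical helper: proportion of observations that are ties
# (i.e. not the first occurrence of their value).
prop_tied = function(x) mean(duplicated(x))

set.seed(2)
x = round(rnorm(500), 1)    # heavily rounded data: many ties

if (prop_tied(x) > 0.05) {
  message("data appear rounded: ", round(100 * prop_tied(x)), "% ties")
}

# jitter with a tenth of jitter()'s default scaling factor (factor = 1)
xj = jitter(x, factor = 0.1)
prop_tied(xj)   # ties removed (almost surely 0)
```

With amount = NULL, jitter scales the noise by the smallest gap between distinct values, so factor = 0.1 perturbs each value by only a small fraction of the rounding unit.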
2) For heavy-tailed populations the bandwidth is positively biased, giving oversmoothing (see example). The bias arises because the distance between the upper (or lower) order statistics does not necessarily decay to zero as the sample size tends to infinity. Essentially, as the distance between the two largest (or smallest) sample data points does not decay to zero, some smoothing between them is required (i.e. the bandwidth cannot be zero). One solution to this problem is to splice the GPD at a suitable threshold to remove the problematic tail from the inference for the bandwidth, using either the kdengpdgpd function for a single heavy tail or the kdengpdgng function if both tails are heavy. See
MacDonald et al (2013).

By default phiu=TRUE, so that the tail fraction is specified by the bulk distribution function, $\phi_u = 1 - H(u)$, where H is the distribution function of the kernel density estimate. When phiu=FALSE, the tail fraction is treated as an extra parameter estimated by its MLE, which is the sample proportion above the threshold. In this case the standard error for phiu is estimated and output as sephiu.
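The two tail-fraction options can be sketched as follows, assuming a Gaussian kernel so that H is an average of normal distribution functions centred at the kernel centres. The bandwidth and threshold here are stand-ins rather than fitted values, and kde_cdf is a hypothetical helper.

```r
# Hypothetical sketch: the two ways of setting the tail fraction phiu.
set.seed(3)
x = rnorm(1000)
lambda = bw.nrd0(x)                  # stand-in bandwidth (not the CV MLE)
u = quantile(x, 0.9, names = FALSE)  # stand-in threshold

# Gaussian KDE distribution function H evaluated at q
kde_cdf = function(q, kerncentres, lambda) {
  mean(pnorm(q, mean = kerncentres, sd = lambda))
}

# phiu = TRUE: tail fraction implied by the bulk model, 1 - H(u)
phiu_bulk = 1 - kde_cdf(u, x, lambda)

# phiu = FALSE: MLE is the sample proportion above the threshold
phiu_mle = mean(x > u)
```

The smoothed value phiu_bulk is typically close to, but not equal to, the empirical proportion phiu_mle.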
Missing values (NA and NaN) are assumed to be invalid data and are ignored. This is inconsistent with the evd package, which assumes missing values are below the threshold.
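The practical difference between the two conventions shows up in the sample tail fraction. The toy data below are made up for illustration, and the "below the threshold" calculation mimics the convention described for evd rather than calling evd itself.

```r
x = c(0.2, 1.5, NA, 2.7, NaN, 0.9, 3.1, 0.4)
u = 1

# NA and NaN dropped as invalid (is.na() is TRUE for both)
x_valid = x[!is.na(x)]
phiu_ignore = mean(x_valid > u)                    # 3 / 6

# missing values counted as observations below the threshold
phiu_below = sum(x > u, na.rm = TRUE) / length(x)  # 3 / 8
```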
The default optimisation algorithm is "BFGS", which requires the negative log-likelihood to evaluate to a finite value (finitelik=TRUE). For invalid parameters, a zero likelihood is replaced with exp(-1e6), i.e. a negative log-likelihood of 1e6. As the "BFGS" optimisation algorithm requires finite values for the likelihood, any user input for finitelik will be overridden and set to finitelik=TRUE if this method is chosen.
A warning is displayed if the optim call returns a non-zero convergence code.
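The finite-likelihood convention and the convergence check can be sketched for the GPD tail component alone, assuming the standard GPD density for threshold excesses. gpd_nllh is a hypothetical helper, not evmix's internal function, and the 1e-6 cut-off for switching to the exponential limit is an arbitrary choice for this sketch.

```r
# Hypothetical sketch (not the evmix internals): GPD negative log-likelihood
# for threshold excesses y, with finite replacement for invalid parameters.
gpd_nllh = function(par, y, finitelik = TRUE) {
  sigmau = par[1]
  xi = par[2]
  invalid = if (finitelik) 1e6 else Inf   # -log(exp(-1e6)) = 1e6
  if (!is.finite(sigmau) || !is.finite(xi) || sigmau <= 0) return(invalid)
  n = length(y)
  if (abs(xi) < 1e-6) {
    # exponential limit as xi -> 0
    return(n * log(sigmau) + sum(y) / sigmau)
  }
  z = 1 + xi * y / sigmau
  if (any(z <= 0)) return(invalid)        # excess outside GPD support
  n * log(sigmau) + (1 + 1 / xi) * sum(log(z))
}

set.seed(4)
x = rexp(2000)
u = quantile(x, 0.9, names = FALSE)
y = x[x > u] - u                           # threshold excesses

fit = optim(c(sd(y), 0.1), gpd_nllh, y = y, method = "BFGS",
            control = list(maxit = 10000))
if (fit$convergence != 0) {
  warning("optimisation did not converge: code ", fit$convergence)
}
```

Because exponential excesses are again exponential, the fitted scale should be near 1 and the shape near 0 here.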
If the Hessian is of reduced rank, then the variance (from the inverse Hessian) and standard errors of the MLE parameters cannot be calculated; with the default std.err=TRUE the function will then stop. If you want the parameter estimates even when the Hessian is of reduced rank (e.g. in a simulation study), then set std.err=FALSE.

See also: fkden,
jitter, density and bw.nrd0.
Other kdengpd: dkdengpd, kdengpd, pkdengpd, qkdengpd, rkdengpd

Examples:
library(evmix)
set.seed(1)  # for reproducibility
x = rnorm(1000, 0, 1)
fit = fkdengpd(x, phiu = FALSE, std.err = FALSE)
# histogram of the data with the fitted KDE bulk and GPD tail density overlaid
hist(x, 100, freq = FALSE, xlim = c(-5, 5))
xx = seq(-5, 5, 0.01)
lines(xx, dkdengpd(xx, x, fit$lambda, fit$u, fit$sigmau, fit$xi, fit$phiu), col = "blue")
abline(v = fit$u)  # fitted threshold