fkdengpdcon(x, phiu = TRUE, pvector = NULL,
add.jitter = FALSE, factor = 0.1, amount = NULL,
std.err = TRUE, method = "BFGS",
control = list(maxit = 10000), finitelik = TRUE, ...)
pvector: (nmean, nsd, u, sigmau, xi) or NULL
factor, amount: see jitter
method, control: see optim
call: optim call
x: (jittered) data vector x
kerncentres: actual kernel centres used x
init: pvector
optim: complete optim output
mle: vector of MLE of parameters
cov: variance of MLE parameters
se: standard error of MLE parameters
nllh: minimum negative cross-validation log-likelihood
allparams: vector of MLE of model parameters, including phiu and sigmau
allse: vector of standard error of all parameters, including phiu and sigmau
n: total sample size
lambda: MLE of bandwidth
u: threshold
sigmau: MLE of GPD scale
xi: MLE of GPD shape
phiu: MLE of tail fraction
The output list has some duplicate entries and repeats
some of the inputs, both to provide similar items to those
from fpot and to make it as useable as possible.
1) An option to jitter the data (x only) has been included
in the fitting inputs, using the jitter function, to remove
the ties. The default options passed to jitter are
specified above, but the user can override these. Notice
the default scaling factor=0.1, which is a tenth of the
default value in the jitter function itself.
A warning message is given if the data appear to be
rounded (i.e. more than 5% of data are tied). If the
estimated bandwidth is too small, then data rounding is
the likely culprit. Only use the jittering when the MLE
of the bandwidth is far too small.
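The effect of jittering on tied data can be sketched in base R. This is a minimal illustration only: the rounded sample is hypothetical, and measuring ties with duplicated() is an assumption about how the proportion of tied data might be computed, not necessarily the check the fitting function performs.

```r
# Hypothetical rounded sample: rounding to 1 decimal place creates many ties.
set.seed(1)
x <- round(rnorm(500), 1)

# Proportion of observations that duplicate an earlier value
# (one simple way to measure ties; the package's own check may differ).
prop_tied <- mean(duplicated(x))

# Base R jitter() with the reduced scaling factor of 0.1.
xj <- jitter(x, factor = 0.1)
prop_tied_after <- mean(duplicated(xj))

c(before = prop_tied, after = prop_tied_after)
```

With amount left at NULL, jitter() scales the noise by the smallest gap between distinct values, so factor = 0.1 perturbs each point by only a small fraction of the rounding interval.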
2) For heavy tailed populations the bandwidth is
positively biased, giving oversmoothing (see example).
The bias is due to the distance between the upper (or
lower) order statistics not necessarily decaying to zero
as the sample size tends to infinity. Essentially, as the
distance between the two largest (or smallest) sample
datapoints does not decay to zero, some smoothing between
them is required (i.e. bandwidth cannot be zero). One
solution to this problem is to splice the GPD at a
suitable threshold to remove the problematic tail from
the inference for the bandwidth, using either the
kdengpdgpd function for a single heavy tail or the
kdengpdgng function if both tails are heavy. See
MacDonald et al (2013).
By default phiu=TRUE, so that the tail fraction is
specified by normal distribution $\phi_u = 1 - H(u)$.
When phiu=FALSE the tail fraction is treated as an extra
parameter estimated using the MLE, which is the sample
proportion above the threshold. In this case the standard
error for phiu is estimated and output as sephiu.
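The phiu=FALSE case can be illustrated with a short base R sketch. The sample and threshold below are hypothetical, and the binomial-proportion formula for the standard error is an assumption; the source does not state how sephiu is actually computed.

```r
# Hypothetical sample and threshold, chosen only for illustration.
set.seed(42)
x <- rnorm(1000)
u <- 1.5

n <- length(x)
# MLE of the tail fraction: the sample proportion above the threshold.
phiu_hat <- mean(x > u)
# Assumed binomial-proportion standard error (not confirmed by the source).
se_phiu <- sqrt(phiu_hat * (1 - phiu_hat) / n)

c(phiu = phiu_hat, se = se_phiu)
```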
Missing values (NA and NaN) are assumed to be invalid data
so are ignored, which is inconsistent with the evd library
which assumes the missing values are below the threshold.
The default optimisation algorithm is "BFGS", which
requires a finite negative log-likelihood function
evaluation (finitelik=TRUE). For invalid parameters,
a zero likelihood is replaced with exp(-1e6). The "BFGS"
and "L-BFGS-B" optimisation algorithms require finite
values for likelihood, so any user input for finitelik
will be overridden and set to finitelik=TRUE if either
of these optimisation methods is chosen.
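The idea of keeping the objective finite can be sketched with a toy likelihood. This is not the package's code: the normal model, the nll helper and its finitelik argument are hypothetical stand-ins. Because exp(-1e6) underflows to zero in double precision, the sketch applies the replacement on the log scale, returning 1e6 as the negative log-likelihood.

```r
# Hypothetical toy model (normal mean and sd), not the kdengpdcon likelihood.
set.seed(123)
x <- rnorm(200, mean = 2, sd = 3)

nll <- function(par, x, finitelik = TRUE) {
  if (par[2] <= 0) {
    # Invalid sd: likelihood is zero, so the negative log-likelihood is Inf.
    # With finitelik, use 1e6 instead, i.e. -log(exp(-1e6)) on the log scale.
    return(if (finitelik) 1e6 else Inf)
  }
  -sum(dnorm(x, mean = par[1], sd = par[2], log = TRUE))
}

# BFGS line searches may probe invalid values; the finite penalty lets them back off.
fit <- optim(c(median(x), mad(x)), nll, x = x, method = "BFGS")
fit$par
```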
A warning is displayed if the optim function call returns
a non-zero convergence result.
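A convergence check of this kind might look like the following sketch. The Rosenbrock objective and the deliberately tiny maxit are illustrative assumptions used to force early termination; the warning text is hypothetical and not the message the package emits.

```r
# Rosenbrock test function, used here only as a stand-in objective.
fr <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2

# maxit = 1 is a deliberately tiny iteration limit to force early termination.
fit <- optim(c(-1.2, 1), fr, method = "BFGS", control = list(maxit = 1))

# optim reports success as convergence == 0; anything else is flagged.
converged <- fit$convergence == 0
if (!converged) {
  warning("optim did not converge (code ", fit$convergence, ")")
}
```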
If the hessian is of reduced rank then the variance (from
the inverse hessian) and standard error of the bandwidth
parameter cannot be calculated; with the default
std.err=TRUE the function will then stop. If you want the
bandwidth estimate even if the hessian is of reduced rank
(e.g. in a simulation study) then set std.err=FALSE.
See also: fkdengpd, fkden, jitter, density and bw.nrd0
Other kdengpdcon: dkdengpdcon, kdengpdcon, lkdengpdcon,
nlkdengpdcon, pkdengpdcon, qkdengpdcon, rkdengpdcon
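library(evmix)  # provides fkdengpdcon and dkdengpdcon
# Simulate a standard normal sample
x = rnorm(1000, 0, 1)
# Fit the model, estimating the tail fraction phiu as a parameter
fit = fkdengpdcon(x, phiu = FALSE, std.err = FALSE)
# Overlay the fitted density on a histogram of the data
hist(x, 100, freq = FALSE, xlim = c(-5, 5))
xx = seq(-5, 5, 0.01)
lines(xx, dkdengpdcon(xx, x, fit$lambda, fit$u, fit$xi, fit$phiu), col="blue")
abline(v = fit$u)  # mark the fitted threshold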