RecordLinkage (version 0.2-0)

gpdEst: Estimate Threshold from Pareto Distribution

Description

Fits weights to a generalized Pareto distribution and calculates a quantile of the fitted model as the classification threshold.

Usage

gpdEst(Wdata, thresh = -Inf, quantil = 0.95)

Arguments

Wdata
A numeric vector representing weights of record pairs.
thresh
Threshold for exceedances.
quantil
A real number between 0 and 1. The desired quantile.

Value

  • A real number representing the resulting classification threshold. It is assured that the threshold lies in a reasonable range.

Details

The weights that exceed thresh are fitted to a generalized Pareto distribution (GPD). The estimated parameters scale and shape are used to calculate a classification threshold by the formula $$\mathit{thresh}+\frac{\mathit{scale}}{\mathit{shape}} \left(\left(\frac{n}{k}(1-\mathit{quantil})\right)^{-\mathit{shape}} -1\right)$$ where $n$ is the total number of weights and $k$ is the number of exceedances.
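
The formula in Details can be evaluated directly once the GPD parameters are known. The following is a minimal sketch of that arithmetic only, not the package's implementation: the helper name `gpd_quantile_threshold` is hypothetical, and it assumes that `scale` and `shape` have already been estimated by some GPD fit, that `shape` is nonzero, and that `thresh` is finite. It does not reproduce the fitting step or any range adjustment that gpdEst may apply to the result.

```r
# Hypothetical helper (not part of RecordLinkage): evaluates the
# threshold formula from the Details section for given parameters.
#   thresh  : threshold used to select exceedances (assumed finite here)
#   scale   : estimated GPD scale parameter (assumed already fitted)
#   shape   : estimated GPD shape parameter (assumed nonzero)
#   n       : total number of weights
#   k       : number of weights exceeding thresh
#   quantil : desired quantile, strictly between 0 and 1
gpd_quantile_threshold <- function(thresh, scale, shape, n, k, quantil = 0.95) {
  stopifnot(is.finite(thresh), shape != 0, k > 0, n >= k,
            quantil > 0, quantil < 1)
  thresh + scale / shape * ((n / k * (1 - quantil))^(-shape) - 1)
}

# Example with made-up parameter values (illustrative, not fitted to data):
# all 100 weights are exceedances, scale = 1, shape = 0.5, quantil = 0.75.
gpd_quantile_threshold(thresh = 0, scale = 1, shape = 0.5,
                       n = 100, k = 100, quantil = 0.75)
# [1] 2
```

With these values the inner term is $(1 \cdot 0.25)^{-0.5} = 2$, so the result is $0 + \frac{1}{0.5}(2 - 1) = 2$.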

See Also

getParetoThreshold for the user-level function.