smicd (version 1.1.3)

kdeAlgo: Estimation of Statistical Indicators from Interval-Censored Data

Description

The function applies an iterative kernel density algorithm to estimate a variety of statistical indicators (e.g. mean, median, quantiles, Gini coefficient) from interval-censored data. Standard errors can be estimated with a non-parametric bootstrap.
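The bootstrap idea can be illustrated with a minimal base R sketch. It does not call kdeAlgo: the estimator midpoint_mean is a hypothetical stand-in (a mean of class midpoints) used only to keep the example self-contained, and the variable names are assumptions. The principle is to resample the interval-censored observations with replacement, re-estimate the indicator on each resample, and take the standard deviation of the replicates as the standard error.

```r
set.seed(42)

# Simulated data, observed only as classes
x <- rlnorm(500, meanlog = 8, sdlog = 1)
classes <- c(0, 500, 1000, 1500, 2000, 2500, 3000, 4000, 5000, 6000,
             8000, 10000, 15000, Inf)
xclass <- cut(x, breaks = classes)

# Hypothetical stand-in estimator (NOT the KDE algorithm):
# mean of class midpoints, with Inf replaced by 15000 * upper
midpoint_mean <- function(xclass, classes, upper = 3) {
  bounds <- classes
  bounds[length(bounds)] <- bounds[length(bounds) - 1] * upper
  k <- as.integer(xclass)
  mean((bounds[k] + bounds[k + 1]) / 2)
}

# Non-parametric bootstrap: resample observations with replacement
b <- 100
boot_est <- replicate(b, {
  idx <- sample(seq_along(xclass), replace = TRUE)
  midpoint_mean(xclass[idx], classes)
})

# Bootstrap standard error of the indicator
se_mean <- sd(boot_est)
se_mean
```

In kdeAlgo itself, the bootstrap is switched on with bootstrap.se = TRUE and the number of replicates is set with b.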

Usage

kdeAlgo(
  xclass,
  classes,
  threshold = 0.6,
  burnin = 80,
  samples = 400,
  bootstrap.se = FALSE,
  b = 100,
  bw = "nrd0",
  evalpoints = 4000,
  adjust = 1,
  custom_indicator = NULL,
  upper = 3,
  weights = NULL,
  oecd = NULL
)

Value

An object of class "kdeAlgo" that provides estimates for statistical indicators and, optionally, the corresponding standard error estimates. Generic functions such as print and plot have methods that can be used to obtain further information. See kdeAlgoObject for a description of the components of objects of class "kdeAlgo".

Arguments

xclass

interval-censored values; factor with ordered factor values, as in dclass

classes

numeric vector of classes; Inf as last value is allowed, as in dclass

threshold

threshold used for the Head-Count Ratio and the Poverty Gap, given as a share of the median; the default threshold = 0.6 corresponds to 60% of the median

burnin

burn-in sample size, as in dclass

samples

sampling iteration size, as in dclass

bootstrap.se

if TRUE standard errors for the statistical indicators are estimated

b

number of bootstrap iterations for the estimation of the standard errors

bw

bandwidth selector method, defaults to "nrd0", as in density

evalpoints

number of evaluation grid points, as in dclass

adjust

factor by which the bandwidth is multiplied, such that bw = adjust * bw, as in density

custom_indicator

a list of functions containing the indicators to be additionally calculated. Such functions must only depend on the target variable y and the threshold. For the estimation of weighted custom indicators the function must also depend on weights. Defaults to NULL.

upper

if the upper bound of the highest interval is Inf, e.g. (15000, Inf), then Inf is replaced by 15000 * upper

weights

any kind of survey or design weights that will be used for the weighted estimation of the statistical indicators

oecd

weights for equivalized household size
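As an illustration of the custom_indicator argument described above, the following sketch defines one unweighted and one weighted indicator and checks them on stand-in data without calling kdeAlgo. The indicator names quant5 and wmean and the vectors y and w are chosen for illustration only; the function signatures follow the argument description (y and threshold, plus weights for weighted indicators).

```r
# Custom indicators: each function depends on y and threshold;
# a weighted indicator additionally depends on weights
custom_indicator <- list(
  quant5 = function(y, threshold) {
    unname(quantile(y, probs = 0.05))
  },
  wmean = function(y, threshold, weights) {
    weighted.mean(y, w = weights)
  }
)

# Stand-in data to try the functions on their own
set.seed(123)
y <- rlnorm(500, meanlog = 8, sdlog = 1)
w <- abs(rnorm(500, 0, 1))

q5 <- custom_indicator$quant5(y, threshold = 0.6)
wm <- custom_indicator$wmean(y, threshold = 0.6, weights = w)
```

A list of this form would then be supplied as custom_indicator = custom_indicator, together with weights = w for the weighted indicator.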

Details

The statistical indicators are estimated using pseudo samples as a proxy for the interval-censored variable. The component resultX of the returned object contains the pseudo samples for each iteration step of the KDE algorithm.
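The idea of drawing pseudo samples inside each interval and re-estimating the density can be illustrated with a simplified base R sketch. It is not the smicd implementation: the midpoint start values, the fixed number of iterations (n_iter), and all variable names are assumptions made for illustration, and burn-in handling and averaging over iterations are omitted.

```r
set.seed(1)

# Simulated data, observed only as classes
x <- rlnorm(500, meanlog = 8, sdlog = 1)
classes <- c(0, 500, 1000, 1500, 2000, 2500, 3000, 4000, 5000, 6000,
             8000, 10000, 15000, Inf)
xclass <- cut(x, breaks = classes)

# Replace the open upper bound, mirroring the `upper` argument (15000 * 3)
upper <- 3
bounds <- classes
bounds[length(bounds)] <- bounds[length(bounds) - 1] * upper

k <- as.integer(xclass)   # class index of each observation
lower_b <- bounds[k]
upper_b <- bounds[k + 1]

# Assumed start values: interval midpoints as the first pseudo sample
pseudo <- (lower_b + upper_b) / 2

# Evaluation grid and the class each grid point falls into
grid <- seq(min(bounds), max(bounds), length.out = 4000)
grid_class <- findInterval(grid, bounds, rightmost.closed = TRUE)

n_iter <- 20  # assumed; far fewer than burnin + samples in kdeAlgo
for (it in seq_len(n_iter)) {
  # Kernel density estimate of the current pseudo sample on the grid
  dens <- density(pseudo, bw = "nrd0", from = min(grid), to = max(grid),
                  n = length(grid))
  # Redraw every observation inside its own interval,
  # with probability proportional to the estimated density
  for (j in unique(k)) {
    idx <- which(k == j)
    in_class <- which(grid_class == j)
    pseudo[idx] <- sample(grid[in_class], length(idx), replace = TRUE,
                          prob = dens$y[in_class])
  }
}

# Indicators are then computed from the pseudo sample
c(mean = mean(pseudo), median = median(pseudo))
```

Each pseudo value stays inside the interval it was observed in, so the sketch only redistributes mass within classes according to the current density estimate.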

References

Walter, P. (2019). A Selection of Statistical Methods for Interval-Censored Data with Applications to the German Microcensus. PhD thesis, Freie Universität Berlin.

Groß, M., U. Rendtel, T. Schmid, S. Schmon, and N. Tzavidis (2017). Estimating the density of ethnic minorities and aged people in Berlin: Multivariate Kernel Density Estimation applied to sensitive georeferenced administrative data protected via measurement error. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180.

See Also

dclass, print.kdeAlgo, plot.kdeAlgo

Examples

if (FALSE) {
# Generate data
x <- rlnorm(500, meanlog = 8, sdlog = 1)
classes <- c(0, 500, 1000, 1500, 2000, 2500, 3000, 4000, 5000, 6000, 8000, 10000, 15000, Inf)
xclass <- cut(x, breaks = classes)
weights <- abs(rnorm(500, 0, 1))
oecd <- rep(seq(1, 6.9, 0.3), 25)

# Estimate statistical indicators with default settings
Indicator <- kdeAlgo(xclass = xclass, classes = classes)

# Include custom indicators
Indicator_custom <- kdeAlgo(
  xclass = xclass, classes = classes,
  custom_indicator = list(quant5 = function(y, threshold) {
    quantile(y, probs = 0.05)
  })
)

# Include survey and oecd weights
Indicator_weights <- kdeAlgo(
  xclass = xclass, classes = classes,
  weights = weights, oecd = oecd
)
}
# Quick run with fewer iterations
# Generate data
x <- rlnorm(500, meanlog = 8, sdlog = 1)
classes <- c(0, 500, 1000, 1500, 2000, 2500, 3000, 4000, 5000, 6000, 8000, 10000, 15000, Inf)
xclass <- cut(x, breaks = classes)

# Estimate statistical indicators
Indicator <- kdeAlgo(xclass = xclass, classes = classes, burnin = 10, samples = 40)
