fitdistdoublecens: Fit a distribution to doubly censored data

Description

This function wraps a custom approach for fitting distributions to doubly censored data, using fitdistrplus for estimation and primarycensored for the censored and truncated likelihood. It handles primary censoring (when the primary event time is not known exactly), secondary censoring (when the secondary event time is interval-censored), and truncation (when events are only observed within a delay range [L, D]).
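To make the three mechanisms concrete, the base R sketch below simulates data of the kind this function is designed for. It assumes a uniform primary event time within [0, pwindow], a gamma-distributed delay, secondary event times reported to intervals of width swindow, and truncation of the delay (measured from the start of the primary window) to [L, D). The variable names and parameter values are illustrative only; this is not the package's own simulator (see rprimarycensored(), used in the Examples).

```r
# Illustrative simulation of doubly censored, truncated delays (base R only).
set.seed(42)
n <- 1000
pwindow <- 1   # width of the primary event window
swindow <- 1   # width of the secondary (reporting) interval
L <- 0         # lower truncation point
D <- 15        # upper truncation point

# Primary event time: known only to lie somewhere in [0, pwindow]
primary <- runif(n, min = 0, max = pwindow)
# True delay from the primary to the secondary event
delay <- rgamma(n, shape = 4, rate = 0.8)
# Secondary event time relative to the start of the primary window
secondary <- primary + delay

# Truncation: only events with L <= secondary < D are observed
observed <- secondary[secondary >= L & secondary < D]

# Secondary censoring: only the interval containing each event is reported
left <- floor(observed / swindow) * swindow
censdata <- data.frame(
  left = left,
  right = left + swindow,
  pwindow = pwindow,
  L = L,
  D = D
)
head(censdata)
```

Each row records only an interval for the secondary event, a window for the primary event, and the truncation limits; the exact event times are discarded.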

Usage

fitdistdoublecens(
  censdata,
  distr,
  left = "left",
  right = "right",
  pwindow = "pwindow",
  L = "L",
  D = "D",
  dprimary = stats::dunif,
  dprimary_args = list(),
  truncation_check_multiplier = 2,
  ...
)

Value

An object of class "fitdist" as returned by fitdistrplus::fitdist.

Arguments

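censdata: A data frame containing the lower and upper bounds of the censored observations, by default in columns 'left' and 'right' (see the left and right arguments), together with the columns named by pwindow, L and D described below. Unlike fitdistrplus::fitdistcens(), NA is not supported for either the upper or lower bound.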
distr: A character string naming the distribution to be fitted. This should be the base name of a distribution with corresponding d (density) and p (cumulative distribution) functions available. For example, use "gamma" (which will use dgamma and pgamma), "lnorm" (for dlnorm and plnorm), "weibull", "norm", etc. Custom distributions can also be used as long as the corresponding d<distr>() and p<distr>() functions are defined and loaded.
left: Column name for lower bound of observed values (default: "left").
right: Column name for upper bound of observed values (default: "right").
pwindow: Column name for primary window (default: "pwindow").
L: Column name for the minimum delay (lower truncation point). If greater than 0, the distribution is left-truncated at L, which is useful for modelling generation intervals where day 0 is excluded, particularly in renewal models. If the column is not present in censdata, L = 0 is assumed (default: "L").
D: Column name for the maximum delay (upper truncation point). If finite, the distribution is right-truncated at D; if set to Inf, no upper truncation is applied (default: "D").
dprimary: Function giving the probability density function (PDF) of primary event times. It should take a value x and a pwindow parameter, return a probability density, and be normalized to integrate to 1 over [0, pwindow]. Defaults to a uniform distribution over [0, pwindow]. Users can provide custom functions or use helpers such as dexpgrowth for an exponential growth distribution; see pcd_primary_distributions() for examples. The package can recognize base R distributions and use analytical solutions where they exist. For non-base R functions, add_name_attribute() can be used to tag them so that analytical solutions can be applied.
dprimary_args: List of additional arguments to be passed to dprimary. For example, when using dexpgrowth, you would pass list(min = 0, max = pwindow, r = 0.2) to set the minimum, maximum, and rate parameters.
truncation_check_multiplier: Numeric multiplier used to check whether the truncation time D is appropriate relative to the maximum observed delay. A D much larger than this multiple of the largest delay suggests that D = Inf would be more efficient at little cost in accuracy. Set to NULL to skip the check. Default is 2.
...: Additional arguments to be passed to fitdistrplus::fitdist().
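The dprimary_args list supplies extra arguments to dprimary, and dprimary must integrate to 1 over the primary window. As a hedged illustration, the sketch below defines a hypothetical exponential growth density, dexpgrowth_sketch(), with the same min, max and r parameters as in the dprimary_args example above, and checks the normalization numerically with stats::integrate(). It is not the package's dexpgrowth(), whose implementation may differ.

```r
# Hypothetical exponential growth density on [min, max]; illustrative only.
# f(x) = r * exp(r * (x - min)) / (exp(r * (max - min)) - 1), assuming r != 0
dexpgrowth_sketch <- function(x, min = 0, max = 1, r = 0.2) {
  dens <- r * exp(r * (x - min)) / (exp(r * (max - min)) - 1)
  ifelse(x >= min & x <= max, dens, 0)  # zero outside the primary window
}

pwindow <- 2
primary_args <- list(min = 0, max = pwindow, r = 0.2)

# Normalization check: the density should integrate to 1 over [0, pwindow]
total_mass <- integrate(
  function(x) do.call(dexpgrowth_sketch, c(list(x), primary_args)),
  lower = 0, upper = pwindow
)$value
total_mass

# With the package loaded, the built-in helper would be supplied as, e.g.:
# fitdistdoublecens(censdata, distr = "gamma",
#                   dprimary = dexpgrowth,
#                   dprimary_args = list(min = 0, max = 2, r = 0.2),
#                   start = list(shape = 1, rate = 1))
```

A positive r puts more primary events late in the window, which is what growing incidence implies; a density that does not integrate to 1 would bias the fitted delay distribution.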

Details

How distribution functions are resolved

The distr parameter specifies the base name of the distribution. The function automatically looks up the corresponding density (d) and cumulative distribution (p) functions by prepending the prefixes d and p to the distribution name. For example:

distr = "gamma" uses dgamma() and pgamma()
distr = "lnorm" uses dlnorm() and plnorm()
distr = "weibull" uses dweibull() and pweibull()

Any distribution available in base R or loaded packages can be used, as long as the corresponding d<distr> and p<distr> functions exist and follow standard R distribution function conventions (first argument is x for density, q for CDF).
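The lookup described above can be sketched in a few lines of base R. The helper resolve_distr() and the custom distribution shiftexp are hypothetical and only illustrate the naming convention; they are not necessarily how the package resolves functions internally.

```r
# Sketch of resolving d<distr> and p<distr> from a base name (assumed approach).
resolve_distr <- function(distr) {
  list(
    d = get(paste0("d", distr), mode = "function"),
    p = get(paste0("p", distr), mode = "function")
  )
}

# A hypothetical custom distribution: an exponential delay shifted by one day
dshiftexp <- function(x, rate = 1) stats::dexp(x - 1, rate = rate)
pshiftexp <- function(q, rate = 1) stats::pexp(q - 1, rate = rate)

gamma_fns <- resolve_distr("gamma")      # finds stats::dgamma and stats::pgamma
custom_fns <- resolve_distr("shiftexp")  # finds the functions defined above

gamma_fns$p(2, shape = 2, rate = 1)
custom_fns$p(3, rate = 0.5)

# Standard conventions: first argument is x for densities and q for CDFs
names(formals(gamma_fns$d))[1]
names(formals(gamma_fns$p))[1]
```

If either function is missing, get() raises an error, so a misspelled distr fails early rather than during optimization.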

What this function does internally

This function creates custom density and CDF functions that account for primary censoring, secondary censoring, and truncation using dprimarycensored() and pprimarycensored(). These custom functions are then passed to fitdistrplus::fitdist() for maximum likelihood estimation.
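The idea can be illustrated without the package. The base R sketch below assumes a uniform primary event distribution and a gamma delay on the non-negative reals. It computes the primary-censored CDF F_pc(q) = (1 / pwindow) * integral of F(q - p) dp over p in [0, min(q, pwindow)] by numerical integration, converts it into the probability of each observed interval [left, right) conditional on truncation to [L, D), and maximizes the resulting likelihood with optim(). It stands in for dprimarycensored(), pprimarycensored() and fitdistrplus::fitdist(), which may use analytical solutions and different numerical methods.

```r
# Primary-censored CDF with a uniform primary event on [0, pwindow]:
# F_pc(q) = (1 / pwindow) * integral_0^min(q, pwindow) F(q - p) dp,
# assuming the delay CDF F is zero for negative delays.
ppc_sketch <- function(q, cdf, pwindow) {
  vapply(q, function(qi) {
    upper <- min(qi, pwindow)
    if (upper <= 0) return(0)
    integrate(function(p) cdf(qi - p), lower = 0, upper = upper)$value / pwindow
  }, numeric(1))
}

# Simulate truncated, doubly censored data (uniform primary, gamma delay)
set.seed(1)
pwindow <- 2; swindow <- 2; L <- 0; D <- 10
total <- runif(1000, 0, pwindow) + rgamma(1000, shape = 3, rate = 1)
total <- total[total >= L & total < D]
left <- floor(total / swindow) * swindow
counts <- table(left)                # number of observations per interval
lefts <- as.numeric(names(counts))

# Negative log-likelihood, parameterized on the log scale to stay positive
nll <- function(logpar) {
  shape <- exp(logpar[1]); rate <- exp(logpar[2])
  Fpc <- function(q) {
    ppc_sketch(q, function(t) pgamma(t, shape = shape, rate = rate), pwindow)
  }
  # P(left <= T_obs < right | L <= T_obs < D) for each observed interval
  probs <- (Fpc(lefts + swindow) - Fpc(lefts)) / (Fpc(D) - Fpc(L))
  -sum(as.numeric(counts) * log(probs))
}

fit <- optim(c(0, 0), nll)
estimate <- exp(fit$par)
names(estimate) <- c("shape", "rate")
estimate
```

Grouping observations by interval keeps the number of integrations per likelihood evaluation small, since many rows share the same [left, right) bounds.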

Because pwindow, L and D are read from censdata row by row, each observation can have its own censoring window and truncation limits. This suits real-world data, where truncation times or censoring windows often differ between observations.
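As an illustration of per-observation windows, the sketch below builds a small censdata data frame in which pwindow and D vary by row, checks that no bound is NA (as required for censdata) and, as an additional sanity check, that each upper bound exceeds its lower bound. It also shows how non-default column names are mapped through the left, right, pwindow, L and D arguments. The values and the column names lower, upper, pw, min_delay and max_delay are made up; the call is commented out because it requires the package and far more rows to fit.

```r
# Illustrative data with row-specific censoring windows and truncation limits
my_data <- data.frame(
  lower     = c(0, 1, 2, 4, 3, 6),
  upper     = c(1, 2, 3, 5, 4, 7),
  pw        = c(1, 1, 2, 2, 1, 1),         # primary window width per row
  min_delay = 0,                           # left truncation point (L)
  max_delay = c(10, 10, 12, 12, Inf, Inf)  # right truncation (D); Inf = none
)

# NA bounds are not supported; each upper bound should exceed its lower bound
stopifnot(!anyNA(my_data$lower), !anyNA(my_data$upper),
          all(my_data$upper > my_data$lower))

# Non-default column names are mapped through the corresponding arguments:
# fit <- fitdistdoublecens(
#   my_data, distr = "lnorm",
#   left = "lower", right = "upper", pwindow = "pw",
#   L = "min_delay", D = "max_delay",
#   start = list(meanlog = 1, sdlog = 0.5)
# )
```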

Examples


if (requireNamespace("fitdistrplus", quietly = TRUE)) {
# Example with normal distribution
set.seed(123)
n <- 1000
true_mean <- 5
true_sd <- 2
pwindow <- 2
swindow <- 2
D <- 10
samples <- rprimarycensored(
  n, rnorm,
  mean = true_mean, sd = true_sd,
  pwindow = pwindow, swindow = swindow, D = D
)

delay_data <- data.frame(
  left = samples,
  right = samples + swindow,
  pwindow = rep(pwindow, n),
  D = rep(D, n)
)

fit_norm <- fitdistdoublecens(
  delay_data,
  distr = "norm",
  start = list(mean = 0, sd = 1)
)

summary(fit_norm)
}
