mudens: Estimate density function from right-censored survival data

Description

Estimates the density function from a vector of right-censored survival times using kernel functions. Options include three types of bandwidth functions, three types of boundary correction, and four shapes for the kernel function. The function uses the global and local bandwidth selection algorithms and the boundary kernel formulations described in Mueller and Wang (1994). The nearest neighbor bandwidth formulation is based on that described in Gefeller and Dette (1992). The statistical properties of many of these estimators are reported and compared in Hess et al. The mudens() function is an R wrapper around C code based on the density estimation in the HADES program developed by H.G. Mueller, and it returns an object of class 'mudens'.

Usage

mudens(times, delta, subset, min.time, max.time, bw.grid, bw.pilot,
       bw.smooth, bw.method = "local", b.cor = "both", n.min.grid = 51,
       n.est.grid = 101, kern = "epanechnikov")

Arguments

times

A vector of survival times. It does not need to be sorted.

delta

A vector indicating censoring: 0 - censored (alive), 1 - uncensored (dead). If delta is missing, all the observations are assumed uncensored.

subset

A logical vector indicating the observations used in analysis. TRUE - observation is used, FALSE - observation is not used. If missing, all the observations will be used.
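As a minimal sketch of how inputs in this format might be prepared, the following base R code simulates right-censored data and builds the times, delta, and subset vectors. The exponential distributions, the variable names (event, censor, keep), and the 0.1 cutoff are illustrative assumptions, not part of the package.

```r
set.seed(1)
n <- 200
event  <- rexp(n, rate = 1)     # latent event times (assumed exponential)
censor <- rexp(n, rate = 0.5)   # latent censoring times (assumed exponential)

times <- pmin(event, censor)           # observed time: whichever comes first
delta <- as.integer(event <= censor)   # 1 = uncensored (dead), 0 = censored (alive)
keep  <- times > 0.1                   # logical vector: TRUE = observation is used

table(delta)   # counts of censored and uncensored observations
```

Vectors built this way could then be passed as the times, delta, and subset arguments.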

min.time

Left bound of the time domain used in analysis. If missing, min.time is set to 0.

max.time

Right bound of the time domain used in analysis. If missing, max.time is the maximum value of times.

bw.grid

Bandwidth grid used in the MSE minimization. If bw.method="global" and bw.grid has one component only, no MSE minimization is performed and the density estimates are computed for that single value of bw.grid. If bw.grid is missing, then a bandwidth grid of 21 components is built, having as bounds: [0.2*bw.pilot, 20*bw.pilot].
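The default grid described above could be reproduced roughly as follows. This is only a sketch: the value of bw.pilot is hypothetical, and equal spacing between the bounds is an assumption, since the text states the bounds and the number of components but not the spacing.

```r
bw.pilot <- 0.5   # hypothetical pilot bandwidth, for illustration only

# 21 grid points between the documented bounds; equal spacing is assumed
bw.grid <- seq(0.2 * bw.pilot, 20 * bw.pilot, length.out = 21)

range(bw.grid)    # lower and upper bounds of the grid
length(bw.grid)   # number of grid components
```

A grid built this way, or a narrower one, can be supplied through the bw.grid argument.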

bw.pilot

Pilot bandwidth used in the MSE minimization. If missing, the default value is the one recommended by Mueller and Wang (1994):

bw.pilot = (max.time-min.time)/(8*nz^0.2),

where nz is the number of uncensored observations.
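The default pilot bandwidth formula can be evaluated by hand, which helps when choosing a custom value. The small data set below is made up for illustration; min.time and max.time follow the defaults documented for those arguments.

```r
# Made-up survival times and censoring indicators (1 = uncensored)
times <- c(2.1, 0.4, 3.7, 1.2, 5.0, 0.9, 2.8, 4.4)
delta <- c(1, 0, 1, 1, 0, 1, 1, 0)

min.time <- 0            # documented default when min.time is missing
max.time <- max(times)   # documented default when max.time is missing
nz <- sum(delta == 1)    # number of uncensored observations

bw.pilot <- (max.time - min.time) / (8 * nz^0.2)
bw.pilot
```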

bw.smooth

Bandwidth used in smoothing the local bandwidths. Not used if bw.method="global". If missing: bw.smooth=5*bw.pilot.

bw.method

Algorithm to be used. Possible values are: "global" - same bandwidth for all grid points. In this case, the optimal bandwidth is obtained by minimizing the IMSE. "local" - different bandwidths at each grid point, and the optimal bandwidth at a grid point is obtained by minimizing the local MSE. "knn" - k nearest neighbors distance bandwidth, and the optimal number of neighbors is obtained by minimizing the IMSE. Note: The default value is "local". Only the first letter needs to be given (e.g. "g", instead of "global").

b.cor

Boundary correction type. Possible values are: "none" - no boundary correction, "left" - left only correction, "both" - left and right corrections. The default value is set to "both". Only the first letter needs to be given (e.g. b.cor="n").

n.min.grid

Number of points in the minimization grid. This value greatly influences the computing time. Default value is 51.

n.est.grid

Number of points in the estimation grid, where density estimates are computed. Default value is 101.

kern

Boundary kernel function to be used. Possible values are: "rectangle", "epanechnikov", "biquadratic", "triquadratic". The default value is "epanechnikov". Only the first letter needs to be given (e.g. kern="b").
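To give a feel for the four shapes, the sketch below defines the textbook polynomial kernels on [-1, 1] in base R. It assumes that "rectangle", "biquadratic", and "triquadratic" correspond to the standard uniform, biweight, and triweight kernels. These are interior forms only; the boundary kernels of Mueller and Wang (1994) used near the edges of the time domain are modified versions and are not reproduced here.

```r
# Standard kernels supported on [-1, 1] (assumed correspondence with kern values)
kernels <- list(
  rectangle    = function(x) ifelse(abs(x) <= 1, 1 / 2, 0),
  epanechnikov = function(x) ifelse(abs(x) <= 1, 3 / 4 * (1 - x^2), 0),
  biquadratic  = function(x) ifelse(abs(x) <= 1, 15 / 16 * (1 - x^2)^2, 0),
  triquadratic = function(x) ifelse(abs(x) <= 1, 35 / 32 * (1 - x^2)^3, 0)
)

# Each kernel should integrate to 1 over its support
areas <- sapply(kernels, function(k) integrate(k, -1, 1)$value)
areas
```

Higher-order polynomial kernels concentrate more weight near the center, so they usually yield smoother estimates at the same bandwidth.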

Value

Returns an object of class 'mudens', containing input and output values. Methods working on such an object are: plot, lines, summary. For a detailed description of its components, see object.mudens in the mudens package.

Details

Estimate density function from a vector of right-censored survival times.

The mudens object contains a list of the input data and parameter values as well as a variety of output data. The density function estimate is contained in the haz.est element and the corresponding time points are in est.grid. The unsmoothed and smoothed local bandwidths are in bw.loc and bw.loc.sm, respectively.

When setting bw.method='local' or 'knn', to check the shape of the bandwidth function used in the estimation, use plot(fit$pin$min.grid, fit$bw.loc) to plot the unsmoothed bandwidths and use lines(fit$est.grid, fit$bw.loc.sm) to superimpose the smoothed bandwidth function. We can also use bw.smooth to change the amount of smoothing used on the bandwidth function.

For bw.method='global', use plot(fit$bw.grid, fit$globlmse) to check the minimization process by plotting the estimated IMSE values over the bandwidth search grid. For bw.method='knn', use plot(fit$k.grid, fit$k.imse).

You may want to repeat the search using a finer grid over a shorter interval to fine-tune the optimization. If the observed minimum is at an extreme of the grid, you should specify a different grid.
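One way to automate this refinement is sketched below in base R. The helper refine_grid() is hypothetical and not part of the package; it assumes a positive, increasing grid and a matching vector of criterion values (for example the bandwidth grid and IMSE values taken from a fitted object).

```r
# Hypothetical helper: propose a new search grid from a previous search
refine_grid <- function(grid, imse, n = 21) {
  i <- which.min(imse)
  if (i == 1 || i == length(grid)) {
    # Minimum at an extreme: re-center a grid of similar width on that point
    half <- diff(range(grid)) / 2
    lo <- max(grid[i] - half, min(grid) / 10)   # keep bandwidths positive
    hi <- grid[i] + half
  } else {
    # Interior minimum: finer grid between the two neighboring points
    lo <- grid[i - 1]
    hi <- grid[i + 1]
  }
  seq(lo, hi, length.out = n)
}

# Synthetic criterion values with a minimum near 0.8, for illustration only
grid <- seq(0.1, 2, length.out = 21)
imse <- (grid - 0.8)^2 + 0.05
finer <- refine_grid(grid, imse)
range(finer)
```

The resulting vector could be passed as bw.grid in a second call to narrow the search around the observed minimum.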

References

Hess, K.R. and Zhong, M. Density Function Estimation for Possibly Right-Censored Data Using Kernel Functions. Submitted.

Mueller, H.G. and Wang, J.L. Hazard Rates Estimation Under Random Censoring with Varying Kernels and Bandwidths. Biometrics 50:61-76, March 1994.

Gefeller, O. and Dette, H. Nearest Neighbor Kernel Estimation of the Hazard Function From Censored Data. J. Statist. Comput. Simul. 43:93-101, 1992.

Examples

# NOT RUN {
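library(mudens)  # assumes the mudens package is installed
time <- rexp(1000)                              # simulated survival times
stat <- sample(c(0, 1), 1000, replace = TRUE)   # random censoring indicators
fit <- mudens(time, stat)                       # default: local bandwidths
summary(fit)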

# }