Estimates the density function from a vector of right-censored survival times using kernel functions.
Options include three types of bandwidth functions, three types of boundary correction, and four shapes
for the kernel function. The function uses the global and local bandwidth selection algorithms and the boundary kernel
formulations described in Mueller and Wang (1994). The nearest neighbor bandwidth formulation is based
on that described in Gefeller and Dette (1992). The statistical properties of many of these estimators
are reported and compared in Hess and Zhong. The mudens() function is an R wrapper around C code
based on the density estimation in the HADES program developed by H.G. Mueller, and it returns
an object of class 'mudens'.
mudens(times, delta, subset, min.time, max.time, bw.grid, bw.pilot,
       bw.smooth, bw.method="local", b.cor="both", n.min.grid=51,
       n.est.grid=101, kern="epanechnikov")
times: A vector of survival times. It does not need to be sorted.
delta: A vector indicating censoring: 0 - censored (alive), 1 - uncensored (dead).
If delta is missing, all observations are assumed to be uncensored.
subset: A logical vector indicating the observations used in the analysis.
TRUE - observation is used, FALSE - observation is not used.
If missing, all observations are used.
min.time: Left bound of the time domain used in the analysis. If missing, min.time is set to 0.
max.time: Right bound of the time domain used in the analysis.
If missing, max.time is the maximum value of times.
bw.grid: Bandwidth grid used in the MSE minimization.
If bw.method="global" and bw.grid has only one component, no MSE minimization is performed.
The density estimates are computed for that single value of bw.grid.
If bw.grid is missing, a bandwidth grid of 21 components is built, with bounds
[0.2*bw.pilot, 20*bw.pilot].
bw.pilot: Pilot bandwidth used in the MSE minimization. If missing, the default value is the one recommended by Mueller and Wang (1994):
bw.pilot = (max.time-min.time)/(8*nz^0.2),
where nz is the number of uncensored observations.
bw.smooth: Bandwidth used in smoothing the local bandwidths. Not used if
bw.method="global".
If missing, bw.smooth=5*bw.pilot.
bw.method: Algorithm to be used. Possible values are:
"global" - the same bandwidth for all grid points; the optimal bandwidth is obtained by minimizing the IMSE.
"local" - a different bandwidth at each grid point; the optimal bandwidth at a grid point
is obtained by minimizing the local MSE.
"knn" - k nearest neighbors distance bandwidth; the optimal number of neighbors is obtained by minimizing the IMSE.
The default value is "local". Only the first letter needs to be given (e.g. "g" instead of "global").
b.cor: Boundary correction type. Possible values are:
"none" - no boundary correction, "left" - left only correction,
"both" - left and right corrections.
The default value is "both". Only the first letter needs to be given (e.g. b.cor="n").
n.min.grid: Number of points in the minimization grid. This value greatly influences the computing time. The default value is 51.
n.est.grid: Number of points in the estimation grid, where the density estimates are computed. The default value is 101.
kern: Boundary kernel function to be used. Possible values are:
"rectangle", "epanechnikov", "biquadratic", "triquadratic".
The default value is "epanechnikov". Only the first letter needs to be given (e.g. kern="b").
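The defaults described above for bw.pilot, bw.grid and bw.smooth can be reproduced by hand in base R. The following is a minimal sketch, not the package's internal code: the helper name default_bandwidths is hypothetical, and the equal spacing of the 21 grid points is an assumption, since only the bounds of the grid are documented.

```r
# Hypothetical helper (not part of the package): reproduces the documented
# defaults for bw.pilot, bw.grid and bw.smooth from the input data.
default_bandwidths <- function(times, delta, min.time = 0, max.time = max(times)) {
  nz <- sum(delta == 1)                           # number of uncensored observations
  bw.pilot <- (max.time - min.time) / (8 * nz^0.2)
  # The bounds are documented; equal spacing of the 21 points is an assumption.
  bw.grid <- seq(0.2 * bw.pilot, 20 * bw.pilot, length.out = 21)
  bw.smooth <- 5 * bw.pilot
  list(nz = nz, bw.pilot = bw.pilot, bw.grid = bw.grid, bw.smooth = bw.smooth)
}

set.seed(1)
times <- rexp(200)
delta <- sample(c(0, 1), 200, replace = TRUE)
bw <- default_bandwidths(times, delta)
print(bw$bw.pilot)
print(range(bw$bw.grid))
```

Computing these values beforehand can help when choosing a custom bw.grid, since it shows the range the default search would cover.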
Returns an object of class 'mudens', containing input and output values.
Methods working on such an object are:
plot, lines, summary. For a detailed description of its components,
see object.mudens in the mudens package.
The mudens object contains a list of the input data and parameter values as well as a variety of output data.
The density function estimate is contained in the haz.est element and the corresponding time points are in est.grid.
The unsmoothed and smoothed local bandwidths are in bw.loc and bw.loc.sm, respectively.
When setting bw.method='local' or 'knn', to check the shape of the bandwidth function used in the estimation,
use plot(fit$pin$min.grid, fit$bw.loc) to plot the unsmoothed bandwidths and
use lines(fit$est.grid, fit$bw.loc.sm) to superimpose the smoothed bandwidth function.
We can also use bw.smooth to change the amount of smoothing used on the bandwidth function.
For bw.method='global', use plot(fit$bw.grid, fit$globlmse) to check the minimization process by
plotting the estimated IMSE values over the bandwidth search grid; for
bw.method='knn', use plot(fit$k.grid, fit$k.imse).
If the observed minimum lies at either extreme of the grid, specify a different grid. To fine-tune the optimization, repeat the search using a finer grid over a shorter interval.
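The check for a minimum at the extreme of the search grid can also be done programmatically. This is a sketch under stated assumptions: min_at_grid_edge is a hypothetical helper, not a function of the package, and the IMSE values below are synthetic stand-ins for fit$globlmse (or fit$k.imse).

```r
# Hypothetical helper: TRUE when the minimum of the criterion sits at the
# first or last grid point, which suggests the search grid should be moved.
min_at_grid_edge <- function(grid, imse) {
  stopifnot(length(grid) == length(imse), length(grid) >= 2)
  i <- which.min(imse)
  i == 1 || i == length(grid)
}

# Synthetic IMSE curves standing in for fit$globlmse over fit$bw.grid.
grid <- seq(0.1, 2, length.out = 21)
imse_interior <- (grid - 0.8)^2   # minimum inside the grid
imse_edge <- (grid - 3)^2         # still decreasing at the right end

min_at_grid_edge(grid, imse_interior)  # FALSE
min_at_grid_edge(grid, imse_edge)      # TRUE
```

With a real fit, one would pass fit$bw.grid and fit$globlmse; if the result is TRUE, rerunning mudens with a bw.grid extended beyond that end is the natural next step.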
Hess, K.R. and Zhong, M. Density Function Estimation for Possibly Right-Censored Data Using Kernel Functions. Submitted.
Mueller, H.G. and Wang, J.L. Hazard Rates Estimation Under Random Censoring with Varying Kernels and Bandwidths. Biometrics 50:61-76, March, 1994.
Gefeller, O. and Dette, H. Nearest Neighbor Kernel Estimation of the Hazard Function From Censored Data. J. Statist. Comput. Simul. 43:93-101, 1992.
# NOT RUN {
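# Simulate 1000 exponential survival times
time <- rexp(1000)
# Random censoring indicator: 1 = uncensored, 0 = censored
stat <- sample(c(0, 1), 1000, replace = TRUE)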
fit <- mudens(time, stat)
summary(fit)
# }