elnormCensored: Estimate Parameters for a Lognormal Distribution (Log-Scale) Based on Type I Censored Data

Description

Estimate the mean and standard deviation parameters of the logarithm of a lognormal distribution given a sample of data that has been subjected to Type I censoring, and optionally construct a confidence interval for the mean.

Usage

elnormCensored(x, censored, method = "mle", censoring.side = "left",  ci = FALSE, ci.method = "profile.likelihood", ci.type = "two-sided",  conf.level = 0.95, n.bootstraps = 1000, pivot.statistic = "z",  nmc = 1000, seed = NULL, ...)

Arguments

numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

censored

numeric or logical vector indicating which values of x are censored. This must be the same length as x. If the mode of censored is "logical", TRUE values correspond to elements of x that are censored, and FALSE values correspond to elements of x that are not censored. If the mode of censored is "numeric", it must contain only 1's and 0's; 1 corresponds to TRUE and 0 corresponds to FALSE. Missing (NA) values are allowed but will be removed.

method

character string specifying the method of estimation.

For singly censored data, the possible values are: "mle" (maximum likelihood; the default), "bcmle" (bias-corrected maximum likelihood), "ROS" or "qq.reg" (quantile-quantile regression; also called regression on order statistics and abbreviated ROS), "qq.reg.w.cen.level" (quantile-quantile regression including the censoring level), "rROS" or "impute.w.qq.reg" (moment estimation based on imputation using quantile-quantile regression; also called robust regression on order statistics and abbreviated rROS), "impute.w.qq.reg.w.cen.level" (moment estimation based on imputation using the qq.reg.w.cen.level method), "impute.w.mle" (moment estimation based on imputation using the mle), "iterative.impute.w.qq.reg" (moment estimation based on iterative imputation using the qq.reg method), "m.est" (robust M-estimation), and "half.cen.level" (moment estimation based on setting the censored observations to half the censoring level).

For multiply censored data, the possible values are: "mle" (maximum likelihood; the default), "ROS" or "qq.reg" (quantile-quantile regression; also called regression on order statistics and abbreviated ROS), "rROS" or "impute.w.qq.reg" (moment estimation based on imputation using quantile-quantile regression; also called robust regression on order statistics and abbreviated rROS), and "half.cen.level" (moment estimation based on setting the censored observations to half the censoring level).

See the DETAILS section for more information.

censoring.side

character string indicating on which side the censoring occurs. The possible values are "left" (the default) and "right".

logical scalar indicating whether to compute a confidence interval for the mean or variance. The default value is ci=FALSE.

ci.method

character string indicating what method to use to construct the confidence interval for the mean. The possible values are: "profile.likelihood" (profile likelihood; the default), "normal.approx" (normal approximation), "normal.approx.w.cov" (normal approximation taking into account the covariance between the estimated mean and standard deviation; only available for singly censored data), "gpq" (generalized pivotal quantity), and "bootstrap" (based on bootstrapping).

See the DETAILS section for more information. This argument is ignored if ci=FALSE.

ci.type

character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper". This argument is ignored if ci=FALSE.

conf.level

a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95. This argument is ignored if ci=FALSE.

n.bootstraps

numeric scalar indicating how many bootstraps to use to construct the confidence interval for the mean when ci.type="bootstrap". This argument is ignored if ci=FALSE and/or ci.method does not equal "bootstrap".

pivot.statistic

character string indicating which pivot statistic to use in the construction of the confidence interval for the mean when ci.method="normal.approx" or ci.method="normal.approx.w.cov" (see the DETAILS section). The possible values are pivot.statistic="z" (the default) and pivot.statistic="t". When pivot.statistic="t" you may supply the argument ci.sample size (see below). The argument pivot.statistic is ignored if ci=FALSE.

nmc

numeric scalar indicating the number of Monte Carlo simulations to run when ci.method="gpq". The default is nmc=1000. This argument is ignored if ci=FALSE.

seed

integer supplied to the function set.seed and used when ci.method="bootstrap" or ci.method="gpq". The default value is seed=NULL, in which case the current value of .Random.seed is used. This argument is ignored when ci=FALSE.

...

additional arguments to pass to other functions.

prob.method. Character string indicating what method to use to compute the plotting positions (empirical probabilities) when method is one of "ROS", "qq.reg", "qq.reg.w.cen.level", "rROS", "impute.w.qq.reg", "impute.w.qq.reg.w.cen.level", "impute.w.mle", or "iterative.impute.w.qq.reg". Possible values are: "kaplan-meier" (product-limit method of Kaplan and Meier (1958)), "nelson" (hazard plotting method of Nelson (1972)), "michael-schucany" (generalization of the product-limit method due to Michael and Schucany (1986)), and "hirsch-stedinger" (generalization of the product-limit method due to Hirsch and Stedinger (1987)). The default value is prob.method="michael-schucany". The "nelson" method is only available for censoring.side="right". See the DETAILS section and the help file for ppointsCensored for more information.
plot.pos.con. Numeric scalar between 0 and 1 containing the value of the plotting position constant to use when method is one of "qq.reg", "qq.reg.w.cen.level", "impute.w.qq.reg", "impute.w.qq.reg.w.cen.level", "impute.w.mle", or "iterative.impute.w.qq.reg". The default value is plot.pos.con=0.375. See the DETAILS section and the help file for ppointsCensored for more information.
ci.sample.size. Numeric scalar indicating what sample size to assume to construct the confidence interval for the mean if pivot.statistic="t" and ci.method="normal.approx" or ci.method="normal.approx.w.cov". When method equals "mle" or "bcmle", the default value is the expected number of uncensored observations, otherwise it is the observed number of uncensored observations.
lb.impute. Numeric scalar indicating the lower bound for imputed observations when method is one of "impute.w.qq.reg", "impute.w.qq.reg.w.cen.level", "impute.w.mle", or "iterative.impute.w.qq.reg". Imputed values smaller than this value will be set to this value. The default is lb.impute=-Inf.
ub.impute. Numeric scalar indicating the upper bound for imputed observations when method is one of "impute.w.qq.reg", "impute.w.qq.reg.w.cen.level", "impute.w.mle", or "iterative.impute.w.qq.reg". Imputed values larger than this value will be set to this value. The default is ub.impute=Inf.
convergence. Character string indicating the kind of convergence criterion when method="iterative.impute.w.qq.reg". The possible values are "relative" (the default) and "absolute". See the DETAILS section for more information.
tol. Numeric scalar indicating the convergence tolerance when method="iterative.impute.w.qq.reg". The default value is tol=1e-6. If convergence="relative", then the relative difference in the old and new estimates of the mean and the relative difference in the old and new estimates of the standard deviation must be less than tol for convergence to be achieved. If convergence="absolute", then the absolute difference in the old and new estimates of the mean and the absolute difference in the old and new estimates of the standard deviation must be less than tol for convergence to be achieved.
max.iter. Numeric scalar indicating the maximum number of iterations when method="iterative.impute.w.qq.reg".
t.df. Numeric scalar greater than or equal to 1 that determines the robustness and efficiency properties of the estimator when method="m.est". The default value is t.df=3.

Value

a list of class "estimateCensored" containing the estimated parameters and other information. See estimateCensored.object for details.

Details

If x or censored contain any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.

Let $X$ denote a random variable with a lognormal distribution with parameters meanlog=$\mu$ and sdlog=$\sigma$. Then $Y = log(X)$ has a normal (Gaussian) distribution with parameters mean=$\mu$ and sd=$\sigma$. Thus, the function elnormCensored simply calls the function enormCensored using the log-transformed values of x.

References

Bain, L.J., and M. Engelhardt. (1991). Statistical Analysis of Reliability and Life-Testing Models. Marcel Dekker, New York, 496pp.

Cohen, A.C. (1959). Simplified Estimators for the Normal Distribution When Samples are Singly Censored or Truncated. Technometrics 1(3), 217--237.

Cohen, A.C. (1963). Progressively Censored Samples in Life Testing. Technometrics 5, 327--339

Cohen, A.C. (1991). Truncated and Censored Samples. Marcel Dekker, New York, New York, 312pp.

Cox, D.R. (1970). Analysis of Binary Data. Chapman & Hall, London. 142pp.

Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1--26.

Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, 436pp.

El-Shaarawi, A.H. (1989). Inferences About the Mean from Censored Water Quality Data. Water Resources Research 25(4) 685--690.

El-Shaarawi, A.H., and D.M. Dolan. (1989). Maximum Likelihood Estimation of Water Quality Concentrations from Censored Data. Canadian Journal of Fisheries and Aquatic Sciences 46, 1033--1039.

El-Shaarawi, A.H., and S.R. Esterby. (1992). Replacement of Censored Observations by a Constant: An Evaluation. Water Research 26(6), 835--844.

El-Shaarawi, A.H., and A. Naderi. (1991). Statistical Inference from Multiply Censored Environmental Data. Environmental Monitoring and Assessment 17, 339--347.

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.

Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135--146.

Gleit, A. (1985). Estimation for Small Normal Data Sets with Detection Limits. Environmental Science and Technology 19, 1201--1206.

Haas, C.N., and P.A. Scheff. (1990). Estimation of Averages in Truncated Samples. Environmental Science and Technology 24(6), 912--919.

Hashimoto, L.K., and R.R. Trussell. (1983). Evaluating Water Quality Data Near the Detection Limit. Paper presented at the Advanced Technology Conference, American Water Works Association, Las Vegas, Nevada, June 5-9, 1983.

Helsel, D.R. (1990). Less than Obvious: Statistical Treatment of Data Below the Detection Limit. Environmental Science and Technology 24(12), 1766--1774.

Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley \& Sons, Hoboken, New Jersey.

Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997--2004.

Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715--727.

Korn, L.R., and D.E. Tyler. (2001). Robust Estimation for Chemical Concentration Data Subject to Detection Limits. In Fernholz, L., S. Morgenthaler, and W. Stahel, eds. Statistics in Genetics and in the Environmental Sciences. Birkhauser Verlag, Basel, pp.41--63.

Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.

Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461--496.

Millard, S.P., P. Dixon, and N.K. Neerchal. (2014; in preparation). Environmental Statistics with R. CRC Press, Boca Raton, Florida.

Nelson, W. (1982). Applied Life Data Analysis. John Wiley and Sons, New York, 634pp.

Newman, M.C., P.M. Dixon, B.B. Looney, and J.E. Pinder. (1989). Estimating Mean and Variance for Environmental Samples with Below Detection Limit Observations. Water Resources Bulletin 25(4), 905--916.

Pettitt, A. N. (1983). Re-Weighted Least Squares Estimation with Censored and Grouped Data: An Application of the EM Algorithm. Journal of the Royal Statistical Society, Series B 47, 253--260.

Regal, R. (1982). Applying Order Statistic Censored Normal Confidence Intervals to Time Censored Data. Unpublished manuscript, University of Minnesota, Duluth, Department of Mathematical Sciences.

Royston, P. (2007). Profile Likelihood for Estimation and Confdence Intervals. The Stata Journal 7(3), pp. 376--387.

Saw, J.G. (1961b). The Bias of the Maximum Likelihood Estimators of Location and Scale Parameters Given a Type II Censored Normal Sample. Biometrika 48, 448--451.

Schmee, J., D.Gladstein, and W. Nelson. (1985). Confidence Limits for Parameters of a Normal Distribution from Singly Censored Samples, Using Maximum Likelihood. Technometrics 27(2) 119--128.

Schneider, H. (1986). Truncated and Censored Samples from Normal Populations. Marcel Dekker, New York, New York, 273pp.

Shumway, R.H., A.S. Azari, and P. Johnson. (1989). Estimating Mean Concentrations Under Transformations for Environmental Data With Detection Limits. Technometrics 31(3), 347--356.

Singh, A., R. Maichle, and S. Lee. (2006). On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based Upon Data Sets with Below Detection Limit Observations. EPA/600/R-06/022, March 2006. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.

Stryhn, H., and J. Christensen. (2003). Confidence Intervals by the Profile Likelihood Method, with Applications in Veterinary Epidemiology. Contributed paper at ISVEE X (November 2003, Chile). http://people.upei.ca/hstryhn/stryhn208.pdf.

Travis, C.C., and M.L. Land. (1990). Estimating the Mean of Data Sets with Nondetectable Values. Environmental Science and Technology 24, 961--962.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.

USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.

Venzon, D.J., and S.H. Moolgavkar. (1988). A Method for Computing Profile-Likelihood-Based Confidence Intervals. Journal of the Royal Statistical Society, Series C (Applied Statistics) 37(1), pp. 87--94.

Examples

Run this code

  # Chapter 15 of USEPA (2009) gives several examples of estimating the mean
  # and standard deviation of a lognormal distribution on the log-scale using 
  # manganese concentrations (ppb) in groundwater at five background wells. 
  # In EnvStats these data are stored in the data frame 
  # EPA.09.Ex.15.1.manganese.df.

  # Here we will estimate the mean and standard deviation using the MLE, 
  # Q-Q regression (also called parametric regression on order statistics 
  # or ROS; e.g., USEPA, 2009 and Helsel, 2012), and imputation with Q-Q 
  # regression (also called robust ROS or rROS). 

  # First look at the data:
  #-----------------------

  EPA.09.Ex.15.1.manganese.df

  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #...
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE

  longToWide(EPA.09.Ex.15.1.manganese.df, 
    "Manganese.Orig.ppb", "Sample", "Well",
    paste.row.name = TRUE)  

  #         Well.1 Well.2 Well.3 Well.4 Well.5
  #Sample.1     <5     <5     <5    6.3   17.9
  #Sample.2   12.1    7.7    5.3   11.9   22.7
  #Sample.3   16.9   53.6   12.6     10    3.3
  #Sample.4   21.6    9.5  106.3     <2    8.4
  #Sample.5     <2   45.9   34.5   77.2     <2


  # Now estimate the mean and standard deviation on the log-scale 
  # using the MLE:
  #---------------------------------------------------------------

  with(EPA.09.Ex.15.1.manganese.df,
    elnormCensored(Manganese.ppb, Censored))

  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 5 
  #
  #Estimated Parameter(s):          meanlog = 2.215905
  #                                 sdlog   = 1.356291
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%

  # Now compare the MLE with the estimators based on 
  # Q-Q regression (ROS) and imputation with Q-Q regression (rROS)
  #---------------------------------------------------------------

  with(EPA.09.Ex.15.1.manganese.df,
    elnormCensored(Manganese.ppb, Censored))$parameters
  # meanlog    sdlog 
  #2.215905 1.356291

  with(EPA.09.Ex.15.1.manganese.df,
    elnormCensored(Manganese.ppb, Censored, 
    method = "ROS"))$parameters
  # meanlog    sdlog 
  #2.293742 1.283635

  with(EPA.09.Ex.15.1.manganese.df,
    elnormCensored(Manganese.ppb, Censored, 
    method = "rROS"))$parameters
  # meanlog    sdlog 
  #2.298656 1.238104

  #----------

  # The method used to estimate quantiles for a Q-Q plot is 
  # determined by the argument prob.method.  For the functions
  # enormCensored and elnormCensored, for any estimation 
  # method that involves Q-Q regression, the default value of 
  # prob.method is "hirsch-stedinger" and the default value for the 
  # plotting position constant is plot.pos.con=0.375.  

  # Both Helsel (2012) and USEPA (2009) also use the Hirsch-Stedinger 
  # probability method but set the plotting position constant to 0. 

  with(EPA.09.Ex.15.1.manganese.df,
    elnormCensored(Manganese.ppb, Censored,
    method = "rROS", plot.pos.con = 0))$parameters
  # meanlog    sdlog 
  #2.277175 1.261431

  #----------

  # Using the same data as above, compute a confidence interval 
  # for the mean on the log-scale using the profile-likelihood
  # method.

  with(EPA.09.Ex.15.1.manganese.df,
    elnormCensored(Manganese.ppb, Censored, ci = TRUE))

  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              2 5 
  #
  #Estimated Parameter(s):          meanlog = 2.215905
  #                                 sdlog   = 1.356291
  #
  #Estimation Method:               MLE
  #
  #Data:                            Manganese.ppb
  #
  #Censoring Variable:              Censored
  #
  #Sample Size:                     25
  #
  #Percent Censored:                24%
  #
  #Confidence Interval for:         meanlog
  #
  #Confidence Interval Method:      Profile Likelihood
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 1.595062
  #                                 UCL = 2.771197

Run the code above in your browser using DataLab