Compute the value(s) of an objective for one or more Box-Cox power transformations, or compute an optimal power transformation based on a specified objective, for Type I censored data.
boxcoxCensored(x, censored, censoring.side = "left",
lambda = {if (optimize) c(-2, 2) else seq(-2, 2, by = 0.5)}, optimize = FALSE,
objective.name = "PPCC", eps = .Machine$double.eps,
include.x.and.censored = TRUE, prob.method = "michael-schucany",
plot.pos.con = 0.375)
x
a numeric vector of positive numbers. Missing (NA), undefined (NaN), and infinite (-Inf, Inf) values are allowed but will be removed.
censored
numeric or logical vector indicating which values of x are censored. This must be the same length as x. If the mode of censored is "logical", TRUE values correspond to elements of x that are censored, and FALSE values correspond to elements of x that are not censored. If the mode of censored is "numeric", it must contain only 1's and 0's; 1 corresponds to TRUE and 0 corresponds to FALSE. Missing (NA) values are allowed but will be removed.
censoring.side
character string indicating on which side the censoring occurs. The possible values are "left" (the default) and "right".
lambda
numeric vector of finite values indicating what powers to use for the Box-Cox transformation. When optimize=FALSE, the default value is lambda=seq(-2, 2, by=0.5). When optimize=TRUE, lambda must be a vector with two values indicating the range over which the optimization will occur, and the range of these two values must include 1. In this case, the default value is lambda=c(-2, 2).
optimize
logical scalar indicating whether to simply evaluate the objective function at the given values of lambda (optimize=FALSE; the default), or to compute the optimal power transformation within the bounds specified by lambda (optimize=TRUE).
objective.name
character string indicating what objective to use. The possible values are "PPCC" (probability plot correlation coefficient; the default), "Shapiro-Wilk" (the Shapiro-Wilk goodness-of-fit statistic), and "Log-Likelihood" (the log-likelihood function).
eps
finite, positive numeric scalar. When the absolute value of lambda is less than eps, lambda is assumed to be 0 for the Box-Cox transformation. The default value is eps=.Machine$double.eps.
include.x.and.censored
logical scalar indicating whether to include the finite, non-missing values of the argument x and the corresponding values of censored with the returned object. The default value is include.x.and.censored=TRUE.
prob.method
for multiply censored data, character string indicating what method to use to compute the plotting positions (empirical probabilities) when objective.name="PPCC". Possible values are:
"kaplan-meier" (product-limit method of Kaplan and Meier (1958)),
"modified kaplan-meier" (same as "kaplan-meier" with the maximum value included),
"nelson" (hazard plotting method of Nelson (1972)),
"michael-schucany" (generalization of the product-limit method due to Michael and Schucany (1986)), and
"hirsch-stedinger" (generalization of the product-limit method due to Hirsch and Stedinger (1987)).
The default value is prob.method="michael-schucany".
The "nelson" method is only available for censoring.side="right", and the "modified kaplan-meier" method is only available for censoring.side="left". See the DETAILS section for more explanation.
This argument is ignored if objective.name is not equal to "PPCC" and/or the data are singly censored.
plot.pos.con
for multiply censored data, numeric scalar between 0 and 1 containing the value of the plotting position constant when objective.name="PPCC". The default value is plot.pos.con=0.375. See the DETAILS section for more information.
This argument is used only if prob.method is equal to "michael-schucany" or "hirsch-stedinger".
This argument is ignored if objective.name is not equal to "PPCC" and/or the data are singly censored.
boxcoxCensored returns a list of class "boxcoxCensored" containing the results. See the help file for boxcoxCensored.object for details.
Two common assumptions for several standard parametric hypothesis tests are:
The observations all come from a normal distribution.
The observations all come from distributions with the same variance.
For example, the standard one-sample t-test assumes all the observations come from the same normal distribution, and the standard two-sample t-test assumes that all the observations come from a normal distribution with the same variance, although the mean may differ between the two groups.
When the original data do not satisfy the above assumptions, data transformations
are often used to attempt to satisfy these assumptions.
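Box and Cox (1964) presented a formalized method for deciding on a data transformation. Given a random variable $X$ from some distribution with only positive values, the Box-Cox family of power transformations is defined as:

$$Y = \begin{cases} \dfrac{X^{\lambda} - 1}{\lambda}, & \lambda \ne 0 \\[1ex] \log(X), & \lambda = 0 \end{cases} \tag{1}$$

where $Y$ is assumed to come from a normal distribution. This transformation is continuous in $\lambda$ and preserves the ordering of the observations. See the help file for boxcoxTransform for more information on data transformations.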
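The power transformation described above can be sketched in a few lines of base R. This is only an illustration: the helper name bc_transform is hypothetical (it is not an EnvStats function), and the eps argument mimics the rule that a lambda within eps of zero is treated as 0.

```r
# Hypothetical helper (not part of EnvStats): Box-Cox power transformation
# of a positive numeric vector x for a single power lambda.
bc_transform <- function(x, lambda, eps = .Machine$double.eps) {
  stopifnot(is.numeric(x), all(x > 0), length(lambda) == 1)
  if (abs(lambda) < eps) {
    log(x)                      # limiting case lambda = 0
  } else {
    (x^lambda - 1) / lambda     # general case lambda != 0
  }
}

x <- c(0.5, 1, 2, 4, 8)
bc_transform(x, 1)     # x - 1: a shift only, so the shape is unchanged
bc_transform(x, 0)     # natural logarithm
bc_transform(x, 0.5)   # 2 * (sqrt(x) - 1)
```

Because the transformation is monotone increasing for every lambda, the order of the observations (and therefore which values lie below or above a censoring level) is unchanged.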
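Box and Cox (1964) proposed choosing the appropriate value of $\lambda$ based on maximizing the likelihood function. Alternatively, an appropriate value of $\lambda$ can be chosen based on another objective, such as maximizing the probability plot correlation coefficient or the Shapiro-Wilk goodness-of-fit statistic.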
Shumway et al. (1989) investigated extending the method of Box and Cox (1964) to the case of Type I censored data, motivated by the desire to produce estimated means and confidence intervals for air monitoring data that included censored values.
When optimize=TRUE, the function boxcoxCensored calls the R function nlminb to minimize the negative value of the objective (i.e., maximize the objective) over the range of possible values of $\lambda$ specified in the argument lambda. The starting value for the optimization is always $\lambda = 1$ (i.e., no transformation).
The next section explains assumptions and notation, and the section after that explains how the objective is computed for the various options for objective.name.
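The idea of maximizing an objective by handing its negative to nlminb can be sketched as follows. This is a simplified illustration rather than the code used by boxcoxCensored: it assumes complete (uncensored) data, uses the profile normal log-likelihood as the objective, starts the search at lambda = 1, and the helper names bc and profile_loglik are hypothetical.

```r
# Hypothetical helper: Box-Cox transformation for a single lambda.
bc <- function(x, lambda) {
  if (abs(lambda) < .Machine$double.eps) log(x) else (x^lambda - 1) / lambda
}

# Profile log-likelihood for complete data: mu and sigma are replaced by
# their MLEs for the given lambda, and (lambda - 1) * sum(log(x)) is the
# Jacobian term of the transformation.
profile_loglik <- function(lambda, x) {
  y <- bc(x, lambda)
  n <- length(x)
  sigma2 <- mean((y - mean(y))^2)
  -n / 2 * (log(2 * pi * sigma2) + 1) + (lambda - 1) * sum(log(x))
}

set.seed(1)
x <- rlnorm(50, meanlog = 1, sdlog = 0.8)

fit <- nlminb(
  start     = 1,                                   # no transformation
  objective = function(l) -profile_loglik(l, x),   # minimize the negative
  lower     = -2,
  upper     = 2
)

fit$par          # estimated optimal lambda within [-2, 2]
-fit$objective   # value of the objective at the optimum
```

Bounding the search with lower and upper corresponds to the two-element lambda argument used when optimize=TRUE.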
Assumptions and Notation
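Let $x_1, x_2, \ldots, x_N$ denote a random sample of $N$ observations from some continuous distribution. Assume $n$ ($0 < n < N$) of these observations are known and $c$ ($c = N - n$) of these observations are all censored below (left-censored) or all censored above (right-censored) at $k$ fixed censoring levels

$$T_1, T_2, \ldots, T_k; \quad k \ge 1 \tag{2}$$

When $k \ge 2$, the data are said to be Type I multiply censored. When $k = 1$, the data are Type I singly censored if all known observations lie on the uncensored side of the single censoring level (greater than or equal to it for left-censored data, less than or equal to it for right-censored data); otherwise they are considered multiply censored.
Let $c_j$ denote the number of observations censored below or above censoring level $T_j$ for $j = 1, 2, \ldots, k$, so that

$$\sum_{j=1}^{k} c_j = c \tag{3}$$

Let $x_{(1)}, x_{(2)}, \ldots, x_{(N)}$ denote the "ordered" observations, where "observation" means either the actual observation (for uncensored observations) or the censoring level (for censored observations).
Note that in this case the quantity $x_{(i)}$ does not necessarily represent the $i$'th "largest" observation from the (unknown) complete sample.
Finally, let $\Omega$ denote the set of $n$ subscripts in the "ordered" sample that correspond to uncensored observations, and let $\Omega_j$ denote the set of $c_j$ subscripts in the "ordered" sample that correspond to the observations censored at censoring level $T_j$ for $j = 1, 2, \ldots, k$.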
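We assume that there exists some value of $\lambda$ such that the transformed observations

$$y_i = \begin{cases} \dfrac{x_i^{\lambda} - 1}{\lambda}, & \lambda \ne 0 \\[1ex] \log(x_i), & \lambda = 0 \end{cases} \tag{4}$$

($i = 1, 2, \ldots, N$) form a random sample of Type I censored data from a normal distribution.
Note that for the censored observations, Equation (4) becomes:

$$y_{(i)} = T_j^{*} = \begin{cases} \dfrac{T_j^{\lambda} - 1}{\lambda}, & \lambda \ne 0 \\[1ex] \log(T_j), & \lambda = 0 \end{cases} \tag{5}$$

where $i \in \Omega_j$, that is, observation $i$ is censored at censoring level $T_j$.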
Computing the Objective
Objective Based on Probability Plot Correlation Coefficient (objective.name="PPCC")
When objective.name="PPCC", the objective is computed as the value of the normal probability plot correlation coefficient based on the transformed data (see the description of the Probability Plot Correlation Coefficient (PPCC) goodness-of-fit test in the help file for gofTestCensored). That is, the objective is the correlation coefficient for the normal quantile-quantile plot for the transformed data. Large values of the PPCC tend to indicate a good fit to a normal distribution.
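As a rough illustration of the PPCC objective, the sketch below handles only the simplest case, singly left-censored data, where the censored values occupy the lowest ranks and the uncensored order statistics can be paired directly with normal quantiles. It is not the implementation used by boxcoxCensored (which relies on the plotting-position methods listed under prob.method for multiply censored data); the helper names bc and ppcc_single_left and the plotting-position formula (i - a) / (N - 2a + 1) are assumptions made for this example.

```r
# Hypothetical helper: Box-Cox transformation for a single lambda.
bc <- function(x, lambda) {
  if (abs(lambda) < .Machine$double.eps) log(x) else (x^lambda - 1) / lambda
}

# PPCC for singly left-censored data. The censored values take ranks
# 1..n_cen, so only the uncensored order statistics (ranks n_cen+1..N)
# enter the correlation with the normal quantiles.
ppcc_single_left <- function(x, censored, lambda, a = 0.375) {
  N     <- length(x)
  n_cen <- sum(censored)
  y     <- sort(bc(x[!censored], lambda))
  ranks <- (n_cen + 1):N
  p     <- (ranks - a) / (N - 2 * a + 1)   # plotting positions
  cor(y, qnorm(p))
}

set.seed(42)
x <- rlnorm(40, meanlog = 1, sdlog = 1)
censored <- x < 1.5          # single censoring level at 1.5
x[censored] <- 1.5

lambdas <- seq(-2, 2, by = 0.5)
ppcc <- sapply(lambdas, function(l) ppcc_single_left(x, censored, l))
data.frame(lambda = lambdas, PPCC = round(ppcc, 4))
```

The value of lambda with the largest PPCC is the candidate transformation under this objective.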
Objective Based on Shapiro-Wilk Goodness-of-Fit Statistic (objective.name="Shapiro-Wilk")
When objective.name="Shapiro-Wilk", the objective is computed as the value of the Shapiro-Wilk goodness-of-fit statistic based on the transformed data (see the description of the Shapiro-Wilk test in the help file for gofTestCensored). Large values of the Shapiro-Wilk statistic tend to indicate a good fit to a normal distribution.
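The same grid evaluation can be sketched for the Shapiro-Wilk objective. The example below is deliberately simplified: it uses complete (uncensored) data and the base R function shapiro.test, whereas the censored-data version of the statistic is described in the help file for gofTestCensored. The helper name bc is hypothetical.

```r
# Hypothetical helper: Box-Cox transformation for a single lambda.
bc <- function(x, lambda) {
  if (abs(lambda) < .Machine$double.eps) log(x) else (x^lambda - 1) / lambda
}

set.seed(7)
x <- rlnorm(30, meanlog = 2, sdlog = 0.6)   # complete data, no censoring

lambdas <- seq(-2, 2, by = 0.5)

# Shapiro-Wilk W statistic of the transformed data for each lambda
w <- sapply(lambdas, function(l) unname(shapiro.test(bc(x, l))$statistic))

data.frame(lambda = lambdas, W = round(w, 4))
lambdas[which.max(w)]   # grid value with the largest W
```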
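Objective Based on Log-Likelihood Function (objective.name="Log-Likelihood")
When objective.name="Log-Likelihood", the objective is computed as the value of the log-likelihood function. Assuming the transformed observations in Equation (4) above come from a normal distribution with mean $\mu$ and standard deviation $\sigma$, the change of variable formula gives the log-likelihood function as follows.
For Type I left censored data, the log-likelihood function is given by:

$$\log L(\lambda, \mu, \sigma) = \log \binom{N}{c_1, c_2, \ldots, c_k, n} + \sum_{j=1}^{k} c_j \log F(T_j^{*}) + \sum_{i \in \Omega} \log f(y_{(i)}) + (\lambda - 1) \sum_{i \in \Omega} \log x_{(i)} \tag{6}$$

where $f$ and $F$ denote the pdf and cdf of a normal distribution with mean $\mu$ and standard deviation $\sigma$, $T_j^{*}$ is the transformed value of censoring level $T_j$, $c_j$ is the number of observations censored at $T_j$ ($j = 1, 2, \ldots, k$), $n$ is the number of uncensored observations, $N = n + \sum_{j} c_j$, and $\Omega$ is the set of subscripts of the uncensored observations.
Similarly, for Type I right censored data, the log-likelihood function is given by:

$$\log L(\lambda, \mu, \sigma) = \log \binom{N}{c_1, c_2, \ldots, c_k, n} + \sum_{j=1}^{k} c_j \log \left[ 1 - F(T_j^{*}) \right] + \sum_{i \in \Omega} \log f(y_{(i)}) + (\lambda - 1) \sum_{i \in \Omega} \log x_{(i)} \tag{10}$$

For a fixed value of $\lambda$, the log-likelihood function is maximized by replacing $\mu$ and $\sigma$ with their maximum likelihood estimators (see the help file for enormCensored).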
Thus, when optimize=TRUE, Equation (6) or (10) is maximized by iteratively solving for $\lambda$ using the MLEs of $\mu$ and $\sigma$. When optimize=FALSE, the value of the objective is computed by using Equation (6) or (10), using the values of $\lambda$ specified in the argument lambda, and using the MLEs of $\mu$ and $\sigma$.
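The log-likelihood objective for left-censored data can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the EnvStats implementation: the constant combinatorial term is dropped (it does not depend on lambda, mu, or sigma, so values differ from the full log-likelihood by a constant), the MLEs of mu and sigma for each fixed lambda are found numerically with optim rather than with enormCensored, and the helper names bc, loglik_left, and profile_left are hypothetical.

```r
# Hypothetical helper: Box-Cox transformation for a single lambda.
bc <- function(x, lambda) {
  if (abs(lambda) < .Machine$double.eps) log(x) else (x^lambda - 1) / lambda
}

# Log-likelihood for Type I left-censored data at fixed lambda, mu, sigma.
# Censored entries of x hold their censoring level, so bc() applied to
# them yields the transformed censoring levels.
loglik_left <- function(lambda, mu, sigma, x, censored) {
  y <- bc(x, lambda)
  sum(pnorm(y[censored], mu, sigma, log.p = TRUE)) +     # censored part
    sum(dnorm(y[!censored], mu, sigma, log = TRUE)) +    # uncensored part
    (lambda - 1) * sum(log(x[!censored]))                # Jacobian term
}

# For fixed lambda, maximize over mu and log(sigma) numerically.
profile_left <- function(lambda, x, censored) {
  y <- bc(x, lambda)
  start <- c(mean(y), log(sd(y)))
  fit <- optim(start, function(p) {
    -loglik_left(lambda, p[1], exp(p[2]), x, censored)
  })
  -fit$value
}

set.seed(250)
x <- rlnorm(30, meanlog = 1.5, sdlog = 1)
lev <- rep(c(2, 4), each = 15)     # two censoring levels
censored <- x < lev
x[censored] <- lev[censored]

lambdas <- seq(-2, 2, by = 0.5)
ll <- sapply(lambdas, profile_left, x = x, censored = censored)
data.frame(lambda = lambdas, LogLik = round(ll, 3))
```

For right-censored data the censored term would use pnorm(..., lower.tail = FALSE, log.p = TRUE) instead.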
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers, Second Edition. Lewis Publishers, Boca Raton, FL.
Box, G.E.P., and D.R. Cox. (1964). An Analysis of Transformations (with Discussion). Journal of the Royal Statistical Society, Series B 26(2), 211--252.
Cohen, A.C. (1991). Truncated and Censored Samples. Marcel Dekker, New York, New York, pp.50--59.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, pp.47--53.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY.
Hinkley, D.V., and G. Runger. (1984). The Analysis of Transformed Data (with Discussion). Journal of the American Statistical Association 79, 302--320.
Hoaglin, D.C., F.M. Mosteller, and J.W. Tukey, eds. (1983). Understanding Robust and Exploratory Data Analysis. John Wiley and Sons, New York, Chapter 4.
Hoaglin, D.C. (1988). Transformations in Everyday Experience. Chance 1, 40--45.
Johnson, N.L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions, Second Edition. John Wiley and Sons, New York, p.163.
Johnson, R.A., and D.W. Wichern. (2007). Applied Multivariate Statistical Analysis, Sixth Edition. Pearson Prentice Hall, Upper Saddle River, NJ, pp.192--195.
Shumway, R.H., A.S. Azari, and P. Johnson. (1989). Estimating Mean Concentrations Under Transformations for Environmental Data With Detection Limits. Technometrics 31(3), 347--356.
Stoline, M.R. (1991). An Examination of the Lognormal and Box and Cox Family of Transformations in Fitting Environmental Data. Environmetrics 2(1), 85--106.
van Belle, G., L.D. Fisher, P.J. Heagerty, and T. Lumley. (2004). Biostatistics: A Methodology for the Health Sciences, 2nd Edition. John Wiley & Sons, New York.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 13.
boxcoxCensored.object, plot.boxcoxCensored, print.boxcoxCensored, boxcox, Data Transformations, Goodness-of-Fit Tests.
# NOT RUN {
# Generate 15 observations from a lognormal distribution with
# mean=10 and cv=2 and censor the observations less than 2.
# Then generate 15 more observations from this distribution and
# censor the observations less than 4.
# Then look at some values of various objectives for various transformations.
# Note that for the PPCC objective the optimal value is about -0.3,
# whereas for the Log-Likelihood objective it is about 0.3.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(250)
x.1 <- rlnormAlt(15, mean = 10, cv = 2)
censored.1 <- x.1 < 2
x.1[censored.1] <- 2
x.2 <- rlnormAlt(15, mean = 10, cv = 2)
censored.2 <- x.2 < 4
x.2[censored.2] <- 4
x <- c(x.1, x.2)
censored <- c(censored.1, censored.2)
#--------------------------
# Using the PPCC objective:
#--------------------------
boxcoxCensored(x, censored)
#Results of Box-Cox Transformation
#Based on Type I Censored Data
#---------------------------------
#
#Objective Name: PPCC
#
#Data: x
#
#Censoring Variable: censored
#
#Censoring Side: left
#
#Censoring Level(s): 2 4
#
#Sample Size: 30
#
#Percent Censored: 26.7%
#
# lambda PPCC
# -2.0 0.8954683
# -1.5 0.9338467
# -1.0 0.9643680
# -0.5 0.9812969
# 0.0 0.9776834
# 0.5 0.9471025
# 1.0 0.8901990
# 1.5 0.8187488
# 2.0 0.7480494
boxcoxCensored(x, censored, optimize = TRUE)
#Results of Box-Cox Transformation
#Based on Type I Censored Data
#---------------------------------
#
#Objective Name: PPCC
#
#Data: x
#
#Censoring Variable: censored
#
#Censoring Side: left
#
#Censoring Level(s): 2 4
#
#Sample Size: 30
#
#Percent Censored: 26.7%
#
#Bounds for Optimization: lower = -2
# upper = 2
#
#Optimal Value: lambda = -0.3194799
#
#Value of Objective: PPCC = 0.9827546
#-----------------------------------
# Using the Log-Likelihood objective
#-----------------------------------
boxcoxCensored(x, censored, objective.name = "Log-Likelihood")
#Results of Box-Cox Transformation
#Based on Type I Censored Data
#---------------------------------
#
#Objective Name: Log-Likelihood
#
#Data: x
#
#Censoring Variable: censored
#
#Censoring Side: left
#
#Censoring Level(s): 2 4
#
#Sample Size: 30
#
#Percent Censored: 26.7%
#
# lambda Log-Likelihood
# -2.0 -95.38785
# -1.5 -84.76697
# -1.0 -75.36204
# -0.5 -68.12058
# 0.0 -63.98902
# 0.5 -63.56701
# 1.0 -66.92599
# 1.5 -73.61638
# 2.0 -82.87970
boxcoxCensored(x, censored, objective.name = "Log-Likelihood",
optimize = TRUE)
#Results of Box-Cox Transformation
#Based on Type I Censored Data
#---------------------------------
#
#Objective Name: Log-Likelihood
#
#Data: x
#
#Censoring Variable: censored
#
#Censoring Side: left
#
#Censoring Level(s): 2 4
#
#Sample Size: 30
#
#Percent Censored: 26.7%
#
#Bounds for Optimization: lower = -2
# upper = 2
#
#Optimal Value: lambda = 0.3049744
#
#Value of Objective: Log-Likelihood = -63.2733
#----------
# Plot the results based on the PPCC objective
#---------------------------------------------
boxcox.list <- boxcoxCensored(x, censored)
dev.new()
plot(boxcox.list)
#Look at QQ-Plots for the candidate values of lambda
#---------------------------------------------------
plot(boxcox.list, plot.type = "Q-Q Plots", same.window = FALSE)
#==========
# Clean up
#---------
rm(x.1, censored.1, x.2, censored.2, x, censored, boxcox.list)
graphics.off()
# }