grouped: Regression for Grouped Data - Coarse Data

Description

grouped is used to fit regression models for grouped or coarse data under the assumption that the data are Coarsened At Random.

Usage

grouped(formula, link = c("identity", "log", "logit"), 
            distribution = c("normal", "t", "logistic"), data,
            subset, na.action, str.values, df = NULL, iter = 3, ...)

Arguments

formula

a two-sided formula describing the model structure. On the left-hand side, a two-column response matrix must be supplied, specifying the lower and upper limits (1st and 2nd column, respectively) of the interval in which the true response lies.

link

the link function under which the underlying response variable follows the distribution given by the distribution argument. Available choices are "identity", "log" and "logit".
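As a rough illustration of what the three links do to the interval limits, the sketch below applies each transformation in base R. The helper `transform_limits` and the example matrix are hypothetical and are not part of the package; the package's internal handling may differ.

```r
# Hypothetical helper (not a package function): move interval limits to the
# scale on which the latent distribution is assumed to hold.
transform_limits <- function(limits, link = c("identity", "log", "logit")) {
  link <- match.arg(link)
  switch(link,
         identity = limits,
         log      = log(limits),      # requires limits >= 0
         logit    = qlogis(limits))   # requires limits in [0, 1]
}

# Made-up lower and upper limits for three observations on (0, 1)
limits <- cbind(lo = c(0.10, 0.25, 0.50), up = c(0.25, 0.50, 0.90))
transform_limits(limits, "logit")
```

Limits on the boundary map to infinite values under "log" and "logit" (for example `qlogis(0)` is `-Inf`), which a cdf still evaluates to 0 or 1.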

distribution

the assumed distribution for the true latent response variable. Available choices are "normal", "t" and "logistic". See Details for more info.
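To make the role of the distribution concrete, the following sketch computes the probability that a latent variable with location `mu` and scale `sigma` falls in an interval, under each of the three choices. The function `interval_prob` and its argument names are assumptions made for illustration only; it relies solely on the base R cdfs `pnorm`, `plogis` and `pt`.

```r
# Hypothetical helper: P(lo < Z <= up) for a location-scale latent variable Z
interval_prob <- function(lo, up, mu = 0, sigma = 1,
                          distribution = c("normal", "t", "logistic"),
                          df = NULL) {
  distribution <- match.arg(distribution)
  if (distribution == "t" && is.null(df))
    stop("df must be supplied for the t distribution")
  cdf <- switch(distribution,
                normal   = pnorm,
                logistic = plogis,
                t        = function(q) pt(q, df = df))
  cdf((up - mu) / sigma) - cdf((lo - mu) / sigma)
}

interval_prob(-1, 1)                                # normal
interval_prob(-1, 1, distribution = "logistic")     # logistic
interval_prob(-1, 1, distribution = "t", df = 3)    # Student's t with 3 df
```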

data

an optional data.frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which grouped is called.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs.

str.values

a numeric vector of starting values.

df

a scalar numeric value denoting the degrees of freedom when the underlying distribution for the response variable is assumed to be Student's $t$.

iter

the number of extra times to call optim in case the first optimization has not converged.
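One plausible reading of this behaviour is a loop that calls optim again, starting from the last estimate, until it reports convergence or the extra calls are used up. The wrapper `optim_retry` below is a hypothetical sketch of that idea, tried on the Rosenbrock test function; it is not taken from the package source, and restarting from the previous estimate is an assumption.

```r
# Hypothetical wrapper: re-run optim() up to `iter` extra times while it
# reports non-convergence (convergence code different from 0).
optim_retry <- function(par, fn, ..., iter = 3) {
  fit <- optim(par, fn, ...)
  k <- 0
  while (fit$convergence != 0 && k < iter) {
    fit <- optim(fit$par, fn, ...)   # restart from the last estimate
    k <- k + 1
  }
  fit$k <- k                         # number of extra calls actually made
  fit
}

# Test function with a deliberately small iteration budget per call
rosenbrock <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
fit <- optim_retry(c(-1.2, 1), rosenbrock, control = list(maxit = 50), iter = 3)
fit$convergence
fit$k
```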

...

additional arguments; currently none is used.

Value

An object of class grouped, which is a list with the following components:

coefficients

the estimated coefficients, including the standard deviation $\sigma$.

hessian

the approximate Hessian matrix at convergence returned by optim.

fitted

the fitted values.

details

a list with components: (i) X the design matrix, (ii) y the response data matrix, (iii) convergence the convergence identifier returned by optim, (iv) logLik the value of the log-likelihood at convergence, (v) k the number of outer iterations used, (vi) n the sample size, (vii) df the degrees of freedom; NULL except for the t distribution, (viii) link the link function used, (ix) distribution the distribution assumed for the true latent response variable and (x) max.sc the maximum absolute value of the score vector at convergence.

call

the matched call.

Details

Let $Z_i, i = 1, \ldots, n$ be a random sample from a response variable of interest. In many problems one can think of the sample space $S_i$ of $Z_i$ as being partitioned into a number of groups; one then observes not the exact value of $Z_i$ but the group into which it falls. Data generated in this way are called grouped (Heitjan, 1989). The function grouped and this package are devoted to the analysis of such data when the data are Coarsened At Random (Heitjan and Rubin, 1991). The framework we use assumes a latent variable $Z_i$ which is coarsely measured and for which we only know $Y_{li}$ and $Y_{ui}$, i.e., the limits of the interval in which $Z_i$ lies. Given some covariates $X_i$, $Z_i|X_i$ may be assumed to follow a Normal, a Logistic or a (generalized) Student's $t$ distribution. In addition, three link functions are available for greater flexibility. In particular, the likelihood is of the following form

$$L(\beta, \sigma) = \prod_{i = 1}^{n} \left\{ F\left(\frac{y_{ui}^{*} - x_{i}^{t} \beta}{\sigma}\right) - F\left(\frac{y_{li}^{*} - x_{i}^{t} \beta}{\sigma}\right) \right\},$$

where $F(\cdot)$ denotes the cdf of the assumed distribution given by the argument distribution and $y_{li}^* = \phi(y_{li})$, where $\phi(\cdot)$ denotes the link function, and $y_{ui}^*$ is defined analogously. Quality of life indexes are an interesting example of coarse data: the observed value of such an index can be thought of as a rounded version of the true latent quality of life that the index attempts to capture. Applications of this approach can be found in Lesaffre et al. (2007) and Tsonaka et al. (2006). Various other examples of grouped and coarse data can be found in Heitjan (1989; 1993).
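The likelihood above can be sketched directly in base R. The code below assumes a normal latent distribution and the identity link, simulates a latent response that is only observed through unit-width intervals, and maximises the log-likelihood with optim. The function `negloglik`, the simulated data and the choice to optimise over $\log \sigma$ are all assumptions made for illustration and are not the package's implementation.

```r
# Negative log-likelihood for interval data: normal latent variable,
# identity link. par = (beta, log(sigma)).
negloglik <- function(par, X, lo, up) {
  p     <- ncol(X)
  beta  <- par[seq_len(p)]
  sigma <- exp(par[p + 1])           # log scale keeps sigma positive
  eta   <- drop(X %*% beta)
  prob  <- pnorm((up - eta) / sigma) - pnorm((lo - eta) / sigma)
  -sum(log(pmax(prob, .Machine$double.eps)))   # guard against log(0)
}

set.seed(1)
n  <- 200
x  <- rnorm(n)
z  <- 1 + 0.5 * x + rnorm(n, sd = 0.8)   # latent response, never used in fitting
lo <- floor(z)                           # observed interval limits
up <- lo + 1
X  <- cbind(1, x)

fit <- optim(c(0, 0, 0), negloglik, X = X, lo = lo, up = up, method = "BFGS")
c(intercept = fit$par[1], slope = fit$par[2], sigma = exp(fit$par[3]))
```

The estimates should land near the simulating values (1, 0.5 and 0.8), even though only the intervals enter the fit.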

References

Heitjan, D. (1989) Inference from grouped continuous data: A review (with discussion). Statistical Science, 4, 164--183.

Heitjan, D. (1993) Ignorability and coarse data: some biomedical examples. Biometrics, 49, 1099--1109.

Heitjan, D. and Rubin, D. (1991) Ignorability and coarse data. Annals of Statistics, 19, 2244--2253.

Lesaffre, E., Rizopoulos, D. and Tsonaka, S. (2007) The logistic-transform for bounded outcome scores. Biostatistics, 8, 72--85.

Tsonaka, S., Rizopoulos, D. and Lesaffre, E. (2006) Power and sample size calculations for discrete bounded outcomes. Statistics in Medicine, 25, 4241--4252.

Examples

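library(grouped)

# Response supplied as lower and upper interval limits
grouped(cbind(lo, up) ~ treat * x, link = "logit", data = Sdata)

# Response intervals constructed by equispaced() from r and n
grouped(equispaced(r, n) ~ x1 * x2, link = "logit", data = Seeds)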

# See Figure 1 and Table 1 in Heitjan (1989)
y <- iris[iris$Species == "setosa", "Petal.Width"]
index <- cbind(seq(0.05, 0.55, 0.1), seq(0.15, 0.65, 0.1)) 
n <- length(y)
a <- b <- numeric(n)
for(i in 1:n){
    ind <- which(index[, 2] - y[i] > 0)[1]
    a[i] <- index[ind, 1]
    b[i] <- index[ind, 2]
}
summary(grouped(cbind(a, b) ~ 1))

# See Figure 1 and Table 1 in Heitjan (1989)
y <- iris[iris$Species == "setosa", "Petal.Length"]
index <- cbind(seq(0.95, 1.75, 0.2), seq(1.15, 1.95, 0.2)) 
n <- length(y)
a <- b <- numeric(n)
for(i in 1:n){
    ind <- which(index[, 2] - y[i] > 0)[1]
    a[i] <- index[ind, 1]
    b[i] <- index[ind, 2]
}
summary(grouped(cbind(a, b) ~ 1))
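
The interval assignment done by the loops above can also be written without a loop. The sketch below is one possible vectorised alternative using base R's findInterval, shown for the Petal.Width case. It only builds the limits `a` and `b` and does not call grouped.

```r
y <- iris[iris$Species == "setosa", "Petal.Width"]
index <- cbind(seq(0.05, 0.55, 0.1), seq(0.15, 0.65, 0.1))

# findInterval() counts how many upper limits are <= y[i]; adding 1 gives the
# first interval whose upper limit exceeds y[i], matching the loop's choice.
ind <- findInterval(y, index[, 2]) + 1L
a <- index[ind, 1]
b <- index[ind, 2]

# Every observation should lie strictly inside its interval
stopifnot(all(a < y & y < b))
table(interval = paste0("(", a, ", ", b, ")"))
```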
