aldvmm: Fitting Adjusted Limited Dependent Variable Mixture Models

Description

The function aldvmm fits adjusted limited dependent variable mixture models of health state utilities. Adjusted limited dependent variable mixture models are finite mixtures of normal distributions with an accumulation of density mass at the limits, and a gap between 100% quality of life and the next smaller utility value. The package aldvmm uses the likelihood and expected value functions proposed by Hernandez Alava and Wailoo (2015) using normal component distributions and a multinomial logit model of probabilities of component membership.

Usage

aldvmm(
  formula,
  data,
  subset = NULL,
  psi,
  ncmp = 2,
  dist = "normal",
  optim.method = NULL,
  optim.control = list(trace = FALSE),
  optim.grad = TRUE,
  init.method = "zero",
  init.est = NULL,
  init.lo = NULL,
  init.hi = NULL,
  se.fit = FALSE,
  model = TRUE,
  level = 0.95,
  na.action = "na.omit"
)

Value

aldvmm returns an object of class "aldvmm". An object of class "aldvmm" is a list containing the following objects.

coef

a numeric vector of parameter estimates.

hessian

a numeric matrix object with second partial derivatives of the likelihood function.

cov

a numeric matrix object with covariances of parameters.

n

a scalar representing the number of observations that were used in the estimation.

k

a scalar representing the number of components that were mixed.

df.null

an integer value of the residual degrees of freedom of a null model including intercepts and standard deviations.

df.residual

an integer value of the residual degrees of freedom.

iter

an integer value of the number of iterations used in optimization.

convergence

an integer value indicating convergence. "0" indicates successful completion.

gof

a list including the following elements.

ll: a numeric value of the negative log-likelihood \(-ll\).
aic: a numeric value of the Akaike information criterion \(AIC = 2n_{par} - 2ll\).
bic: a numeric value of the Bayesian information criterion \(BIC = n_{par}*log(n_{obs}) - 2ll\).
mse: a numeric value of the mean squared error \(\sum{(y - \hat{y})^2}/(n_{obs} - n_{par})\).
mae: a numeric value of the mean absolute error \(\sum{|y - \hat{y}|}/(n_{obs} - n_{par})\).

pred

a list including the following elements.

y: a numeric vector of observed outcomes in 'data'.
yhat: a numeric vector of fitted values.
res: a numeric vector of residuals.
se.fit: a numeric vector of the standard error of fitted values.
lower.fit: a numeric vector of 95% lower confidence limits of fitted values.
upper.fit: a numeric vector of 95% upper confidence limits of fitted values.
prob: a numeric matrix of expected probabilities of group membership per individual in 'data'.

init

a list including the following elements.

est: a numeric vector of initial parameter estimates.
lo: a numeric vector of lower limits of parameter estimates.
hi: a numeric vector of upper limits of parameter estimates.

call

a character value including the model call captured by match.call.

formula

an object of class "formula" supplied to argument 'formula'.

terms

a list of objects of class "terms" for the model of component means ("beta"), probabilities of component membership ("delta") and the full model ("full").

contrasts

a nested list of character values showing contrasts of factors used in models of component means ("beta") and probabilities of component membership ("delta").

data

a data frame created by model.frame including estimation data with additional attributes.

psi

a numeric vector with the minimum and maximum utility below 1 in 'data'.

dist

a character value indicating the used component distributions.

label

a list including the following elements.

lcoef: a character vector of labels for objects including results on distributions (default "beta") and the probabilities of component membership (default "delta").
lcpar: a character vector of labels for objects including constant distribution parameters (default "sigma" for dist = "normal").
lcmp: a character value of the label for objects including results on different components (default "Comp").
lvar: a list including 2 character vectors of covariate names for model parameters of distributions ("beta") and the multinomial logit ("delta").

optim.method

a character value of the used optimr method.

level

a numeric value of the confidence level used for reporting.

na.action

an object of class "omit" extracted from the "na.action" attribute of the data frame created by model.frame in the preparation of model matrices.

Arguments

formula: an object of class "formula" with a symbolic description of the model to be fitted. The model formula takes the form y ~ x1 + x2 | x1 + x4, where the | delimiter separates the model for expected values of normal distributions (left) and the multinomial logit model of probabilities of component membership (right).
data: a data frame, list or environment (or object coercible to a data frame by as.data.frame) including data on outcomes and explanatory variables in 'formula'.
subset: an optional numeric vector of row indices of the subset of the model matrix used in the estimation. 'subset' can be longer than the number of rows in data and include repeated values for re-sampling purposes.
psi: a numeric vector of minimum and maximum possible utility values smaller than or equal to 1 (e.g. c(-0.594, 0.883)). The potential gap between the maximum value and 1 represents an area with zero density in the value set from which utilities were obtained. The order of the minimum and maximum limits in 'psi' does not matter.
ncmp: a numeric value of the number of components that are mixed. The default value is 2. A value of 1 represents a tobit model with a gap between 1 and the maximum value in 'psi'.
dist: an optional character value of the distribution used in the components. In this release, only the normal distribution is available, and the default value is set to "normal".
optim.method: an optional character value of one of the following optimr methods: "Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "nlminb", "Rcgmin", "Rvmmin" and "hjn". The default method is "BFGS". The method "L-BFGS-B" is used when lower and/or upper constraints are set using 'init.lo' and 'init.hi'. The method "nlm" cannot be used in the 'aldvmm' package.
optim.control: an optional list of optimr control parameters.
optim.grad: an optional logical value indicating if an analytical gradient should be used in optimr methods that can use this information. The default value is TRUE. If 'optim.grad' is set to FALSE, a finite difference approximation is used.
init.method: an optional character value indicating the method for obtaining initial values. The following values are available: "zero", "random", "constant" and "sann". The default value is "zero".
init.est: an optional numeric vector of user-defined initial values. User-defined initial values override the 'init.method' argument. Initial values have to follow the same order as parameter estimates in the return value 'coef'.
init.lo: an optional numeric vector of user-defined lower limits for constrained optimization. When 'init.lo' is not NULL, the optimization method "L-BFGS-B" is used. Lower limits of parameters have to follow the same order as parameter estimates in the return value 'coef'.
init.hi: an optional numeric vector of user-defined upper limits for constrained optimization. When 'init.hi' is not NULL, the optimization method "L-BFGS-B" is used. Upper limits of parameters have to follow the same order as parameter estimates in the return value 'coef'.
se.fit: an optional logical value indicating whether standard errors of fitted values are calculated. The default value is FALSE.
model: an optional logical value indicating whether the estimation data frame is returned in the output object. The default value is TRUE.
level: a numeric value of the confidence level for confidence bands of fitted values. The default value is 0.95.
na.action: a character value passed to argument 'na.action' of the function model.frame in the preparation of the model matrix. The default value is "na.omit".

Details

aldvmm fits an adjusted limited dependent variable mixture model using the likelihood and expected value functions from Hernandez Alava and Wailoo (2015). The model accounts for latent classes, multi-modality, minimum and maximum utility values and potential gaps between 1 and the next smaller utility value. Adjusted limited dependent variable mixture models combine multiple component distributions with a multinomial logit model of the probabilities of component membership. The standard deviations of normal distributions are estimated and reported as log-transformed values. They enter the likelihood function as exponentiated values, which ensures that the standard deviations are positive.
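The two transformations described above can be illustrated in base R. This is a minimal sketch with made-up parameter values, not the package's internal code. It assumes that the last component is the reference category of the multinomial logit, which may differ from the parameterization used in aldvmm.

```r
# Hypothetical parameter values for a 2-component model (not estimates).
x       <- c(1, 0.5)                        # covariate row: intercept and one covariate
delta   <- matrix(c(0.4, -1.2), ncol = 1)   # logit coefficients of component 1
lnsigma <- c(-1.5, -0.7)                    # log standard deviations per component

# Multinomial logit: one linear index per component.
# Assumption: the last component is the reference with index 0.
eta  <- c(drop(x %*% delta), 0)
prob <- exp(eta) / sum(exp(eta))            # probabilities of component membership

# Exponentiating the log-transformed values yields positive standard deviations.
sigma <- exp(lnsigma)

print(prob)
print(sigma)
```

Because the optimizer works on the log scale, it can search over the whole real line without producing an invalid standard deviation.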

The minimum utility and the largest utility smaller than or equal to 1 are supplied in the argument 'psi'. The number of distributions/components that are mixed is set by the argument 'ncmp'. When 'ncmp' is set to 1 the procedure estimates a tobit model with a gap between 1 and the maximum utility value in 'psi'. The current version only allows finite mixtures of normal distributions.

The 'formula' object can include a | delimiter to separate formulae for expected values in components (left) and the multinomial logit model of probabilities of component membership (right). If no | delimiter is used, the same formula will be used for expected values in components and the multinomial logit of the probabilities of component membership.
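The following base R sketch shows how a two-part formula of this form can be split at the | delimiter. It only illustrates the structure of such a formula. The object names f_beta and f_delta are hypothetical, and aldvmm may parse formulae differently internally.

```r
f <- y ~ x1 + x2 | x1 + x4

# In R's parse tree the right-hand side of '~' is a call to `|`
# when the delimiter is present.
rhs     <- f[[3]]
has_bar <- is.call(rhs) && identical(rhs[[1]], as.name("|"))

if (has_bar) {
  # Left part: model of expected values in components.
  f_beta  <- as.formula(paste(deparse(f[[2]]), "~", deparse(rhs[[2]])))
  # Right part: model of probabilities of component membership.
  f_delta <- as.formula(paste("~", deparse(rhs[[3]])))
} else {
  # Without a delimiter the same right-hand side serves both models.
  f_beta  <- f
  f_delta <- as.formula(paste("~", deparse(rhs)))
}

print(f_beta)
print(f_delta)
```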

aldvmm uses optimr for maximum likelihood estimation of model parameters. The argument 'optim.method' accepts the following methods: "Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "nlminb", "Rcgmin", "Rvmmin" and "hjn". The default method is "BFGS". The method "nlm" cannot be used in aldvmm because it requires a different implementation of the likelihood function. The argument 'optim.control' accepts a list of optimr control parameters. If 'optim.grad' is set to TRUE, the function optimr uses analytical gradients during the optimization procedure for all methods that allow for this approach. If 'optim.grad' is set to FALSE or a method cannot use gradients, a finite difference approximation is used. The Hessian matrix at maximum likelihood parameters is approximated numerically using the function hessian.
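The difference between analytical and finite-difference gradients can be sketched with a toy problem. The example below is an assumption-laden illustration: it uses stats::optim as a stand-in for optimr, a single normal distribution instead of the mixture likelihood, and simulated data. The functions negll and neggr are hypothetical helpers, not part of the package.

```r
set.seed(1)
y <- rnorm(200, mean = 0.7, sd = 0.2)   # simulated outcomes, illustration only

# Negative log-likelihood of one normal distribution.
# par[1] is the mean, par[2] is the log of the standard deviation.
negll <- function(par, y) {
  -sum(dnorm(y, mean = par[1], sd = exp(par[2]), log = TRUE))
}

# Analytical gradient of negll with respect to the mean and the log sd.
neggr <- function(par, y) {
  mu    <- par[1]
  sigma <- exp(par[2])
  c(-sum(y - mu) / sigma^2,
    length(y) - sum((y - mu)^2) / sigma^2)
}

# BFGS with the analytical gradient, started at zero for all parameters.
fit_an <- optim(c(0, 0), negll, gr = neggr, y = y,
                method = "BFGS", hessian = TRUE)

# BFGS without a gradient function: optim falls back to finite differences.
fit_fd <- optim(c(0, 0), negll, y = y, method = "BFGS")

# The inverse Hessian of the negative log-likelihood at the optimum
# estimates the covariance matrix of the parameters.
cov_hat <- solve(fit_an$hessian)

print(rbind(analytical = fit_an$par, finite_diff = fit_fd$par))
```

Both runs should arrive at nearly the same estimates. The analytical gradient mainly saves function evaluations and avoids approximation error in the search direction.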

'init.method' accepts four methods for generating initial values: "zero", "random", "constant" and "sann". The method "zero" sets initial values of all parameters to 0. The method "random" draws random starting values from a standard normal distribution. The method "constant" estimates a constant-only model and uses its estimates as initial values of intercepts and standard deviations, and 0 for all other parameters. The method "sann" estimates the full model using the simulated annealing optimization method in optim and uses parameter estimates as initial values. When user-specified initial values are supplied in 'init.est', the argument 'init.method' is ignored.

By default, aldvmm performs unconstrained optimization with lower and upper limits at -Inf and Inf. When user-defined lower and/or upper limits are supplied to 'init.lo' and/or 'init.hi', these default limits are replaced with the user-specified values, and the method "L-BFGS-B" is used for box-constrained optimization instead of the user-defined 'optim.method'. It is possible to set only lower or only upper limits. When initial values supplied to 'init.est' or generated by 'init.method' lie outside the limits, the infeasible values are set to the limits using the function bmchk.
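The effect of moving infeasible initial values to the limits can be sketched in base R. This is not a call to bmchk. It is a simplified stand-in using pmin and pmax, with made-up values for the initial estimates and limits.

```r
# Hypothetical initial values and box constraints for four parameters.
init <- c(0, 2.5, -4, 0.3)
lo   <- c(-1, -1, -1, -Inf)   # -Inf leaves a parameter unbounded from below
hi   <- c( 1,  1, Inf,  Inf)  #  Inf leaves a parameter unbounded from above

# Values below 'lo' are raised to the lower limit,
# values above 'hi' are lowered to the upper limit.
init_feasible <- pmin(pmax(init, lo), hi)

print(init_feasible)
```

In this sketch the second value is lowered to its upper limit and the third value is raised to its lower limit, while the remaining values are unchanged.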

The function aldvmm() returns the negative log-likelihood, Akaike information criterion and Bayesian information criterion. Smaller values of these measures indicate better fit.
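As an illustration, the goodness-of-fit measures can be reproduced from the formulas listed for the 'gof' element. The sketch below uses made-up values for the log-likelihood, the number of parameters and the outcomes. None of the numbers come from a fitted model.

```r
set.seed(2)
n_obs <- 50
n_par <- 7
y     <- runif(n_obs)                      # hypothetical observed outcomes
yhat  <- y + rnorm(n_obs, sd = 0.1)        # hypothetical fitted values
ll    <- -120.5                            # hypothetical log-likelihood

negll <- -ll                               # negative log-likelihood
aic   <- 2 * n_par - 2 * ll                # Akaike information criterion
bic   <- n_par * log(n_obs) - 2 * ll       # Bayesian information criterion

# Error measures divided by the residual degrees of freedom.
mse <- sum((y - yhat)^2) / (n_obs - n_par)
mae <- sum(abs(y - yhat)) / (n_obs - n_par)

print(c(negll = negll, aic = aic, bic = bic, mse = mse, mae = mae))
```

The Bayesian information criterion penalizes additional parameters more heavily than the Akaike information criterion whenever log(n_obs) exceeds 2, which is the case for more than 7 observations.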

If 'se.fit' is set to TRUE, standard errors of fitted values are calculated using the delta method. The standard errors of fitted values in the estimation data set are calculated as \(se_{fit} = \sqrt{G^{t} \Sigma G}\), where \(G\) is the gradient of a fitted value with respect to changes of parameter estimates, and \(\Sigma\) is the estimated covariance matrix of parameters (Dowd et al., 2014). The standard errors of predicted values in new data sets are calculated as \(se_{pred} = \sqrt{MSE + G^{t} \Sigma G}\), where \(MSE\) is the mean squared error of fitted versus observed outcomes in the original estimation data (Whitmore, 1986).
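The delta method formulas above can be sketched in base R for a single fitted value. The example is a simplification under stated assumptions: the fitted value is a probability-weighted mean of two components without limits or gap, the covariance matrix and mean squared error are made up, and the gradient is approximated by central finite differences. The functions fitted_value and num_grad are hypothetical helpers.

```r
# Toy fitted value: mixture of two component means with a logit weight.
# theta = (mean of component 1, mean of component 2, logit of weight 1).
fitted_value <- function(theta) {
  p <- plogis(theta[3])
  p * theta[1] + (1 - p) * theta[2]
}

theta_hat <- c(0.9, 0.4, 0.5)              # hypothetical parameter estimates
Sigma     <- diag(c(0.01, 0.02, 0.05))     # hypothetical covariance matrix
mse       <- 0.03                          # hypothetical mean squared error

# Central finite-difference approximation of the gradient G.
num_grad <- function(f, theta, h = 1e-6) {
  vapply(seq_along(theta), function(j) {
    e <- replace(numeric(length(theta)), j, h)
    (f(theta + e) - f(theta - e)) / (2 * h)
  }, numeric(1))
}

G <- num_grad(fitted_value, theta_hat)

# Standard error of the fitted value and of a prediction in new data.
var_fit <- drop(t(G) %*% Sigma %*% G)
se_fit  <- sqrt(var_fit)
se_pred <- sqrt(mse + var_fit)

print(c(se_fit = se_fit, se_pred = se_pred))
```

The standard error of a prediction is always at least as large as the standard error of the fitted value because it adds the residual variation captured by the mean squared error.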

The generic function summary can be used to obtain or print a summary of the results. The generic function predict can be used to obtain predicted values and standard errors of predictions in new data.

References

Hernandez Alava, M. and Wailoo, A. (2015) Fitting adjusted limited dependent variable mixture models to EQ-5D. The Stata Journal, 15(3), 737--750. doi:10.1177/1536867X1501500307

Dowd, B. E., Greene, W. H., and Norton, E. C. (2014) Computation of standard errors. Health Services Research, 49(2), 731--750. doi:10.1111/1475-6773.12122

Whitmore, G. A. (1986) Prediction limits for a univariate normal observation. The American Statistician, 40(2), 141--143. doi:10.1080/00031305.1986.10475378

Examples

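library(aldvmm)

# Load the example data shipped with the package
data(utility)

# Two-component model: age and female enter the component means,
# the probabilities of component membership include an intercept only
fit <- aldvmm(eq5d ~ age + female | 1,
              data = utility,
              psi = c(0.883, -0.594),
              ncmp = 2)

summary(fit)

# Predictions for the estimation data
yhat <- predict(fit)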
