A function factory that returns an R function that evaluates the objective function for fitting aster models with random effects
objfun.factory(fixed, random, response, pred, fam, root, zwz,
famlist = fam.default(), offset, standard.deviation = FALSE,
deriv = 0)
returns an R function having one argument, a numeric vector of all the parameters and random effects, in the order
\((\alpha, b, \nu)\) for one set of variables or
\((\alpha, c, \sigma)\) for the other set (which set depends on argument standard.deviation).
The R function returned by this function evaluates the function \(p_W\)
defined in the Details section and returns a list with components named
value, gradient, and hessian when the deriv argument is 2,
only the first two of these when deriv is 1,
and only the first when deriv is 0.
In case the \((\alpha, b, \nu)\) parameterization is being used and some components of \(\nu\) are zero, the derivatives returned are for the terms of the objective function that are differentiable.
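For example, the returned list structure is what base R optimizers expect after light wrapping. The following is a hedged sketch (the helper name make.optim.funs is hypothetical; objfun stands for any function made by objfun.factory with deriv = 1):

```r
## Hypothetical sketch: adapt a function made by objfun.factory
## (with deriv = 1) for use with stats::optim, which wants separate
## value and gradient functions.
make.optim.funs <- function(objfun) {
  list(
    fn = function(theta) objfun(theta)$value,
    gr = function(theta) objfun(theta)$gradient
  )
}
## usage (assuming objfun and a starting vector theta.start exist):
## funs <- make.optim.funs(objfun)
## optim(theta.start, funs$fn, funs$gr, method = "BFGS")
```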
fixed: the model matrix for fixed effects.
random: either a model matrix or a list of model matrices. Each model matrix operates on the random effects for one variance component. See the Details section for more.
response: a vector specifying the response. Length is the row dimension of all model matrices.
pred: an integer vector of length nnode determining the dependence graph of the aster model. pred[j] is the index of the predecessor of the node with index j, unless the predecessor is a root node, in which case pred[j] == 0. See the details section of aster for further requirements.
fam: an integer vector of length nnode determining the exponential family structure of the aster model. Each element is an index into the vector of family specifications given by the argument famlist.
root: a vector whose length is the row dimension of all model matrices. For nodes whose predecessors are root nodes, specifies the value of the constant at that root node. Typically the vector having all components equal to one. Always positive integer valued. If a component is greater than one, then the graph for that "individual" (in scare quotes) actually represents data for multiple individuals.
zwz: a positive definite symmetric matrix (neither symmetry nor
positive definiteness is checked, but incorrect answers with no error
or warning will result if the condition is not met)
whose dimensions are the number of random effects. Should be
\(Z^T W(\varphi) Z\), where \(Z\) is
the model matrix for random effects and \(W(\,\cdot\,)\) is
minus the second derivative of the saturated model aster log likelihood
with respect to its unconditional canonical parameter. But it does
not have to be. Use the R function makezwz
to make this.
famlist: a list of family specifications (see families).
offset: a vector whose length is the row dimension of all model
matrices. Distinguished point in parameter space. May be missing,
in which case an unspecified default is provided. See the details of
aster for further explanation, where it is called
“origin” rather than “offset”.
standard.deviation: scalar logical. If FALSE, use the
parameterization \((\alpha, b, \nu)\) (alpha, b, nu in code):
fixed effects, random effects, and variance components, respectively.
If TRUE, use the parameterization \((\alpha, c, \sigma)\)
(alpha, c, sigma in code).
See Details for more about these parameterizations.
deriv: 0, 1, or 2, the number of derivatives. Zero means only the function value is computed, one adds the gradient, and two adds the Hessian.
The negative binomial and truncated negative binomial families are fundamentally incompatible with random effects. The reason is that the canonical parameter space for a one-parameter negative binomial or truncated negative binomial is the negative half line. Thus the conditional canonical parameter \(\theta\) for such a node must be negative valued. The aster transform is so complicated that it is unclear what the corresponding constraint on the unconditional canonical parameter \(\varphi\) is, but there is a constraint: its parameter space is not the whole real line. A normal random effect, in contrast, has the whole real line as its support: it wants to allow the parameters it influences to take any real value. The code only warns about this situation, because if the random effects do not influence any negative binomial or truncated negative binomial nodes of the graph, then there is no problem.
The Breslow-Clayton approximation assumes the complete data log likelihood is approximately quadratic considered as a function of random effects only. This will be the case by the law of large numbers if the number of individuals is much larger than the number of random effects. Thus Geyer, et al. (2013) warn against trying to put a random effect for each individual in the model. If you do that, the code will try to fit the model, but it will take forever and no theory says the results will make any sense.
See the help page for the function aster
for specification
of aster models. This function only fits unconditional aster models
(those with default values of the aster
function arguments
type
and origin.type).
The only difference between this function and functions aster
and mlogl
is
that some effects (also called coefficients, also called canonical affine
submodel canonical parameters) are treated as random.
The unconditional canonical
parameter vector of the (saturated) aster model is treated as
an affine function of fixed and random effects
$$\varphi = a + M \alpha + \sum_{i = 1}^k Z_i b_i$$
where \(M\) and the \(Z_i\) are model matrices specified by
the arguments fixed
and random
, where \(\alpha\)
is a vector of
fixed effects and each \(b_i\) is a vector of random
effects that are assumed to be (marginally) normally distributed with
mean vector zero and variance matrix \(\nu_i\) times
the identity matrix. The random effects vectors \(b_i\) are
assumed independent.
In what follows, \(Z\) is the single matrix produced
by cbind
'ing the \(Z_i\),
and \(b\) is the single vector produced by c
'ing
the \(b_i\),
and \(\nu\) is the single vector produced by c
'ing the
\(\nu_i\), and \(D\) is the variance-covariance matrix
of \(b\), which is diagonal with all diagonal components equal to
some component of \(\nu\).
We can write $$D = \sum_k \nu_k E_k$$ where the \(\nu_k\) are unknown parameters to estimate and the \(E_k\) are known diagonal matrices whose components are zero-or-one-valued, having the property that they sum to the identity matrix and multiply to the zero matrix (so the variance of each component of \(b\) is exactly one component of \(\nu\)).
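The \(E_k\) matrices are easy to construct from a vector recording which variance component each random effect belongs to. A minimal base-R sketch (names and dimensions hypothetical):

```r
## Hypothetical sketch: build the E_k matrices for two variance
## components, the first covering 2 random effects, the second 3.
comp <- rep(1:2, times = c(2, 3))   # which nu each component of b gets
E <- lapply(1:2, function(k) diag(as.numeric(comp == k)))
nu <- c(0.25, 4)
D <- nu[1] * E[[1]] + nu[2] * E[[2]]
## the E_k sum to the identity and multiply to the zero matrix
all.equal(E[[1]] + E[[2]], diag(5))   # TRUE
all(E[[1]] %*% E[[2]] == 0)           # TRUE
diag(D)   # 0.25 0.25 4 4 4: each component of b gets one component of nu
```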
The R function returned by this function evaluates
$$p_W(\alpha, b, \nu) =
- l(\varphi) + \tfrac{1}{2} b^T D^{-1} b + \tfrac{1}{2}
\log\mathop{\rm det}\left(Z^T W Z D + \text{Id} \right)$$
where \(l\) is the log likelihood for the saturated aster model specified
by arguments pred
, fam
, and famlist
,
where \(\varphi\) is given by the displayed equation above,
and where \(Z^T W Z\) is argument zwz
.
This is equation (7) in Geyer, et al. (2013).
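For strictly positive \(\nu\), the value of this objective can be sketched in a few lines of base R (hypothetical code; mlogl.sat stands in for the saturated-model negative log likelihood \(-l(\varphi)\), which the aster package actually computes in compiled code):

```r
## Hypothetical sketch of p_W(alpha, b, nu), value only, for strictly
## positive nu and more than one random effect.  mlogl.sat is a
## stand-in for -l(phi); comp maps each component of b to its nu.
pw <- function(alpha, b, nu, a, M, Zbig, comp, zwz, mlogl.sat) {
  phi <- as.vector(a + M %*% alpha + Zbig %*% b)
  d <- nu[comp]                      # diagonal of D
  mlogl.sat(phi) +
    sum(b^2 / d) / 2 +
    determinant(zwz %*% diag(d) + diag(length(b)))$modulus / 2
}
```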
The displayed equation above only defines \(p_W\) when all of the
variance components (components of \(\nu\) and diagonal components
of \(D\)) are strictly positive, because otherwise \(D\) is not
invertible. In order that the objective be lower semicontinuous, the R function returned
by this function returns Inf
whenever any component of \(\nu\) is negative,
or when some component of \(\nu\) is zero and the corresponding
components of \(b\) are nonzero; when a component of \(\nu\) is zero
and the corresponding components of \(b\) are also zero, the
corresponding terms of \(b^T D^{-1} b\) are taken to be zero.
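This lower semicontinuous extension of the quadratic penalty term can be sketched as follows (hypothetical code, not the package's implementation):

```r
## Hypothetical sketch of the penalty (1/2) b^T D^{-1} b extended
## lower-semicontinuously to the boundary nu >= 0.  comp maps each
## component of b to its variance component.
penalty <- function(b, nu, comp) {
  d <- nu[comp]                          # diagonal of D
  if (any(nu < 0)) return(Inf)           # outside the parameter space
  if (any(d == 0 & b != 0)) return(Inf)  # zero variance, nonzero effect
  sum(b[d > 0]^2 / d[d > 0]) / 2         # 0/0 terms contribute zero
}
penalty(c(1, 0), c(1, 0), c(1, 2))   # 0.5 (second term treated as zero)
penalty(c(1, 1), c(1, 0), c(1, 2))   # Inf
```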
When argument standard.deviation
is TRUE
, the arguments
to the objective function are replaced by the change-of-variables
$$\nu = \sigma^2$$ and $$b = A c$$
where the first equation works componentwise (as in R vector arithmetic),
components of the vector \(\sigma\) are allowed to be
negative, and \(D = A^2\) so
$$A = \sum_k \sigma_k E_k$$
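In R vector arithmetic, this change of variables is componentwise, since \(A\) is diagonal. A hedged base-R sketch (names and values hypothetical; cee stands for the vector \(c\), since c() is taken in R):

```r
## Hypothetical sketch of the (alpha, c, sigma) change of variables:
## nu = sigma^2 componentwise, and b = A c with A = sum_k sigma_k E_k.
comp <- rep(1:2, times = c(2, 3))  # which variance component each effect has
sigma <- c(-0.5, 2)                # components of sigma may be negative
cee <- c(1, -1, 0.5, 0, 2)         # the vector c
nu <- sigma^2                      # componentwise square
b <- sigma[comp] * cee             # same as A %*% cee with A diagonal
b         # -0.5 0.5 1 0 4
nu[comp]  # diagonal of D = A^2: 0.25 0.25 4 4 4
```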
When the \((\alpha, c, \sigma)\) variables are used, \(p_W\) is continuous and infinitely differentiable for all values of these variables.
When the \((\alpha, b, \nu)\) variables are used, \(p_W\) is continuous and infinitely differentiable for all values of \(\alpha\) and \(b\) but only for strictly positive values of \(\nu\). When some components of \(\nu\) are zero, subderivatives exist, but not derivatives (Geyer, et al., 2013).
Breslow, N. E., and Clayton, D. G. (1993). Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association, 88, 9--25. doi:10.1080/01621459.1993.10594284.
Geyer, C. J., Ridley, C. E., Latta, R. G., Etterson, J. R., and Shaw, R. G. (2013). Local Adaptation and Genetic Effects on Fitness: Calculations for Exponential Family Models with Random Effects. Annals of Applied Statistics, 7, 1778--1795. doi:10.1214/13-AOAS653.
library(aster)
data(radish)
pred <- c(0,1,2)
fam <- c(1,3,2)
rout <- reaster(resp ~ varb + fit : (Site * Region),
list(block = ~ 0 + fit : Block, pop = ~ 0 + fit : Pop),
pred, fam, varb, id, root, data = radish)
objfun <- with(rout, objfun.factory(fixed, random, response,
obj$pred, obj$fam, as.vector(obj$root), zwz))
theta.hat <- with(rout, c(alpha, b, nu))
all.equal(objfun(theta.hat)$value, rout$deviance / 2)