A function factory that returns an R function that evaluates the objective function for fitting aster models with random effects
objfun.factory(fixed, random, response, pred, fam, root, zwz,
famlist = fam.default(), offset, standard.deviation = FALSE,
deriv = 0)
returns an R function having one argument, a numeric vector of all the parameters and random effects, in the order
\((\alpha, b, \nu)\) for one set of variables or
\((\alpha, c, \sigma)\) for the other set (which set depends on argument standard.deviation).
The R function returned by this function evaluates the function \(p_W\)
defined in the Details section and returns a list with components named
value, gradient, and hessian when the deriv argument is 2,
only the first two of these when deriv is 1,
and only the first when deriv is 0.
In case the \((\alpha, b, \nu)\) parameterization is being used and some components of \(\nu\) are zero, the derivatives returned are for the terms of the objective function that are differentiable.
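For example, the returned list structure is what base R optimizers expect after light wrapping. The following is a hedged sketch (the helper name make.optim.funs is hypothetical; objfun stands for any function made by objfun.factory with deriv = 1):

```r
## Hypothetical sketch: adapt a function made by objfun.factory
## (with deriv = 1) for use with stats::optim, which wants separate
## value and gradient functions.
make.optim.funs <- function(objfun) {
  list(
    fn = function(theta) objfun(theta)$value,
    gr = function(theta) objfun(theta)$gradient
  )
}
## usage (assuming objfun and a starting vector theta.start exist):
## funs <- make.optim.funs(objfun)
## optim(theta.start, funs$fn, funs$gr, method = "BFGS")
```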
fixed: the model matrix for fixed effects.
random: either a model matrix or a list of model matrices. Each model matrix operates on the random effects for one variance component. See the Details section for more.
response: a vector specifying the response. Length is the row dimension of all model matrices.
pred: an integer vector of length nnode determining the dependence graph of the aster model. pred[j] is the index of the predecessor of the node with index j, unless the predecessor is a root node, in which case pred[j] == 0. See the details section of aster for further requirements.
fam: an integer vector of length nnode determining the exponential family structure of the aster model. Each element is an index into the vector of family specifications given by the argument famlist.
root: a vector whose length is the row dimension of all model matrices. For nodes whose predecessors are root nodes, specifies the value of the constant at that root node. Typically the vector having all components equal to one. Always positive integer valued. If a component is greater than one, then the graph for that "individual" (in scare quotes) actually represents data for multiple individuals.
zwz: a positive definite symmetric matrix (neither symmetry nor
positive definiteness is checked, but incorrect answers with no error
or warning will result if the condition is not met)
whose dimensions are the number of random effects. Should be
\(Z^T W(\varphi) Z\), where \(Z\) is
the model matrix for random effects and \(W(\,\cdot\,)\) is
minus the second derivative of the saturated model aster log likelihood
with respect to its unconditional canonical parameter. But it does
not have to be. Use the R function makezwz
to make this.
famlist: a list of family specifications (see families).
offset: a vector whose length is the row dimension of all model
matrices. Distinguished point in parameter space. May be missing,
in which case an unspecified default is provided. See the details of
aster for further explanation, where it is called
“origin” rather than “offset”.
standard.deviation: scalar logical. If FALSE, use the
parameterization \((\alpha, b, \nu)\) (alpha, b, nu in code):
fixed effects, random effects, and variance components, respectively.
If TRUE, use the parameterization \((\alpha, c, \sigma)\)
(alpha, c, sigma in code).
See Details for more about these parameterizations.
deriv: 0, 1, or 2, the number of derivatives. Zero means only the function value is computed, one adds the gradient, and two adds the Hessian.
The negative binomial and truncated negative binomial families are fundamentally incompatible with random effects. The reason is that the canonical parameter space for a one-parameter negative binomial or truncated negative binomial is the negative half line. Thus the conditional canonical parameter \(\theta\) for such a node must be negative valued. The aster transform is so complicated that it is unclear what the corresponding constraint on the unconditional canonical parameter \(\varphi\) is, but there is a constraint: its parameter space is not the whole real line. A normal random effect, in contrast, has the whole real line as its support: it wants to allow the parameters it influences to take any real value. The code only warns about this situation, because if the random effects do not influence any negative binomial or truncated negative binomial nodes of the graph, then there is no problem.
The Breslow-Clayton approximation assumes the complete data log likelihood is approximately quadratic considered as a function of random effects only. This will be the case by the law of large numbers if the number of individuals is much larger than the number of random effects. Thus Geyer, et al. (2013) warn against trying to put a random effect for each individual in the model. If you do that, the code will try to fit the model, but it will take forever and no theory says the results will make any sense.
See the help page for the function aster
for specification
of aster models. This function only fits unconditional aster models
(those with default values of the aster
function arguments
type
and origin.type).
The only difference between this function and functions aster
and mlogl
is
that some effects (also called coefficients, also called canonical affine
submodel canonical parameters) are treated as random.
The unconditional canonical
parameter vector of the (saturated) aster model is treated as
an affine function of fixed and random effects
$$\varphi = a + M \alpha + \sum_{i = 1}^k Z_i b_i$$
where \(M\) and the \(Z_i\) are model matrices specified by
the arguments fixed
and random
, where \(\alpha\)
is a vector of
fixed effects and each \(b_i\) is a vector of random
effects that are assumed to be (marginally) normally distributed with
mean vector zero and variance matrix \(\nu_i\) times
the identity matrix. The random effects vectors \(b_i\) are
assumed independent.
In what follows, \(Z\) is the single matrix produced
by cbind
'ing the \(Z_i\),
and \(b\) is the single vector produced by c
'ing
the \(b_i\),
and \(\nu\) is the single vector produced by c
'ing the
\(\nu_i\), and \(D\) is the variance-covariance matrix
of \(b\), which is diagonal with all diagonal components equal to
some component of \(\nu\).
We can write $$D = \sum_k \nu_k E_k$$ where the \(\nu_k\) are unknown parameters to estimate and the \(E_k\) are known diagonal matrices whose components are zero-or-one-valued, having the property that they sum to the identity matrix and multiply to the zero matrix (so the variance of each component of \(b\) is exactly one component of \(\nu\)).
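The \(E_k\) matrices are easy to construct from a vector recording which variance component each random effect belongs to. A minimal base-R sketch (names and dimensions hypothetical):

```r
## Hypothetical sketch: build the E_k matrices for two variance
## components, the first covering 2 random effects, the second 3.
comp <- rep(1:2, times = c(2, 3))   # which nu each component of b gets
E <- lapply(1:2, function(k) diag(as.numeric(comp == k)))
nu <- c(0.25, 4)
D <- nu[1] * E[[1]] + nu[2] * E[[2]]
## the E_k sum to the identity and multiply to the zero matrix
all.equal(E[[1]] + E[[2]], diag(5))   # TRUE
all(E[[1]] %*% E[[2]] == 0)           # TRUE
diag(D)   # 0.25 0.25 4 4 4: each component of b gets one component of nu
```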
The R function returned by this function evaluates
$$p_W(\alpha, b, \nu) =
- l(\varphi) + \tfrac{1}{2} b^T D^{-1} b + \tfrac{1}{2}
\log\mathop{\rm det}\left(Z^T W Z D + \text{Id} \right)$$
where \(l\) is the log likelihood for the saturated aster model specified
by arguments pred
, fam
, and famlist
,
where \(\varphi\) is given by the displayed equation above,
and where \(Z^T W Z\) is argument zwz
.
This is equation (7) in Geyer, et al. (2013).
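For strictly positive \(\nu\), the value of this objective can be sketched in a few lines of base R (hypothetical code; mlogl.sat stands in for the saturated-model negative log likelihood \(-l(\varphi)\), which the aster package actually computes in compiled code):

```r
## Hypothetical sketch of p_W(alpha, b, nu), value only, for strictly
## positive nu and more than one random effect.  mlogl.sat is a
## stand-in for -l(phi); comp maps each component of b to its nu.
pw <- function(alpha, b, nu, a, M, Zbig, comp, zwz, mlogl.sat) {
  phi <- as.vector(a + M %*% alpha + Zbig %*% b)
  d <- nu[comp]                      # diagonal of D
  mlogl.sat(phi) +
    sum(b^2 / d) / 2 +
    determinant(zwz %*% diag(d) + diag(length(b)))$modulus / 2
}
```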
The displayed equation above only defines \(p_W\) when all of the
variance components (components of \(\nu\) and diagonal components
of \(D\)) are strictly positive, because otherwise \(D\) is not
invertible. In order that the objective be lower semicontinuous, the R function returned
by this function returns Inf
whenever any component of \(\nu\) is negative,
or when some component of \(\nu\) is zero and the corresponding
components of \(b\) are nonzero; when a component of \(\nu\) is zero
and the corresponding components of \(b\) are also zero, the
corresponding terms of \(b^T D^{-1} b\) are taken to be zero.
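This lower semicontinuous extension of the quadratic penalty term can be sketched as follows (hypothetical code, not the package's implementation):

```r
## Hypothetical sketch of the penalty (1/2) b^T D^{-1} b extended
## lower-semicontinuously to the boundary nu >= 0.  comp maps each
## component of b to its variance component.
penalty <- function(b, nu, comp) {
  d <- nu[comp]                          # diagonal of D
  if (any(nu < 0)) return(Inf)           # outside the parameter space
  if (any(d == 0 & b != 0)) return(Inf)  # zero variance, nonzero effect
  sum(b[d > 0]^2 / d[d > 0]) / 2         # 0/0 terms contribute zero
}
penalty(c(1, 0), c(1, 0), c(1, 2))   # 0.5 (second term treated as zero)
penalty(c(1, 1), c(1, 0), c(1, 2))   # Inf
```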
When argument standard.deviation
is TRUE
, the arguments
to the objective function are replaced by the change-of-variables
$$\nu = \sigma^2$$ and $$b = A c$$
where the first equation works componentwise (as in R vector arithmetic),
components of the vector \(\sigma\) are allowed to be
negative, and \(D = A^2\) so
$$A = \sum_k \sigma_k E_k$$
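In R vector arithmetic, this change of variables is componentwise, since \(A\) is diagonal. A hedged base-R sketch (names and values hypothetical; cee stands for the vector \(c\), since c() is taken in R):

```r
## Hypothetical sketch of the (alpha, c, sigma) change of variables:
## nu = sigma^2 componentwise, and b = A c with A = sum_k sigma_k E_k.
comp <- rep(1:2, times = c(2, 3))  # which variance component each effect has
sigma <- c(-0.5, 2)                # components of sigma may be negative
cee <- c(1, -1, 0.5, 0, 2)         # the vector c
nu <- sigma^2                      # componentwise square
b <- sigma[comp] * cee             # same as A %*% cee with A diagonal
b         # -0.5 0.5 1 0 4
nu[comp]  # diagonal of D = A^2: 0.25 0.25 4 4 4
```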
When the \((\alpha, c, \sigma)\) variables are used, \(p_W\) is continuous and infinitely differentiable for all values of these variables.
When the \((\alpha, b, \nu)\) variables are used, \(p_W\) is continuous and infinitely differentiable for all values of \(\alpha\) and \(b\) but only for strictly positive values of \(\nu\). When some components of \(\nu\) are zero, subderivatives exist, but not derivatives (Geyer, et al., 2013).
Breslow, N. E., and Clayton, D. G. (1993). Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association, 88, 9--25. doi:10.1080/01621459.1993.10594284.
Geyer, C. J., Ridley, C. E., Latta, R. G., Etterson, J. R., and Shaw, R. G. (2013). Local Adaptation and Genetic Effects on Fitness: Calculations for Exponential Family Models with Random Effects. Annals of Applied Statistics, 7, 1778--1795. doi:10.1214/13-AOAS653.
library(aster)
data(radish)
pred <- c(0,1,2)
fam <- c(1,3,2)
rout <- reaster(resp ~ varb + fit : (Site * Region),
list(block = ~ 0 + fit : Block, pop = ~ 0 + fit : Pop),
pred, fam, varb, id, root, data = radish)
objfun <- with(rout, objfun.factory(fixed, random, response,
obj$pred, obj$fam, as.vector(obj$root), zwz))
theta.hat <- with(rout, c(alpha, b, nu))
all.equal(objfun(theta.hat)$value, rout$deviance / 2)