ou_haltlost and ou_zaplost handles lost traits and missing data.
Each of them wraps the function oupar and returns
a new function that accepts the same arguments and output the same form of result,
but takes into account lost traits and missing data. dou_haltlost and
dou_zaplost wraps the Jacobian function oujac, and
hou_haltlost and hou_zaplost wraps the Hessian function
ouhess.
ou_haltlost(parfn)dou_haltlost(jacfn)
hou_haltlost(hessfn)
ou_zaplost(parfn)
dou_zaplost(jacfn)
hou_zaplost(hessfn)
ou_haltlost and ou_zaplost returns a wrapped versions of `parfn`, which accepts the same arguments
and outputs in the same format. dou_haltlost and dou_zaplost, analogously, wraps jacfn.
hou_zaplost and hou_zaplost wraps hessfn.
A function that maps from the user-parametrisation to the underlying Gaussian parameters.
Each of them returns a vector of concatenated \((\Phi, w, V')\), where \(V'\) is the lower triangular
part of \(V\), and accepts four arguments: a vector of parameters whose length is specified
by the pardims argument to the glinv_gauss function, the branch length leading to the currently processing node,
a vector of factors with three levels indicating which dimensions are missing or lost in the mother of
the current node, and a vector of factors with the same three levels indicating missingness of the current
node.
A function that accepts the same arguments as parfn and returns the Jacobian
of parfn.
A function that accepts the same arguments as parfns and returns a list of three 3D arrays,
named Phi, w, V respectively inside the list. ((hessfn)(...))$Phi[m,i,j]
contains the cross second-order partial derivative of \(\Phi_m\) (here we treat the matrix
\(\Phi\) as a column-major-flattened vector) with respect to the \(i\)-th and\(j\)-th parameters
in the joint \((H,\theta,\Sigma_x)\) vector, and
((hessfn)(...))$w[m,i,j] and ((hessfn)(...))$V[m,i,j]
analogously contains second-order derivative of \(w_m\) and \(V'_m\).
A `missing' trait refers to a trait value whose data is missing due to data
collection problems. Fundamentally, they evolves in the same manner as other
traits. An NA entry in the data is deemed `missing'. On the other hand,
a lost trait is a trait dimension which had ceased to exists during the
evolutionary process. An NaN entry in the data indicates a `lost' trait.
Each trait dimension of each nodes, either internal or tip, are tagged with
one of the three labels: MISSING, LOST, and OK.
If the data contains an NA in the \(p\)-th dimension of the \(i\)-th tip
then \(X_pi\) is tagged MISSING. No other tags of any other nodes and dimensions
are changed in the case of missing-ness. On the other hands, the \(p\)-th dimension of
any node \(j\), regardless of whether or not it is an internal node or a tips, is
tagged LOST if and only if the \(p\)-th dimension of all tips inside
the clade started at \(j\) are NaN. Any entry that is neither tagged
LOST nor MISSING are tagged OK.
This corresponds to the biological intuition that, if a value is missing only due to data collection problems, the missingness should not influence the random walk process way up the phylogenetic tree; and this is obviously not true if the trait had ceased to exists instead.
ou_haltlost and ou_zaplost handles missing data in the same way: they
simply marginalises the unobserved dimensions in the joint Gaussian distributions of
tip data.
For lost traits, ou_haltlost assumes the followings:
In the entire branch leading to the earliest node \(j\) whose \(p\)-th dimension
is tagged LOST, the lost trait dimension does not evolve at all.
In the entire same branch, the magnitude of the \(p\)-th dimension at \(j\)'s mother node has no influence on other dimensions, in any instantaneous moments during the evolution in the branch, neither through the linear combination with the drift matrix nor the Wiener process covariance; in other words, the SDE governing the non-lost dimensions' random walk is invariant of \(j\)'s mother nodes' \(p\)-th dimension.
Therefore, ou_haltlost first set the \(p\)-th row and column of both of \(H_j\)
and the \(p\)-th row of \(Sigma_x\) to zero and marginalise out the degenerate Gaussian
dimension.
On the other hands, ou_zaplost does not assume the lost trait to stop evolving
immediately at moment when the branch leading to \(j\) starts, but, instead, simply
marginalise out the lost, non-degenerate Gaussian dimensions. This method is the same as
the one that is used in the PCMBase package.
Without paramter restriction, the following is an example usage in a call to the
glinv function. It constructs a glinv model object
which is capable of handling missing data and lost traits.
mod.full = glinv(tree, x0, my_data,
parfns = haltlost(oupar),
pardims = nparams_ou(k),
parjacs = dhaltlost(oujac),
parhess = hhaltlost(ouhess))
Note that we have the same naming convention that functions wrappers whose
nams have prefix d wraps the Jacobians, while prefix d wraps
the Hessians.
If parameter restriction is needed, then *ou_*lost should called
before any reparameterisation/restriction functions because it
expects the passed-in function parfn to accept the full \(H\)
matrix, rather than only the diagonal or lower-triangular part of it.
Example:
f = haltlost(oupar)
g = dhaltlost(oujac)
h = hhaltlost(oujac)
mod.full = glinv(tree, x0, my_data,
parfns = ou_spdH(f),
pardims = nparams_ou_spdH(k),
parjacs = dou_spdH(g),
parhess = ou_spdH(h,g))