Controlling response updating: Updating the data may be tricky when the response specified in a formula is not simply the name of a variable in the data. For example,
if the response was specified as I(foo^2) the variable foo is not what simulate.HLfit will simulate, so foo should not be updated with such simulation results, yet this is what should be updated in the data. For some time spaMM has handled such cases by using an alternative way to provide updated response information, but this has some limitations. So spaMM now update the data after checking that this is correct, which the consequence that when response updating is needed (notably, for bootstrap procedures), the response should preferably be specified as the name of a variable in the data, rather than a more complicated expression.
However, in some cases, dynamic evaluation of the response variable may be helpful. For example, for bootstrapping hurdle models, the zero-truncated response may be specified as I(count[presence>0] <- NA; count) (where both the zero-truncated count and binary presence variables are both updated by the bootstrap simulation). In that case the names of the two variables to be updated is provided by setting (say)
<fit object>$respNames <- c("presence", "count")
for an hurdle model fit as a bivariate-response model, with first submodel for presence/absence, and second submodel for zero-truncated response. A full example is developed in the “Gentle introduction” to spaMM (
https://gitlab.mbb.univ-montp2.fr/francois/spamm-ref/-/blob/master/vignettePlus/spaMMintro.pdf).
Alternatively for univariate-response fits, use
<fit object>$respName <- "count"
Controlling formula updating: Early versions of spaMM's update method relied on stats::update.formula whose results endorse stats's (sometimes annoying) convention that a formula without an explicit intercept term actually includes an intercept. spaMM::update.HLfit was then defined to avoid this problem. Formula updates should still be carefully checked, as getting them perfect has not been on the priority list.
Various post-fit functions from base R may use update.formula directly, rather than using automatic method selection for update. update.formula is not itself a generic, which leads to the following problem. To make update.formula() work on multivariate-response fits, one would like to be able to redefine it as a generic, with an HLfit method that would perform what update_formulas does, but such a redefinition appears to be forbidden in a package distributed on CRAN. Instead it is suggested to define a new generic spaMM::update, which could have a spaMM::update.formula as a method (possibly itself a generic). This would be of limited interest as the new spaMM::update.formula would be visible to spaMM::update but not to stats::update, and thus the post-fit functions from base R would still not use this method.
Safe updating: update(<fit>, ...), as a general rule, is tricky. update methods are easily affected in a non-transparent way by changes in variables used in the original call. For example
foo <- rep(1,10)
m <- lm(rnorm(10)~1, weights=foo)
rm(foo)
update(m, .~.) # Error
To avoid such problems, spaMM tries to avoid references to variables in the global environment, by enforcing that the data are explicitly provided to the fitting functions by the data argument, and that any variable used in the prior.weights argument is in the data.
Bugs can also result when calling update on a fit produced within some function, say function somefn calling fitme(data=mydata,...), as e.g. update(<fit>) will then seek a global variable mydata that may differ from the fitted mydata which was local to somefn.