After the projection of the reference model onto a submodel, the linear
predictors (for the original or a new dataset) based on that submodel can be
calculated by proj_linpred(). The linear predictors can also be transformed
to response scale. Furthermore, proj_linpred() returns the corresponding
log predictive density values if the (original or new) dataset contains
response values. The proj_predict() function draws from the predictive
distributions (there is one such distribution for each observation from the
original or new dataset) of the submodel that the reference model has been
projected onto. If the projection has not been performed yet, both functions
call project() internally to perform the projection. Both functions can
also handle multiple submodels at once (for objects of class vsel or
objects returned by a project() call to an object of class vsel; see
project()).
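To make the difference between the two functions concrete, here is a minimal base-R sketch for a Gaussian submodel. It is not projpred's internal code: the projected coefficient draws beta_prj, the dispersion draws sigma_prj, and the design matrix X are hypothetical stand-ins. Each row of the linear-predictor matrix (what proj_linpred() reports) belongs to one projected draw and each column to one observation; proj_predict()-style predictions add draws from the predictive distribution on top.

```r
set.seed(1)
S_prj <- 4  # number of projected draws (hypothetical)
N <- 3      # number of observations (hypothetical)
# Hypothetical design matrix (intercept + one predictor) and projected draws
X <- cbind(1, c(-1, 0, 1))
beta_prj <- matrix(rnorm(S_prj * 2), nrow = S_prj, ncol = 2)
sigma_prj <- abs(rnorm(S_prj)) + 0.5
# Linear predictors: one row per projected draw, one column per observation
linpred <- beta_prj %*% t(X)
# Draws from the Gaussian predictive distributions: entry [s, i] uses mean
# linpred[s, i] and standard deviation sigma_prj[s] (recycled column-wise)
ypred <- matrix(rnorm(S_prj * N, mean = linpred, sd = sigma_prj),
                nrow = S_prj, ncol = N)
```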
proj_linpred(
  object,
  newdata = NULL,
  offsetnew = NULL,
  weightsnew = NULL,
  filter_nterms = NULL,
  transform = FALSE,
  integrated = FALSE,
  .seed = sample.int(.Machine$integer.max, 1),
  ...
)

proj_predict(
  object,
  newdata = NULL,
  offsetnew = NULL,
  weightsnew = NULL,
  filter_nterms = NULL,
  nresample_clusters = 1000,
  .seed = sample.int(.Machine$integer.max, 1),
  resp_oscale = TRUE,
  ...
)
In the following, \(S_{\mathrm{prj}}\), \(N\), \(C_{\mathrm{cat}}\), and
\(C_{\mathrm{lat}}\) from help topic refmodel-init-get are used. (For
proj_linpred() with integrated = TRUE, we have \(S_{\mathrm{prj}} = 1\).)
Furthermore, let \(C\) denote either \(C_{\mathrm{cat}}\) (if
transform = TRUE) or \(C_{\mathrm{lat}}\) (if transform = FALSE). Then, if
the prediction is done for one submodel only (i.e.,
length(nterms) == 1 || !is.null(solution_terms) in the call to project()):
proj_linpred() returns a list with elements pred (predictions, i.e., the
linear predictors, possibly transformed to response scale) and lpd (log
predictive densities; only calculated if newdata is NULL or if newdata
contains response values in the corresponding column). In case of (i) the
traditional projection, (ii) the latent projection with transform = FALSE,
or (iii) the latent projection with transform = TRUE and
<refmodel>$family$cats being NULL (where <refmodel> is an object resulting
from init_refmodel(); see also extend_family()'s argument latent_y_unqs),
both elements are \(S_{\mathrm{prj}} \times N\) matrices. In case of (i) the
augmented-data projection or (ii) the latent projection with
transform = TRUE and <refmodel>$family$cats not being NULL, pred is an
\(S_{\mathrm{prj}} \times N \times C\) array and lpd is an
\(S_{\mathrm{prj}} \times N\) matrix.
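As a rough illustration of what the two matrices hold in the simplest case (traditional projection, Gaussian family), the following base-R sketch computes an lpd-like matrix from a hypothetical pred matrix. The values of y, linpred, and sigma_prj are made up, and this is not how projpred computes lpd internally; it only shows that entry [s, i] is the log density of observation i under projected draw s.

```r
set.seed(2)
S_prj <- 5
N <- 4
y <- c(0.3, -1.2, 0.8, 2.1)  # hypothetical observed responses
# Hypothetical linear predictors (S_prj x N), playing the role of pred
linpred <- matrix(rnorm(S_prj * N), nrow = S_prj, ncol = N)
sigma_prj <- rep(1, S_prj)   # hypothetical dispersion draws
# Repeat y across rows so that column i always refers to observation i
y_mat <- matrix(y, nrow = S_prj, ncol = N, byrow = TRUE)
# Log predictive density of y[i] under projected draw s
lpd <- matrix(dnorm(y_mat, mean = linpred, sd = sigma_prj, log = TRUE),
              nrow = S_prj, ncol = N)
```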
proj_predict() returns an \(S_{\mathrm{prj}} \times N\) matrix of
predictions, where \(S_{\mathrm{prj}}\) denotes nresample_clusters in case
of clustered projection. In case of (i) the augmented-data projection or
(ii) the latent projection with resp_oscale = TRUE and
<refmodel>$family$cats not being NULL, this matrix has an attribute called
cats (the character vector of response categories), and the values of the
matrix are the predicted indices of the response categories (these indices
refer to the order of the response categories in attribute cats).
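When the matrix carries a cats attribute, the indices can be turned back into category labels or summarized per observation with base R alone. In the sketch below, ppd is a hand-made stand-in for such a proj_predict() result (2 draws, 3 observations, 3 hypothetical categories).

```r
# Hypothetical proj_predict()-like output: predicted category indices
ppd <- matrix(c(1L, 3L, 2L, 2L, 1L, 3L), nrow = 2, ncol = 3)
attr(ppd, "cats") <- c("low", "medium", "high")
cats <- attr(ppd, "cats")
# Map indices to labels, keeping the draws x observations layout
ppd_labels <- matrix(cats[as.vector(ppd)], nrow = nrow(ppd), ncol = ncol(ppd))
# Proportion of draws predicting each category (rows) per observation (columns)
props <- apply(ppd, 2, function(idx) {
  tabulate(idx, nbins = length(cats)) / length(idx)
})
```

Each column of props sums to one, so it can be read as a draw-based estimate of the predictive category probabilities for that observation.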
If the prediction is done for more than one submodel, the output from above
is returned for each submodel, giving a named list with one element for
each submodel (the names of this list being the numbers of solution terms
of the submodels when counting the intercept, too).
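Because the multi-submodel output is a plain named list, the usual list tools apply. The sketch below builds a hypothetical two-submodel list (the names "2" and "3" and all values are made up) and, for each submodel, averages the per-draw densities in lpd with a numerically stable log-mean-exp before summing over observations. This is one common way to summarize such matrices, not a projpred function.

```r
set.seed(3)
# Log of the mean of exp(x), computed stably
log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))
}
# Hypothetical output element for one submodel (S_prj = 5, N = 4)
make_sub <- function(shift) {
  list(pred = matrix(rnorm(20), nrow = 5),
       lpd = matrix(rnorm(20, mean = -1 - shift), nrow = 5))
}
prj_out <- list("2" = make_sub(0), "3" = make_sub(0.5))
# Per submodel: average densities over draws, then sum over observations
summed_lpd <- sapply(prj_out, function(sub) {
  sum(apply(sub$lpd, 2, log_mean_exp))
})
```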
object: An object returned by project() or an object that can be passed to
argument object of project().
newdata: Passed to argument newdata of the reference model's
extract_model_data function (see init_refmodel()). Provides the predictor
(and possibly also the response) data for the new (or old) observations.
May also be NULL (see argument extract_model_data of init_refmodel()). If
not NULL, any NAs will trigger an error.
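Since NAs in newdata trigger an error, it can help to check the data beforehand. The sketch below uses a hypothetical data frame and simply drops incomplete rows; note that this changes \(N\), so imputation may be preferable when every observation needs a prediction.

```r
# Hypothetical new data with a missing predictor value
newdat <- data.frame(X1 = c(0.1, NA, -0.4), X3 = c(1.2, 0.5, -0.7))
# Keep only rows without any NA before passing them as newdata
complete_rows <- complete.cases(newdat)
newdat_ok <- newdat[complete_rows, , drop = FALSE]
```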
offsetnew: Passed to argument orhs of the reference model's
extract_model_data function (see init_refmodel()). Used to get the offsets
for the new (or old) observations.
weightsnew: Passed to argument wrhs of the reference model's
extract_model_data function (see init_refmodel()). Used to get the weights
for the new (or old) observations.
filter_nterms: Only applies if object is an object returned by project().
In that case, filter_nterms can be used to filter object for only those
elements (submodels) with a number of solution terms in filter_nterms.
Therefore, it needs to be a numeric vector or NULL. If NULL, all submodels
are used.
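The filtering rule can be pictured as keeping the submodels whose size is contained in filter_nterms. In this base-R sketch, subs and its nterms field are hypothetical stand-ins chosen for illustration; they do not reflect the internal structure of a project() result.

```r
# Hypothetical list of submodels; the `nterms` field is an assumption
subs <- list(list(nterms = 0), list(nterms = 1),
             list(nterms = 2), list(nterms = 3))
# Hypothetical helper: NULL keeps everything, otherwise match on size
filter_subs <- function(subs, filter_nterms = NULL) {
  if (is.null(filter_nterms)) return(subs)
  Filter(function(s) s$nterms %in% filter_nterms, subs)
}
kept <- filter_subs(subs, filter_nterms = c(1, 3))
```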
transform: For proj_linpred() only. A single logical value indicating
whether the linear predictor should be transformed to response scale using
the inverse-link function (TRUE) or not (FALSE). In case of the latent
projection, argument transform is similar in spirit to argument
resp_oscale from other functions and affects the scale of both output
elements pred and lpd (see sections "Details" and "Value").
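For the traditional projection, transform = TRUE amounts to applying the family's inverse-link function to the linear predictors. The sketch below shows this with base R family objects and hypothetical linear-predictor values; it illustrates the transformation only, not projpred's code path.

```r
eta <- c(-2, 0, 1.5)          # hypothetical linear predictor values
fam <- binomial()             # logit link by default
mu <- fam$linkinv(eta)        # probabilities, i.e., response scale
mu_pois <- poisson()$linkinv(eta)  # log link: response scale is exp(eta)
```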
integrated: For proj_linpred() only. A single logical value indicating
whether the output should be averaged across the projected posterior draws
(TRUE) or not (FALSE).
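Averaging across the projected draws collapses the draw dimension, which is why \(S_{\mathrm{prj}} = 1\) for integrated = TRUE. A base-R sketch with a hypothetical pred matrix:

```r
set.seed(4)
# Hypothetical S_prj x N matrix of linear predictors (S_prj = 6, N = 3)
pred <- matrix(rnorm(6 * 3), nrow = 6, ncol = 3)
# Averaging over draws (rows) leaves a 1 x N matrix
pred_int <- matrix(colMeans(pred), nrow = 1)
```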
.seed: Pseudorandom number generation (PRNG) seed by which the same results
can be obtained again if needed. Passed to argument seed of set.seed(), but
can also be NA to not call set.seed() at all. Here, this seed is used for
drawing new group-level effects in case of a multilevel submodel (however,
not yet in case of a GAMM) and for drawing from the predictive distributions
of the submodel(s) in case of proj_predict(). If a clustered projection was
performed, then in proj_predict(), .seed is also used for drawing from the
set of projected clusters of posterior draws (see argument
nresample_clusters).
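The documented .seed semantics (a number is handed to set.seed(), NA skips seeding) can be mimicked with a small hypothetical helper. draw_with_seed() is made up for illustration; it only shows why repeating a call with the same .seed reproduces the same draws.

```r
# Hypothetical helper mimicking the .seed semantics described above
draw_with_seed <- function(n, .seed = sample.int(.Machine$integer.max, 1)) {
  if (!is.na(.seed)) set.seed(.seed)  # NA: leave the RNG state untouched
  rnorm(n)
}
a <- draw_with_seed(5, .seed = 7364)
b <- draw_with_seed(5, .seed = 7364)  # same seed, same draws
```

Keep in mind that set.seed() also resets the session's global RNG stream, so code relying on the RNG state after such a call will be affected.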
...: Arguments passed to project() if object is not already an object
returned by project().
nresample_clusters: For proj_predict() with clustered projection only.
Number of draws to return from the predictive distributions of the
submodel(s). Not to be confused with argument nclusters of project():
nresample_clusters gives the number of draws (with replacement) from the set
of clustered posterior draws after projection (with this set being
determined by argument nclusters of project()).
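The interplay of nclusters and nresample_clusters can be sketched in base R: first resample cluster indices with replacement, then draw once from the predictive distribution of each resampled cluster. The per-cluster means mu_clust and the cluster weights w_clust are hypothetical; weighting clusters (e.g., by size) is an assumption of this sketch, and uniform weights would be the unweighted variant.

```r
set.seed(5)
nclusters <- 10           # clusters kept by the projection (hypothetical)
nresample_clusters <- 25  # number of draws requested
# Hypothetical per-cluster predictive means for N = 3 observations
mu_clust <- matrix(rnorm(nclusters * 3), nrow = nclusters, ncol = 3)
# Hypothetical cluster weights (an assumption; they sum to 1)
w_clust <- rep(c(0.05, 0.15), 5)
# Resample cluster indices with replacement
idx <- sample.int(nclusters, size = nresample_clusters, replace = TRUE,
                  prob = w_clust)
# One Gaussian predictive draw per resampled cluster and observation
ypred <- matrix(rnorm(nresample_clusters * 3, mean = mu_clust[idx, ], sd = 1),
                nrow = nresample_clusters, ncol = 3)
```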
resp_oscale: Only relevant for the latent projection. A single logical value
indicating whether to draw from the posterior-projection predictive
distributions on the original response scale (TRUE) or on latent scale
(FALSE).
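For an ordinal (cumulative) model, moving from the latent scale to the original scale conceptually means cutting latent values at the ordered thresholds. The base-R sketch below assumes a cumulative-logit model with made-up thresholds and latent draws; projpred performs the actual conversion through the reference model's family, so this only illustrates the idea.

```r
set.seed(6)
# Hypothetical ordered thresholds of a cumulative model with 4 categories
thresholds <- c(-1, 0.2, 1.5)
# Hypothetical latent draws: linear predictor plus standard logistic noise
eta <- 0.4
latent_draws <- eta + rlogis(8)
# Category index: below the first threshold -> 1, above the last -> 4
cat_idx <- findInterval(latent_draws, thresholds) + 1L
```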
In case of the latent projection and transform = FALSE:
Output element pred contains the linear predictors without any
modifications that may be due to the original response distribution (e.g.,
for a brms::cumulative() model, the ordered thresholds are not taken into
account).
Output element lpd contains the latent log predictive density values, i.e.,
those corresponding to the latent Gaussian distribution. If newdata is not
NULL, this requires the latent response values to be supplied in a column
called .<response_name> of newdata, where <response_name> needs to be
replaced by the name of the original response variable (if <response_name>
contained parentheses, these have been stripped off by init_refmodel(); see
the left-hand side of formula(<refmodel>)). For technical reasons, the
existence of column <response_name> in newdata is another requirement (even
though .<response_name> is actually used).
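For a hypothetical response named y, a newdata object satisfying both requirements would contain the columns y and .y. The sketch also evaluates the latent Gaussian log density for a single projected draw; eta_lat and sigma_lat are made-up values, not output of projpred.

```r
# Hypothetical response name "y": latent values go into ".y", and a column
# "y" must be present as well
newdat <- data.frame(
  y = c(1, 3, 2),          # original (e.g., ordinal) responses
  .y = c(-0.8, 1.9, 0.4),  # latent response values (hypothetical)
  X1 = c(0.1, -0.5, 1.2)
)
# Latent log predictive density for one draw (hypothetical parameters)
eta_lat <- c(-0.5, 1.5, 0.2)
sigma_lat <- 1
lpd_lat <- dnorm(newdat$.y, mean = eta_lat, sd = sigma_lat, log = TRUE)
```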
library(projpred)
if (requireNamespace("rstanarm", quietly = TRUE)) {
  # Data:
  dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
  # The "stanreg" fit which will be used as the reference model (with small
  # values for `chains` and `iter`, but only for technical reasons in this
  # example; this is not recommended in general):
  fit <- rstanarm::stan_glm(
    y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
    QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
  )
  # Projection onto an arbitrary combination of predictor terms (with a small
  # value for `nclusters`, but only for the sake of speed in this example;
  # this is not recommended in general):
  prj <- project(fit, solution_terms = c("X1", "X3", "X5"), nclusters = 10,
                 seed = 9182)
  # Predictions (at the training points) from the submodel onto which the
  # reference model was projected:
  prjl <- proj_linpred(prj)
  prjp <- proj_predict(prj, .seed = 7364)
}
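The example can be extended to exercise the integrated, newdata, and nresample_clusters arguments described above. The setup is repeated so that the block runs on its own; the particular argument values (five rows of new data, 50 resampled draws) are arbitrary choices for illustration.

```r
library(projpred)
if (requireNamespace("rstanarm", quietly = TRUE)) {
  dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
  fit <- rstanarm::stan_glm(
    y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
    QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
  )
  prj <- project(fit, solution_terms = c("X1", "X3", "X5"), nclusters = 10,
                 seed = 9182)
  # Linear predictors averaged across the projected draws (1 x N):
  prjl_int <- proj_linpred(prj, integrated = TRUE)
  # Predictions for new data; `y` is included, so `lpd` is computed as well:
  prjl_new <- proj_linpred(prj, newdata = head(dat_gauss, 5))
  # 50 draws (with replacement) from the projected clusters:
  prjp_50 <- proj_predict(prj, nresample_clusters = 50, .seed = 7364)
}
```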