This function fits Mixed Effects Random Forests (MERFs) by combining a random
forest from ranger with a random effects model from lme4. The MERF algorithm is
an iterative procedure reminiscent of the EM algorithm (see Details). The function
is the base function for the wrapper function (SAEforest_model) and should not be
used directly by the ordinary user. Recommended exceptions are applications exceeding
the scope of the existing wrapper functions, or further research. MERFranger
allows complex patterns of structural relations to be modelled (see Examples). The function returns
an object of class MERFranger, which can be used to produce unit-level predictions. In contrast to
the wrapper functions, this function does not directly provide SAE estimates of domain-specific indicators.
MERFranger(
Y,
X,
random,
data,
importance = "none",
initialRandomEffects = 0,
ErrorTolerance = 1e-04,
MaxIterations = 25,
na.rm = TRUE,
...
)
An object of class MERFranger includes the following elements:
Forest: A random forest of class ranger modelling fixed effects of the model.
EffectModel: A model of random effects of class merMod capturing structural components of MERFs and modelling random components.
RandomEffects: List element containing the values of random intercepts from EffectModel.
RanEffSD: Numeric value of the standard deviation of random intercepts.
ErrorSD: Numeric value of the standard deviation of unit-level errors.
VarianceCovariance: VarCorr matrix from EffectModel.
LogLik: Numeric vector of the log-likelihood values recorded over the iterations of the MERF algorithm.
IterationsUsed: Number of iterations used until convergence of the MERF algorithm.
OOBresiduals: Vector of OOB-residuals.
Random: Character specifying the random intercept in the random effects model.
ErrorTolerance: Numeric value used to monitor the MERF algorithm's convergence.
initialRandomEffects: Numeric value or vector of the initial specification of random effects.
MaxIterations: Numeric value specifying the maximum number of iterations for the MERF algorithm.
Y: Continuous input values of the target variable.
X: Matrix of predictive covariates.
random: Specification of random effects terms following the syntax of lmer. Random effect terms are specified by vertical bars (|) separating expressions for design matrices from grouping factors, e.g. "(1|district)" for a random intercept per district. For further details see lmer and the example below.
data: data.frame of sample data including the specified elements of Y and X.
importance: Variable importance mode processed by the random forest from ranger. Must be one of 'none', 'impurity', 'impurity_corrected' or 'permutation'. Defaults to 'none'. For further details see ranger.
initialRandomEffects: Numeric value or vector of initial estimates of random effects. Defaults to 0.
ErrorTolerance: Numeric value used to monitor the MERF algorithm's convergence. Defaults to 1e-04.
MaxIterations: Numeric value specifying the maximum number of iterations for the MERF algorithm. Defaults to 25.
na.rm: Logical. Whether missing values should be removed. Defaults to TRUE.
...: Additional parameters that are passed directly to the random forest ranger. Important parameters are, for instance, mtry (number of variables to possibly split at in each node) or num.trees (number of trees). For further details on possible parameters see ranger and the example below.
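As a hedged illustration of the ... argument, the following sketch passes the ranger parameters num.trees and mtry through MERFranger, together with a non-default importance mode. It assumes that the SAEforest package and its eusilcA_smp sample data are installed. The object name model_tuned and the chosen values (50 trees, mtry = 3, impurity importance) are arbitrary illustrations, not recommendations.

```r
library(SAEforest)

# Sample data shipped with the package (assumed to be installed)
data("eusilcA_smp")
income <- eusilcA_smp$eqIncome
X_covar <- eusilcA_smp[, -c(1, 16, 17, 18)]

# num.trees and mtry are not arguments of MERFranger itself;
# they are forwarded to ranger through '...'
model_tuned <- MERFranger(
  Y = income, X = X_covar, random = "(1|district)", data = eusilcA_smp,
  importance = "impurity",
  num.trees = 50, mtry = 3
)

# The fitted forest records the settings it was grown with
model_tuned$Forest$num.trees
model_tuned$Forest$mtry
```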
A predict method is available for objects returned by MERFranger.
The MERF algorithm alternates between two steps: a) it fits the random forest, assuming the current random effects to be correct, and b) it estimates the random effects, assuming the OOB-predictions from the forest to be correct. Overall convergence of the algorithm is monitored by the log-likelihood of a joint model of both components. For further details see Krennmair & Schmid (2022) or Hajjem et al. (2014).
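The two alternating steps can be sketched on simulated data as follows. This is a simplified illustration that assumes the ranger and lme4 packages are installed; it is not the package's internal implementation. The data-generating process, all object names (dat, ran_eff, log_lik, tol, max_iter) and the relative-change stopping rule are assumptions made for the sketch.

```r
library(ranger)
library(lme4)

# Simulated clustered data (hypothetical, for illustration only)
set.seed(1)
n_groups <- 20
n_per <- 30
n <- n_groups * n_per
g <- factor(rep(seq_len(n_groups), each = n_per))
x1 <- runif(n)
x2 <- runif(n)
u <- rnorm(n_groups, sd = 1)  # true random intercepts
y <- sin(2 * pi * x1) + 2 * x2^2 + u[as.integer(g)] + rnorm(n, sd = 0.5)
dat <- data.frame(y = y, x1 = x1, x2 = x2, g = g)

tol <- 1e-4
max_iter <- 25
ran_eff <- rep(0, n)   # initial random effects, one value per unit
log_lik <- numeric(0)

for (iter in seq_len(max_iter)) {
  # Step a): forest on the target adjusted for the current random effects
  rf <- ranger(x = dat[, c("x1", "x2")], y = dat$y - ran_eff, num.trees = 50)
  dat$oob <- rf$predictions  # OOB predictions of the forest

  # Step b): random intercept model with the OOB predictions as fixed offset
  re_fit <- lmer(y ~ -1 + (1 | g), data = dat, offset = oob, REML = FALSE)
  ran_eff <- ranef(re_fit)$g[as.character(dat$g), 1]

  # Monitor convergence through the relative change of the log-likelihood
  log_lik <- c(log_lik, as.numeric(logLik(re_fit)))
  if (iter > 1 &&
      abs((log_lik[iter] - log_lik[iter - 1]) / log_lik[iter - 1]) < tol) {
    break
  }
}

length(log_lik)  # number of iterations actually used
```

Because each forest is grown from fresh bootstrap samples, the log-likelihood fluctuates slightly between iterations, so the loop may stop only at max_iter for a tight tolerance.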
Note that the MERFranger object is composed of a random forest of class
ranger and a random effects model of class merMod. Thus, all generic functions
available for these classes are applicable to the corresponding elements. For further
details on generic functions see ranger and lmer as well as the examples below.
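A minimal sketch of applying such generics to the two components, assuming that the SAEforest package with its eusilcA_smp data is installed. The elements Forest and EffectModel are those listed for the returned object; the object name model and the choice of importance = "impurity" (needed so that variable importance is stored) are illustrative assumptions.

```r
library(SAEforest)

data("eusilcA_smp")
income <- eusilcA_smp$eqIncome
X_covar <- eusilcA_smp[, -c(1, 16, 17, 18)]

model <- MERFranger(Y = income, X = X_covar, random = "(1|district)",
                    data = eusilcA_smp, importance = "impurity",
                    num.trees = 50)

# Generics for the ranger component
print(model$Forest)
head(sort(ranger::importance(model$Forest), decreasing = TRUE))

# Generics for the merMod component
summary(model$EffectModel)
lme4::VarCorr(model$EffectModel)
head(lme4::ranef(model$EffectModel)$district)
```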
Hajjem, A., Bellavance, F., & Larocque, D. (2014). Mixed-Effects Random Forest for Clustered Data. Journal of Statistical Computation and Simulation, 84(6), 1313–1328.
Krennmair, P., & Schmid, T. (2022). Flexible Domain Prediction Using Mixed Effects Random Forests. Journal of the Royal Statistical Society: Series C (Applied Statistics) (forthcoming).
SAEforest, ranger, lmer, SAEforest_model
# Load the package and the example data
library(SAEforest)
data("eusilcA_pop")
data("eusilcA_smp")

# Target variable and covariates (columns 1, 16, 17 and 18 are dropped)
income <- eusilcA_smp$eqIncome
X_covar <- eusilcA_smp[, -c(1, 16, 17, 18)]

# Example 1:
# Fit the general model used in the wrapper functions
model1 <- MERFranger(Y = income, X = X_covar, random = "(1|district)",
                     data = eusilcA_smp, num.trees = 50)

# Get unit-level predictions for the population data:
ind_pred <- predict(model1, eusilcA_pop)