Fit a correlated random effect linear model using lmer with a Mundlak
within–between decomposition for selected covariates. For each
decomposed covariate \(Z_k\), the function constructs
\(Z_{k,\mathrm{bar},i} = \frac{1}{n_i}\sum_j Z_{k,ij}\) (the group mean, "between")
and \(Z_{k,\mathrm{within},ij} = Z_{k,ij} - Z_{k,\mathrm{bar},i}\) (the within-group deviation),
and estimates
$$Y_{ij} = \mu + \alpha_i + \sum_k \beta_{k,W} Z_{k,\mathrm{within},ij}
+ \sum_k \beta_{k,B} Z_{k,\mathrm{bar},i}
+ \mathbf{X}_{ij}^\top\gamma + \varepsilon_{ij},$$
where \(\alpha_i \sim \mathcal{N}(0,\sigma_\alpha^2)\) is a random intercept.
The function creates, for every name in wb.char, two columns:
<var>_bar (group mean within ProvID.char) and
<var>_within (observation minus its group mean).
The fitted model is:
Y ~ <all *_within> + <all *_bar> + <other.char> + (1 | ProvID)
In addition to these input formats, all arguments from the lmer function can be modified via ...,
allowing for customization of model fitting options such as controlling the optimization method or adjusting convergence criteria.
By default, the model is fitted using REML (restricted maximum likelihood).
If issues arise during model fitting, consider using the data_check function to perform a data quality check,
which can help identify missing values, low variation in covariates, high-pairwise correlation, and multicollinearity.
For datasets with missing values, this function automatically removes observations (rows) with any missing values before fitting the model.