The "gls"
engine estimates linear regression for models where the rows of the
data are not independent.
Note that nlme::lme()
and nlme::gls()
can fit the same model but will count degrees of freedom differently. If
there are n
data points, p
fixed effects parameters, and q
random
effect parameters, the residual degrees of freedom are:
lme: n - p - q
gls: n - p
As a result, p-values will be different. For example, we can fit the same model using different estimation methods (assuming a positive covariance value):
gls_fit <-
linear_reg() %>%
set_engine("gls", correlation = corCompSymm(form = ~ 1 | subject)) %>%
fit(depr_score ~ week, data = riesby)lme_fit <-
linear_reg() %>%
set_engine("lme", random = ~ 1 | subject) %>%
fit(depr_score ~ week, data = riesby)
The estimated within-subject correlations are the same:
library(ape)# lme, use ape package:
lme_within_sub <- varcomp(lme_fit$fit)/sum(varcomp(lme_fit$fit))
lme_within_sub["subject"]
## subject
## 0.6820145
# gls:
summary(gls_fit$fit$modelStruct)
## Correlation Structure: Compound symmetry
## Formula: ~1 | subject
## Parameter estimate(s):
## Rho
## 0.6820145
as are the fixed effects (and their standard errors):
nlme::fixef(lme_fit$fit)
## (Intercept) week
## -4.953439 -2.119678
coef(gls_fit$fit)
## (Intercept) week
## -4.953439 -2.119678
However, the p-values for the fixed effects are different:
library(broom.mixed)# lme:
lme_fit %>% tidy() %>%
dplyr::filter(group == "fixed") %>%
dplyr::select(-group, -effect)
## # A tibble: 0 × 6
## # … with 6 variables: term <chr>, estimate <dbl>, std.error <dbl>, df <dbl>,
## # statistic <dbl>, p.value <dbl>
# gls:
gls_fit %>% tidy()
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -4.95 0.808 -6.13 3.50e- 9
## 2 week -2.12 0.224 -9.47 2.26e-18
The underlying model implementation does not allow for case weights.
J Pinheiro, and D Bates. 2000. Mixed-effects models in S and S-PLUS. Springer, New York, NY
For this engine, there is a single mode: regression
This model has no tuning parameters.
The multilevelmod extension package is required to fit this model.
library(multilevelmod)linear_reg() %>%
set_engine("gls") %>%
set_mode("regression") %>%
translate()
## Linear Regression Model Specification (regression)
##
## Computational engine: gls
##
## Model fit template:
## nlme::gls(formula = missing_arg(), data = missing_arg())
There are no specific preprocessing needs. However, it is helpful to keep the clustering/subject identifier column as factor or character (instead of making them into dummy variables). See the examples in the next section.
The model can accept case weights.
With parsnip, we suggest using the fixed effects formula method when
fitting, but the details of the correlation structure should be passed
to set_engine()
since it is an irregular (but required) argument:
library(tidymodels)
# load nlme to be able to use the `cor*()` functions
library(nlme)data("riesby")
linear_reg() %>%
set_engine("gls", correlation = corCompSymm(form = ~ 1 | subject)) %>%
fit(depr_score ~ week, data = riesby)
## parsnip model object
##
## Generalized least squares fit by REML
## Model: depr_score ~ week
## Data: data
## Log-restricted-likelihood: -765.0148
##
## Coefficients:
## (Intercept) week
## -4.953439 -2.119678
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | subject
## Parameter estimate(s):
## Rho
## 0.6820145
## Degrees of freedom: 250 total; 248 residual
## Residual standard error: 6.868785
When using tidymodels infrastructure, it may be better to use a
workflow. In this case, you can add the appropriate columns using
add_variables()
then supply the typical formula when adding the model:
library(tidymodels)gls_spec <-
linear_reg() %>%
set_engine("gls", correlation = corCompSymm(form = ~ 1 | subject))
gls_wflow <-
workflow() %>%
# The data are included as-is using:
add_variables(outcomes = depr_score, predictors = c(week, subject)) %>%
add_model(gls_spec, formula = depr_score ~ week)
fit(gls_wflow, data = riesby)