Learn R Programming

EpiForsk (version 0.2.0)

lms: Wrapper around lm for sibling design

Description

Fits a linear model using demeaned data. Useful for sibling design.

Usage

lms(formula, data, grp_id, obs_id = NULL, ...)

# S3 method for lms print(x, digits = max(3L, getOption("digits") - 3L), ...)

Value

A list with class c("lms", "lm"). Contains the output from lm applied to demeaned data according to formula, as well as the original data and the provided formula.

Arguments

formula

A formula, used to create a model matrix with demeaned columns.

data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

grp_id

<data-masking> One unquoted expression naming the id variable in data defining the groups to demean, e.g. sibling groups.

obs_id

<data-masking> Optional, One unquoted expression naming an id variable to keep track of the input data order.

...

Additional arguments to be passed to lm. In print, additional arguments are ignored without warning.

x

An S3 object with class lms.

digits

The number of significant digits to be passed to format(coef(x), .) when print()ing.

Author

KIJA

Details

lms estimates parameters in the linear model $$y_{ij_i}=\alpha_i+x_{ij_i}^T\beta + \varepsilon_{ij_i}$$ where \(\alpha_i\) is a group (e.g. sibling group) specific intercept and \(x_{ij_i}\) are covariate values for observation \(j_i\) in group i. \(\varepsilon_{ij_i}\sim N(0, \sigma^2)\) is a normally distributed error term. It is assumed that interest is in estimating the vector \(\beta\) while \(\alpha_{i}\) are nuisance parameters. Estimation of \(\beta\) uses the mean deviation method, where $$y_{ij_i}^{'}=y_{ij_i}-y_i$$ is regressed on $$x_{ij_i}^{'}=x_{ij_i}-x_i.$$ Here \(y_i\) and \(x_i\) refers to the mean of y and x in group i.
lms can keep track of observations by providing a unique identifier column to obs_id. lms will return obs_id so it matches the order of observations in model.
lms only supports syntactic covariate names. Using a non-syntactic name risks returning an error, e.g if names end in + or -.

Examples

Run this code
# \donttest{
sib_id <- sample(200, 1000, replace = TRUE)
sib_out <- rnorm(200)
x1 <- rnorm(1000)
x2 <- rnorm(1000) + sib_out[sib_id] + x1
y <- rnorm(1000, 1, 0.5) + 2 * sib_out[sib_id] - x1 + 2 * x2
data <- data.frame(
  x1 = x1,
  x2 = x2,
  y = y,
  sib_id = sib_id,
  obs_id = 1:1000
)
mod_lm <- lm(y ~ x1 + x2, data) # OLS model
mod_lm_grp <- lm(y ~ x1 + x2 + factor(sib_id), data) # OLS with grp
mod_lms <- lms(y ~ x1 + x2, data, sib_id, obs_id) # conditional model
summary(mod_lm)
coef(mod_lm_grp)[1:3]
summary(mod_lms)
print(mod_lms)
# }

Run the code above in your browser using DataLab