brokenstick: Fit a `brokenstick` model to irregular data

Description

The brokenstick() function fits an irregularly observed series of measurements onto a user-specified grid of points. The model codes de grid by a series of linear B-splines. Differences between observations are expressed by one random effect per grid point. When multiple set of series are modelled, each modeled trajectory consists of straight lines that join at the chosen grid points, and hence look like a broken stick.

Usage

brokenstick(x, ...)
# S3 method for default
brokenstick(x, ...)
# S3 method for formula
brokenstick(
  formula,
  data,
  ...,
  knots = NULL,
  boundary = NULL,
  k = NULL,
  degree = 1L,
  method = c("lmer", "kr"),
  control = control_brokenstick(),
  seed = NA
)
# S3 method for data.frame
brokenstick(
  x,
  y,
  group,
  ...,
  knots = NULL,
  boundary = NULL,
  k = NULL,
  degree = 1L,
  method = c("lmer", "kr"),
  control = control_brokenstick(),
  seed = NA
)
# S3 method for matrix
brokenstick(
  x,
  y,
  group,
  ...,
  knots = NULL,
  boundary = NULL,
  k = NULL,
  degree = 1L,
  method = c("lmer", "kr"),
  control = control_brokenstick(),
  seed = NA
)
# S3 method for numeric
brokenstick(
  x,
  y,
  group,
  ...,
  knots = NULL,
  boundary = NULL,
  k = NULL,
  degree = 1L,
  method = c("lmer", "kr"),
  control = control_brokenstick(),
  seed = NA
)

Arguments

Predictor variables. Depending on the context:

A data frame of predictors.
A matrix of predictors.

If x has one column, then also specify y and group. If x has multiple columns, then specify the model by a formula argument.

...

Not currently used, but required for extensibility.

formula

A formula specifying the outcome terms on the left-hand side, the predictor term on the right-hand side and the group variable after the | sign, e.g formula = hgt ~ age | id. One may specify additional variables, but the brokenstick model will ignored them.

Note: This formula specification is specific to the brokenstick() function.

data

When a formula is used, data is specified as:

A data frame containing predictor, group and outcome variables.

knots

Optional, but recommended. Numerical vector with the locations of the breaks to be placed on the values of the predictor. Values outside the range of the data will extend the boundary knots (see below) beyond the data range.

boundary

Optional. Numerical vector of length 2 with the left and right boundary knots. The boundary setting is passed to splines::bs() as the Boundary.knots argument. If not specified, the range of predictor variable is taken. Automatic model specification is data-dependent. However, if both knots and boundary are supplied, the B-spline transformation parameter do not depend on the data. If specified, the boundary range is internally expanded to include at least range(knots). The warning some 'x' values beyond boundary knots may cause ill-conditioned bases implies that model fitting ignores any data beyond the (expanded) boundary range. It is possible to prevent this warning by pre-filtering rows in data to the boundary range.

Optional, a convenience parameter for the number of internal knots. If specified, then k internal knots are placed at equidense quantiles of the predictor. For example, specifying k = 1 puts a knot at the 50th quantile (median), setting k = 3 puts knots at the 25th, 50th and 75th quantiles, and so on.

Note: Knots specified via k are data-dependent and do not transfer to other data sets. We recommend using knots and boundary over k. If both k and knots are specified, then k takes precendence.

degree

the degree of the spline. The broken stick model requires linear splines, so the default is degree = 1. Setting degree = 0 yields (crisp) dummy coding, and one column less than for degree = 1. The brokenstick package supports only degree = 0 and degree = 1.

method

Estimation method. Either "kr" (for the Kasim-Raudenbush sampler) or "lmer" (for lme4::lmer()) (default).

control

A list with parameters. Use control_brokenstick() to generate.

seed

Seed number for base::set.seed(). Use NA to bypass seed setting.

Outcome variable. When x is a data frame or matrix, y is specified as:

A data frame with 1 numeric column.
A matrix with 1 numeric column.
A numeric vector.

group

Grouping variable. When x is a data frame or matrix, group is specified as:

A data frame with 1 column.
A matrix with 1 column.
A numeric vector.

Value

A brokenstick object.

Details

The variance-covariance matrix of the random effects absorbs the relations over time. By default, this matrix is estimated as unstructured by lme4::lmer(). This estimate may be unstable if the number of children is small relative to the number of specified knots. The function can be time consuming for data sets with thousands of children.

An alternative - often faster for models with many random effects - is to use the Bayesian Kasim-Raudenbush sampler (method kr). That method also allow for enforcing a simple structure on the variance-covariance matrix of the random effects. Currently, there are three such models: argyle, cole and none. See kr() and control_brokenstick() for more details.

Examples

Run this code

# NOT RUN {
train <- smocc_200[1:1198, ]

# fit with implicit boundary c(0, 3)
fit <- brokenstick(hgt.z ~ age | id, data = train, knots = 0:3)

# using KR sampler
fit <- brokenstick(hgt.z ~ age | id, data = train, knots = 0:3,
                   method = "kr", seed = 1)

# }
# NOT RUN {
knots <- round(c(0, 1, 2, 3, 6, 9, 12, 15, 18, 24) / 12, 4)
boundary <- c(0, 3)
fit_lmer <- brokenstick(hgt.z ~ age | id, data = train,
                        knots = knots, boundary = boundary)
fit_kr <- brokenstick(hgt.z ~ age | id, data = train, knots = knots,
                      boundary = boundary, method = "kr")
# }
# NOT RUN {
# Four ways to specify the same model
# Formula interface
mod1 <- brokenstick(hgt.z ~ age | id, train)

# XY interface - numeric vector
mod2 <- with(train, brokenstick(age, hgt.z, id))
identical(mod1, mod2)

# XY interface - data.frame
mod3 <- with(train, brokenstick(data.frame(age), hgt.z, id))
identical(mod1, mod3)

# XY interface - matrix
tt <- as.matrix(train[, c(1, 2, 7)])
mod4 <- brokenstick(tt[, "age", drop = FALSE],
                    tt[, "hgt.z", drop = FALSE],
                    tt[, "id", drop = FALSE])
identical(mod1, mod4)
# }

Run the code above in your browser using DataLab