Response variables are related to predictors (and other response variables) through a link function and response distribution. First the expression provided is evaluated using the predictors, to give this response variable's value on the link scale; then the inverse link function and response distribution are used to get the response value. See Details for more information.
response(expr, family = gaussian(), error_scale = NULL, size = 1L)
A response_dist object, to be used in population() to specify a population distribution.
An expression, in terms of other predictor or response variables, giving this response variable's value on the link scale.
The family of this response variable, e.g. gaussian() for an ordinary Gaussian linear relationship.
Scale factor for errors. Used only for linear families, such as gaussian() and ols_with_error(). Errors drawn while simulating the response variable will be multiplied by this scale factor. The scale factor can be a scalar value (such as a fixed standard deviation), or an expression in terms of the predictors, which will be evaluated when simulating response data. For generalized linear models, leave as NULL.
When the family is binomial(), this is the number of trials for each observation. Defaults to 1, as in logistic regression. May be specified either as a vector of the same length as the number of observations or as a scalar. May be written in terms of other predictor or response variables. For other families, size is ignored.
Response variables are drawn based on a typical generalized linear model setup. Let \(Y\) represent the response variable and \(X\) represent the predictor variables. We specify that
$$Y \mid X \sim \text{SomeDistribution},$$
where
$$\mathbb{E}[Y \mid X = x] = g^{-1}(\mu(x)).$$
Here \(\mu(X)\) is the expression expr, and both the distribution and link function \(g\) are specified by the family provided. For instance, if the family is gaussian(), the distribution is Normal and the link is the identity function; if the family is binomial(), the distribution is binomial and the link is (by default) the logistic link.
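As a rough illustration of this setup, base R family objects expose their inverse link as linkinv, so the mean implied by a link-scale value can be computed directly. This is a sketch in base R only, not the package's internals; the coefficients and the names x, mu, mean_gaussian, and mean_binomial are assumptions made for the example.

```r
set.seed(1)
x <- rnorm(5)          # hypothetical predictor values
mu <- -1 + 2 * x       # the expression, evaluated on the link scale

# Base R family objects carry their inverse link as `linkinv`
mean_gaussian <- gaussian()$linkinv(mu)  # identity link: mean equals mu
mean_binomial <- binomial()$linkinv(mu)  # logit link: mean is a probability
```

The response would then be drawn from the family's distribution with this mean.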
The following response families are supported.
gaussian()
The default family is gaussian() with the identity link function, specifying the relationship
$$Y \mid X \sim \text{Normal}(\mu(X), \sigma^2),$$
where the standard deviation \(\sigma\) is given by error_scale.
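The effect of the error scale can be sketched in base R, assuming (as the argument description suggests) that standard Normal errors are multiplied by the scale factor. The coefficients, the constant sigma, and the expression 0.5 + abs(x) are hypothetical choices for illustration, not values taken from the package.

```r
set.seed(2)
n <- 10000
x <- rnorm(n)
mu <- 1 + 2 * x                      # identity link: mu is the conditional mean

# Scalar scale: errors have a fixed standard deviation
sigma <- 0.5
y_const <- mu + sigma * rnorm(n)

# Scale written in terms of the predictor: heteroskedastic errors
y_het <- mu + (0.5 + abs(x)) * rnorm(n)
```

In the second case the spread of the errors grows with the magnitude of x, which is the kind of relationship an expression-valued scale factor makes possible.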
ols_with_error()
Allows specification of custom non-Normal error distributions, specifying the relationship
$$Y = \mu(X) + e,$$
where \(e\) is drawn from an arbitrary distribution, specified by the error argument to ols_with_error().
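To make the additive-error relationship concrete, here is a base R sketch with heavy-tailed errors. It does not call ols_with_error(); the choice of a t distribution with 3 degrees of freedom and the coefficients are assumptions made only for the example.

```r
set.seed(3)
n <- 1000
x <- rnorm(n)
mu <- 1 + 2 * x        # mean function, as in an ordinary linear model

# Non-Normal errors: t distribution with 3 degrees of freedom
e <- rt(n, df = 3)
y <- mu + e
```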
binomial()
Binomial responses include binary responses (as in logistic regression) and responses giving a total number of successes out of a number of trials. The response has distribution
$$Y \mid X \sim \text{Binomial}(N, g^{-1}(\mu(X))),$$
where \(N\) is set by the size argument and \(g\) is the link function. The default link is the logistic link, and others can be chosen with the link argument to binomial(). The default \(N\) is 1, representing a binary outcome.
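The role of \(N\) can be sketched with base R's rbinom(), assuming the default logistic link. The coefficients and the names p, y_binary, y_counts, size_vec, and y_varied are hypothetical and only illustrate the scalar and per-observation forms that the size argument allows.

```r
set.seed(4)
n <- 1000
x <- rnorm(n)
mu <- -0.5 + 1.5 * x
p <- plogis(mu)                                  # inverse logit link

y_binary <- rbinom(n, size = 1, prob = p)        # N = 1: binary outcome
y_counts <- rbinom(n, size = 10, prob = p)       # scalar N = 10

# One number of trials per observation
size_vec <- sample(5:15, n, replace = TRUE)
y_varied <- rbinom(n, size = size_vec, prob = p)
```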
poisson()
Poisson-distributed responses with distribution
$$Y \mid X \sim \text{Poisson}(g^{-1}(\mu(X))),$$
where \(g\) is the link function. The default link is the log link, and others can be chosen with the link argument to poisson().
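With the default log link, the inverse link is the exponential function, so the Poisson mean is \(\exp(\mu(X))\). A minimal base R sketch, with hypothetical coefficients and variable names:

```r
set.seed(5)
n <- 1000
x <- rnorm(n)
mu <- 0.2 + 0.5 * x     # value on the link (log) scale
lambda <- exp(mu)       # inverse of the log link
y <- rpois(n, lambda)   # counts with mean lambda
```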
custom_family()
Responses drawn from an arbitrary distribution with arbitrary link function, i.e.
$$Y \mid X \sim \text{SomeDistribution}(g^{-1}(\mu(X))),$$
where both \(g\) and SomeDistribution are specified by arguments to custom_family().
The expr, error_scale, and size arguments are evaluated only when simulating data for this response variable. They are evaluated in an environment with access to the predictor variables and the preceding response variables, which they can refer to by name. Additionally, these arguments can refer to variables in scope when the enclosing population() was defined. See the Examples below.
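This kind of delayed evaluation can be illustrated with base R's quote() and eval(). The sketch below is not the package's implementation; slope, expr, and predictors are hypothetical names, and it only shows how a captured expression can find some names in simulated data and others in the defining scope.

```r
slope <- 2                                   # variable in the defining scope
expr <- quote(1 + slope * x1)                # captured, not yet evaluated
predictors <- data.frame(x1 = c(-1, 0, 1))   # stands in for simulated predictors

# Names are looked up in the data first, then in the enclosing environment
mu <- eval(expr, envir = predictors, enclos = environment())
```

Here x1 is found in the data frame while slope is found in the surrounding environment.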
predictor() and population() to define populations; ols_with_error() and custom_family() for custom response distributions
# Defining a binomial response. The expressions can refer to other predictors
# and to the environment where the `population()` is defined:
slope1 <- 2.5
slope2 <- -3
intercept <- -4.6
size <- 10
population(
  x1 = predictor(rnorm),
  x2 = predictor(rnorm),
  y = response(intercept + slope1 * x1 + slope2 * x2,
               family = binomial(), size = size)
)