Response variables are related to predictors (and other response variables) through a link function and response distribution. First the expression provided is evaluated using the predictors, to give this response variable's value on the link scale; then the inverse link function and response distribution are used to get the response value. See Details for more information.
Usage
response(expr, family = gaussian(), error_scale = NULL, size = 1L)
Arguments
- expr
An expression, in terms of other predictor or response variables, giving this response variable's value on the link scale.
- family
The family of this response variable, e.g. gaussian() for an ordinary Gaussian linear relationship.
- error_scale
Scale factor for errors. Used only for linear families, such as gaussian() and ols_with_error(). Errors drawn while simulating the response variable will be multiplied by this scale factor. The scale factor can be a scalar value (such as a fixed standard deviation), or an expression in terms of the predictors, which will be evaluated when simulating response data. For generalized linear models, leave as NULL.
- size
When the family is binomial(), this is the number of trials for each observation. Defaults to 1, as in logistic regression. May be specified either as a scalar or as a vector with one entry per observation. May be written in terms of other predictor or response variables. For other families, size is ignored.
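To make the error_scale and size descriptions concrete, here is a minimal base-R sketch of what a predictor-dependent scale factor and a per-observation number of trials amount to. It is an illustration only, not the package's implementation; the variable names and coefficients are hypothetical.

```r
set.seed(1)
n <- 5
x <- runif(n, 1, 3)

# error_scale as an expression in the predictors: standard Normal errors
# are multiplied by a scale that grows with x (heteroskedastic errors).
error_scale <- 0.5 * x
y_gaussian <- 1 + 2 * x + error_scale * rnorm(n)

# size as a vector: one number of trials per observation.
size <- c(1L, 5L, 10L, 10L, 20L)
y_binomial <- rbinom(n, size = size, prob = plogis(-1 + 0.5 * x))
```

A scalar error_scale or size is simply recycled across all observations in the same way.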
Value
A response_dist object, to be used in population() to specify a population distribution.
Details
Response variables are drawn based on a typical generalized linear model setup. Let \(Y\) represent the response variable and \(X\) represent the predictor variables. We specify that
$$Y \mid X \sim \text{SomeDistribution},$$
where
$$\mathbb{E}[Y \mid X = x] = g^{-1}(\mu(x)).$$
Here \(\mu(X)\) is the expression expr, and both the distribution and link function \(g\) are specified by the family provided. For instance, if the family is gaussian(), the distribution is Normal and the link is the identity function; if the family is binomial(), the distribution is binomial and the link is (by default) the logistic link.
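The two-step setup described here can be sketched by hand in base R. This is a rough illustration under assumed coefficients, not the package's own code: the expression is evaluated on the link scale, the family's inverse link maps it to a mean, and the response is drawn from the family's distribution.

```r
set.seed(42)
n <- 1000
x <- rnorm(n)

fam <- binomial()            # logit link by default
mu <- -1 + 2 * x             # mu(x): the expression, on the link scale
p <- fam$linkinv(mu)         # g^{-1}(mu(x)), which is E[Y | X = x] when size is 1
y <- rbinom(n, size = 1, prob = p)

# Refitting the same model should roughly recover the coefficients -1 and 2.
coef(glm(y ~ x, family = binomial()))
```

Swapping in gaussian() makes linkinv the identity, so the mean is the expression itself.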
Response families
The following response families are supported.
gaussian()
The default family is gaussian() with the identity link function, specifying the relationship
$$Y \mid X \sim \text{Normal}(\mu(X), \sigma^2),$$
where the standard deviation \(\sigma\) is given by error_scale.
ols_with_error()
Allows specification of custom non-Normal error distributions, specifying the relationship
$$Y = \mu(X) + e,$$
where \(e\) is drawn from an arbitrary distribution, specified by the error argument to ols_with_error().
binomial()
Binomial responses include binary responses (as in logistic regression) and responses giving a total number of successes out of a number of trials. The response has distribution
$$Y \mid X \sim \text{Binomial}(N, g^{-1}(\mu(X))),$$
where \(N\) is set by the size argument and \(g\) is the link function. The default link is the logistic link, and others can be chosen with the link argument to binomial(). The default \(N\) is 1, representing a binary outcome.
poisson()
Poisson-distributed responses with distribution
$$Y \mid X \sim \text{Poisson}(g^{-1}(\mu(X))),$$
where \(g\) is the link function. The default link is the log link, and others can be chosen with the link argument to poisson().
custom_family()
Responses drawn from an arbitrary distribution with arbitrary link function, i.e.
$$Y \mid X \sim \text{SomeDistribution}(g^{-1}(\mu(X))),$$
where both \(g\) and SomeDistribution are specified by arguments to custom_family().
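As a rough base-R sketch of what two of these families describe (hand-written for illustration, with hypothetical coefficients, and not taken from the package), a Poisson response uses the inverse log link to get its mean, while an ols_with_error()-style response adds scaled non-Normal errors to \(\mu(X)\):

```r
set.seed(7)
n <- 1000
x <- rnorm(n)

# poisson(): with the log link, the mean is exp(mu(x)).
mu <- 0.5 + 0.3 * x
y_pois <- rpois(n, lambda = poisson()$linkinv(mu))

# ols_with_error()-style response: Y = mu(X) + e, with e drawn from a
# non-Normal distribution (here t with 3 degrees of freedom) and
# multiplied by a scale factor.
error_scale <- 2
y_t <- (1 + 4 * x) + error_scale * rt(n, df = 3)
```

The t distribution is only one possible choice of error distribution; any function that draws n errors would serve the same role.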
Evaluation and scoping
The expr, error_scale, and size arguments are evaluated only when simulating data for this response variable. They are evaluated in an environment with access to the predictor variables and the preceding response variables, which they can refer to by name. Additionally, these arguments can refer to variables in scope when the enclosing population() was defined. See the Examples below.
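The general idea of delayed evaluation with this kind of scoping can be sketched in base R. The helpers delayed() and evaluate() below are hypothetical names invented for this illustration, and the package may implement the behavior differently; the sketch only shows names being looked up first in the simulated data and then in the environment where the expression was written.

```r
# Hypothetical helper: capture an expression now, evaluate it later.
delayed <- function(expr) {
  list(expr = substitute(expr), env = parent.frame())
}

# Look names up in the simulated data first, then in the defining environment.
evaluate <- function(d, data) {
  eval(d$expr, envir = data, enclos = d$env)
}

slope <- 2.5                         # defined outside the data
d <- delayed(intercept + slope * x1) # nothing is evaluated yet

intercept <- -4.6                    # may even be defined after capture
data <- data.frame(x1 = c(0, 1, 2))
evaluate(d, data)
#> [1] -4.6 -2.1  0.4
```

Because evaluation is deferred, an undefined name is only reported as an error when data are simulated, not when the expression is written.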
See also
predictor() and population() to define populations; ols_with_error() and custom_family() for custom response distributions.
Examples
# Defining a binomial response. The expressions can refer to other predictors
# and to the environment where the `population()` is defined:
slope1 <- 2.5
slope2 <- -3
intercept <- -4.6
size <- 10
population(
  x1 = predictor(rnorm),
  x2 = predictor(rnorm),
  y = response(intercept + slope1 * x1 + slope2 * x2,
               family = binomial(), size = size)
)
#> Population with variables:
#> x1: rnorm()
#> x2: rnorm()
#> y: binomial(intercept + slope1 * x1 + slope2 * x2, size = size)