Simulate the distribution of estimates by parametric bootstrap

Repeatedly simulates new response values by using the fitted model, holding the covariates fixed. By default, refits the same model to each simulated dataset, but an alternative model can be provided. Estimates, confidence intervals, or other quantities are extracted from each fitted model and returned as a tidy data frame.

Usage

parametric_boot_distribution(
  fit,
  alternative_fit = fit,
  data = model.frame(fit),
  fn = tidy,
  nsim = 100,
  ...
)

Arguments

fit: A model fit to data, such as by lm() or glm(), to simulate new response values from.
alternative_fit: A model fit to data, to refit to the data sampled from fit. Defaults to fit, but an alternative model can be provided to examine its behavior when fit is the true model.
data: Data frame to be used in the simulation. Must contain the predictors needed for both fit and alternative_fit. Defaults to the predictors used in fit.
fn: Function to call on each new model fit to produce a data frame of estimates. Defaults to broom::tidy(), which produces a tidy data frame of coefficients, estimates, standard errors, and hypothesis tests.
nsim: Number of total simulations to run.
...: Additional arguments passed to fn each time it is called.

Value

A data frame (tibble) with columns corresponding to the columns returned by fn. The additional column .sample indicates which fit each row is from.

Details

The default behavior samples from a model and refits the same model to the sampled data; this is useful when, for example, exploring how model diagnostics look when the model is well-specified. Another common use of the parametric bootstrap is hypothesis testing, where we might simulate from a null model and fit an alternative model to the data, to obtain the null distribution of a particular estimate or statistic. Provide alternative_fit to have a specific model fit to each simulated dataset, rather than the model they are simulated from.

Only the response variable from the fit (or alternative_fit, if given) is redrawn; other response variables in the population are left unchanged from their values in data.

Model limitations

Because this function uses S3 generic methods such as model.frame(), simulate(), and update(), it can be used with any model fit for which methods are provided. In base R, this includes lm() and glm().

The model provided as fit must be fit using the data argument to provide a data frame. For example:

fit <- lm(dist ~ speed, data = cars)

When simulating new data, this function provides the simulated data as the data argument and re-fits the model. If you instead refer directly to local variables in the model formula, this will not work. For example, if you fit a model this way:

# will not work
fit <- lm(cars$dist ~ cars$speed)

It will not be possible to refit the model using simulated datasets, as that would require modifying your environment to edit cars.

Examples

# Bootstrap distribution of estimates:
fit <- lm(mpg ~ hp, data = mtcars)
parametric_boot_distribution(fit, nsim = 5)
#> # A tibble: 10 × 6
#>    term        estimate std.error statistic  p.value    .n
#>    <chr>          <dbl>     <dbl>     <dbl>    <dbl> <int>
#>  1 (Intercept)  29.4      1.48        19.8  8.34e-19     1
#>  2 hp           -0.0606   0.00918     -6.61 2.59e- 7     1
#>  3 (Intercept)  31.2      1.65        19.0  2.95e-18     2
#>  4 hp           -0.0756   0.0102      -7.42 2.88e- 8     2
#>  5 (Intercept)  33.4      1.57        21.2  1.32e-19     3
#>  6 hp           -0.0896   0.00975     -9.19 3.15e-10     3
#>  7 (Intercept)  29.1      1.54        18.9  3.24e-18     4
#>  8 hp           -0.0538   0.00955     -5.64 3.86e- 6     4
#>  9 (Intercept)  27.6      1.56        17.7  1.99e-17     5
#> 10 hp           -0.0511   0.00965     -5.30 9.99e- 6     5

# Bootstrap distribution of estimates for a quadratic model, when true
# relationship is linear:
quad_fit <- lm(mpg ~ poly(hp, 2), data = mtcars)
parametric_boot_distribution(fit, quad_fit, nsim = 5)
#> # A tibble: 15 × 6
#>    term         estimate std.error statistic  p.value    .n
#>    <chr>           <dbl>     <dbl>     <dbl>    <dbl> <int>
#>  1 (Intercept)     20.8      0.592    35.2   2.41e-25     1
#>  2 poly(hp, 2)1   -21.6      3.35     -6.46  4.50e- 7     1
#>  3 poly(hp, 2)2     1.60     3.35      0.479 6.35e- 1     1
#>  4 (Intercept)     19.4      0.698    27.7   1.97e-22     2
#>  5 poly(hp, 2)1   -24.2      3.95     -6.13  1.13e- 6     2
#>  6 poly(hp, 2)2    -3.74     3.95     -0.947 3.51e- 1     2
#>  7 (Intercept)     20.5      0.638    32.2   3.00e-24     3
#>  8 poly(hp, 2)1   -26.6      3.61     -7.38  3.96e- 8     3
#>  9 poly(hp, 2)2     8.18     3.61      2.27  3.10e- 2     3
#> 10 (Intercept)     18.9      0.713    26.6   6.62e-22     4
#> 11 poly(hp, 2)1   -26.5      4.03     -6.57  3.37e- 7     4
#> 12 poly(hp, 2)2     3.29     4.03      0.816 4.21e- 1     4
#> 13 (Intercept)     19.6      0.628    31.3   6.82e-24     5
#> 14 poly(hp, 2)1   -28.8      3.55     -8.12  5.92e- 9     5
#> 15 poly(hp, 2)2     6.96     3.55      1.96  5.98e- 2     5

# Bootstrap distribution of estimates for a model with an additional
# predictor, when it's truly zero. data argument must be provided so
# alternative fit has all predictors available, not just hp:
alt_fit <- lm(mpg ~ hp + wt, data = mtcars)
parametric_boot_distribution(fit, alt_fit, data = mtcars, nsim = 5)
#> # A tibble: 15 × 6
#>    term        estimate std.error statistic  p.value    .n
#>    <chr>          <dbl>     <dbl>     <dbl>    <dbl> <int>
#>  1 (Intercept)  30.7       2.16      14.2   1.26e-14     1
#>  2 hp           -0.0592    0.0122    -4.85  3.80e- 5     1
#>  3 wt           -0.653     0.854     -0.765 4.50e- 1     1
#>  4 (Intercept)  32.1       2.38      13.5   5.31e-14     2
#>  5 hp           -0.0651    0.0135    -4.84  4.01e- 5     2
#>  6 wt           -0.577     0.944     -0.612 5.45e- 1     2
#>  7 (Intercept)  33.8       2.23      15.2   2.52e-15     3
#>  8 hp           -0.0759    0.0126    -6.03  1.47e- 6     3
#>  9 wt           -0.577     0.882     -0.655 5.18e- 1     3
#> 10 (Intercept)  30.7       2.41      12.8   2.02e-13     4
#> 11 hp           -0.0417    0.0136    -3.06  4.68e- 3     4
#> 12 wt           -1.68      0.953     -1.76  8.91e- 2     4
#> 13 (Intercept)  29.9       2.37      12.6   2.61e-13     5
#> 14 hp           -0.0633    0.0134    -4.73  5.35e- 5     5
#> 15 wt           -0.129     0.938     -0.138 8.91e- 1     5