Simulate the distribution of estimates by parametric bootstrap
Source:R/diagnose.R
parametric_boot_distribution.Rd
Repeatedly simulates new response values by using the fitted model, holding the covariates fixed. By default, refits the same model to each simulated dataset, but an alternative model can be provided. Estimates, confidence intervals, or other quantities are extracted from each fitted model and returned as a tidy data frame.
Usage
parametric_boot_distribution(
fit,
alternative_fit = fit,
data = model.frame(fit),
fn = tidy,
nsim = 100,
...
)
Arguments
- fit
A model fit to data, such as by
lm()
orglm()
, to simulate new response values from.- alternative_fit
A model fit to data, to refit to the data sampled from
fit
. Defaults tofit
, but an alternative model can be provided to examine its behavior whenfit
is the true model.- data
Data frame to be used in the simulation. Must contain the predictors needed for both
fit
andalternative_fit
. Defaults to the predictors used infit
.- fn
Function to call on each new model fit to produce a data frame of estimates. Defaults to
broom::tidy()
, which produces a tidy data frame of coefficients, estimates, standard errors, and hypothesis tests.- nsim
Number of total simulations to run.
- ...
Additional arguments passed to
fn
each time it is called.
Value
A data frame (tibble) with columns corresponding to the columns
returned by fn
. The additional column .sample
indicates which fit each
row is from.
Details
The default behavior samples from a model and refits the same model to the
sampled data; this is useful when, for example, exploring how model
diagnostics look when the model is well-specified. Another common use of the
parametric bootstrap is hypothesis testing, where we might simulate from a
null model and fit an alternative model to the data, to obtain the null
distribution of a particular estimate or statistic. Provide alternative_fit
to have a specific model fit to each simulated dataset, rather than the model
they are simulated from.
Only the response variable from the fit
(or alternative_fit
, if given) is
redrawn; other response variables in the population are left unchanged from
their values in data
.
Model limitations
Because this function uses S3 generic methods such as model.frame()
,
simulate()
, and update()
, it can be used with any model fit for which
methods are provided. In base R, this includes lm()
and glm()
.
The model provided as fit
must be fit using the data
argument to provide
a data frame. For example:
When simulating new data, this function provides the simulated data as the
data
argument and re-fits the model. If you instead refer directly to local
variables in the model formula, this will not work. For example, if you fit a
model this way:
It will not be possible to refit the model using simulated datasets, as that
would require modifying your environment to edit cars
.
See also
model_lineup()
to use resampling to aid in regression diagnostics;
sampling_distribution()
to simulate draws from the population
distribution, rather than the null
Examples
# Bootstrap distribution of estimates:
fit <- lm(mpg ~ hp, data = mtcars)
parametric_boot_distribution(fit, nsim = 5)
#> # A tibble: 10 × 6
#> term estimate std.error statistic p.value .n
#> <chr> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 (Intercept) 29.4 1.48 19.8 8.34e-19 1
#> 2 hp -0.0606 0.00918 -6.61 2.59e- 7 1
#> 3 (Intercept) 31.2 1.65 19.0 2.95e-18 2
#> 4 hp -0.0756 0.0102 -7.42 2.88e- 8 2
#> 5 (Intercept) 33.4 1.57 21.2 1.32e-19 3
#> 6 hp -0.0896 0.00975 -9.19 3.15e-10 3
#> 7 (Intercept) 29.1 1.54 18.9 3.24e-18 4
#> 8 hp -0.0538 0.00955 -5.64 3.86e- 6 4
#> 9 (Intercept) 27.6 1.56 17.7 1.99e-17 5
#> 10 hp -0.0511 0.00965 -5.30 9.99e- 6 5
# Bootstrap distribution of estimates for a quadratic model, when true
# relationship is linear:
quad_fit <- lm(mpg ~ poly(hp, 2), data = mtcars)
parametric_boot_distribution(fit, quad_fit, nsim = 5)
#> # A tibble: 15 × 6
#> term estimate std.error statistic p.value .n
#> <chr> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 (Intercept) 20.8 0.592 35.2 2.41e-25 1
#> 2 poly(hp, 2)1 -21.6 3.35 -6.46 4.50e- 7 1
#> 3 poly(hp, 2)2 1.60 3.35 0.479 6.35e- 1 1
#> 4 (Intercept) 19.4 0.698 27.7 1.97e-22 2
#> 5 poly(hp, 2)1 -24.2 3.95 -6.13 1.13e- 6 2
#> 6 poly(hp, 2)2 -3.74 3.95 -0.947 3.51e- 1 2
#> 7 (Intercept) 20.5 0.638 32.2 3.00e-24 3
#> 8 poly(hp, 2)1 -26.6 3.61 -7.38 3.96e- 8 3
#> 9 poly(hp, 2)2 8.18 3.61 2.27 3.10e- 2 3
#> 10 (Intercept) 18.9 0.713 26.6 6.62e-22 4
#> 11 poly(hp, 2)1 -26.5 4.03 -6.57 3.37e- 7 4
#> 12 poly(hp, 2)2 3.29 4.03 0.816 4.21e- 1 4
#> 13 (Intercept) 19.6 0.628 31.3 6.82e-24 5
#> 14 poly(hp, 2)1 -28.8 3.55 -8.12 5.92e- 9 5
#> 15 poly(hp, 2)2 6.96 3.55 1.96 5.98e- 2 5
# Bootstrap distribution of estimates for a model with an additional
# predictor, when it's truly zero. data argument must be provided so
# alternative fit has all predictors available, not just hp:
alt_fit <- lm(mpg ~ hp + wt, data = mtcars)
parametric_boot_distribution(fit, alt_fit, data = mtcars, nsim = 5)
#> # A tibble: 15 × 6
#> term estimate std.error statistic p.value .n
#> <chr> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 (Intercept) 30.7 2.16 14.2 1.26e-14 1
#> 2 hp -0.0592 0.0122 -4.85 3.80e- 5 1
#> 3 wt -0.653 0.854 -0.765 4.50e- 1 1
#> 4 (Intercept) 32.1 2.38 13.5 5.31e-14 2
#> 5 hp -0.0651 0.0135 -4.84 4.01e- 5 2
#> 6 wt -0.577 0.944 -0.612 5.45e- 1 2
#> 7 (Intercept) 33.8 2.23 15.2 2.52e-15 3
#> 8 hp -0.0759 0.0126 -6.03 1.47e- 6 3
#> 9 wt -0.577 0.882 -0.655 5.18e- 1 3
#> 10 (Intercept) 30.7 2.41 12.8 2.02e-13 4
#> 11 hp -0.0417 0.0136 -3.06 4.68e- 3 4
#> 12 wt -1.68 0.953 -1.76 8.91e- 2 4
#> 13 (Intercept) 29.9 2.37 12.6 2.61e-13 5
#> 14 hp -0.0633 0.0134 -4.73 5.35e- 5 5
#> 15 wt -0.129 0.938 -0.138 8.91e- 1 5