Groups a data frame (similarly to dplyr::group_by()) based on the values of a column, either by dividing its range into equal-width intervals or by quantiles.
Arguments
- .data
Data frame to bin
- col
Column to bin by
- breaks
Number of bins to create. bin_by_interval() also accepts a numeric vector of two or more unique cut points to use. If NULL, a default number of breaks is chosen based on the number of rows in the data. In bin_by_quantile(), if the number of unique values of the column is smaller than breaks, fewer bins will be produced.
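Why bin_by_quantile() can return fewer bins than requested is easiest to see with a heavily tied column. The base-R sketch below assumes quantile cut points come from quantile() and that coinciding cut points are merged before cutting; that is an assumption about the mechanism, not a description of the package's code, and the data are made up for illustration.

```r
# Illustrative data: a column with only three distinct values
x <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3)

# Request 5 quantile-based bins: cut points at the 0%, 20%, ..., 100% quantiles
cuts <- quantile(x, probs = seq(0, 1, length.out = 6), names = FALSE)
print(cuts)  # ties make several cut points coincide

# cut() requires unique breaks, so duplicated cut points are dropped,
# leaving fewer intervals than the 5 that were requested
bins <- cut(x, breaks = unique(cuts), labels = FALSE, include.lowest = TRUE)
length(unique(bins))
```

Here the six requested cut points collapse to three, so only two bins remain.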
Value
Grouped data frame, similar to those returned by dplyr::group_by(). An additional column .bin indicates the bin number for each group. Use dplyr::summarize() to calculate values within each group, or other dplyr operations that work on groups.
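As a rough base-R analogue of this workflow (an illustration only; it uses cut() and aggregate() rather than this package's functions), the sketch below adds a .bin column of equal-width bin numbers and then computes per-bin means, the same kind of summary that dplyr::summarize() produces on the grouped result.

```r
# Illustrative base-R analogue; not the package's implementation
binned <- cars

# Integer bin numbers from 5 equal-width intervals of speed
binned$.bin <- cut(binned$speed, breaks = 5, labels = FALSE)

# Mean of speed and dist within each bin, one row per bin
per_bin <- aggregate(binned[c("speed", "dist")],
                     by = list(.bin = binned$.bin), FUN = mean)
per_bin
```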
Details
bin_by_interval() breaks the numerical range of col into equal-width intervals, or into the intervals specified by breaks.
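The two ways of giving breaks can be mimicked with base R's cut(): a single number requests that many equal-width intervals over the observed range, while a vector supplies the cut points directly. This is a sketch of the idea, not the package's implementation; the cut points c(4, 10, 15, 20, 25) are arbitrary illustrative values.

```r
# Equal-width binning sketch in base R (illustration only)
x <- cars$speed

# A single number: cut() splits range(x) into 5 equal-width intervals;
# labels = FALSE returns integer bin codes 1, 2, ..., 5
by_count <- cut(x, breaks = 5, labels = FALSE)

# A vector of cut points: the intervals are exactly [4, 10], (10, 15], ...
# include.lowest = TRUE keeps the minimum value (4) in the first bin
by_points <- cut(x, breaks = c(4, 10, 15, 20, 25), labels = FALSE,
                 include.lowest = TRUE)

table(by_count)
table(by_points)
```

Equal-width bins can hold very different numbers of observations: here the five bins contain 5, 10, 13, 15 and 7 cars.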
bin_by_quantile() splits the range into pieces based on quantiles of the data, so each interval contains roughly an equal number of observations.
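A minimal base-R sketch of quantile binning, assuming cut points at evenly spaced quantiles computed by quantile() with its default type 7; the helper name bin_equal_count() is hypothetical and not part of the package.

```r
# Hypothetical helper: equal-count (quantile) binning in base R.
# Not the package's implementation; its tie handling may differ.
bin_equal_count <- function(x, n_bins) {
  # Cut points at the 0, 1/n_bins, ..., 1 quantiles
  cuts <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1), names = FALSE)
  # cut() requires unique breaks, so coinciding quantiles are merged;
  # include.lowest = TRUE keeps the minimum value in the first bin
  cut(x, breaks = unique(cuts), labels = FALSE, include.lowest = TRUE)
}

bins_q <- bin_equal_count(cars$speed, 5)
table(bins_q)
```

On cars$speed the bins hold 11, 12, 8, 12 and 7 observations rather than exactly 10 each, because many speeds are tied and tied values always land in the same bin.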
Examples
suppressMessages(library(dplyr))
cars |>
  bin_by_interval(speed, breaks = 5) |>
  summarize(mean_speed = mean(speed),
            mean_dist = mean(dist))
#> # A tibble: 5 × 3
#>    .bin mean_speed mean_dist
#>   <int>      <dbl>     <dbl>
#> 1     1        6        10.8
#> 2     2       10.9      21.9
#> 3     3       14.2      39.5
#> 4     4       18.7      52.1
#> 5     5       23.7      82.9
cars |>
  bin_by_quantile(speed, breaks = 5) |>
  summarize(mean_speed = mean(speed),
            mean_dist = mean(dist))
#> # A tibble: 5 × 3
#>    .bin mean_speed mean_dist
#>   <int>      <dbl>     <dbl>
#> 1     1       8.27      17
#> 2     2      13         35.7
#> 3     3      16         36.8
#> 4     4      19.1       55
#> 5     5      23.7       82.9