Group a data frame into bins — bin_by_interval • regressinator

Groups a data frame (similarly to dplyr::group_by()) based on the values of a column, either by dividing the column's range into equal-width intervals or by splitting it at quantiles.

Usage

bin_by_interval(.data, col, breaks = NULL)

bin_by_quantile(.data, col, breaks = NULL)

Arguments

.data: Data frame to bin
col: Column to bin by
breaks: Number of bins to create. bin_by_interval() also accepts a numeric vector of two or more unique cut points to use. If NULL, a default number of breaks is chosen based on the number of rows in the data. In bin_by_quantile(), if the number of unique values of the column is smaller than breaks, fewer bins will be produced.
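As a brief illustration of the breaks argument, the sketch below calls bin_by_interval() once with the default and once with explicit cut points. The cut points c(0, 10, 20, 30) are arbitrary values chosen here to span the observed range of speed in the built-in cars data. This page does not say how values exactly on a cut point, or outside the outermost cut points, are handled, so no output is shown.

```r
library(dplyr)
library(regressinator)

# Default: breaks = NULL lets the function choose the number of bins
# from the number of rows in the data.
default_bins <- cars |>
  bin_by_interval(speed) |>
  summarize(mean_dist = mean(dist))

# Explicit cut points (illustrative values, chosen to cover all speeds).
custom_bins <- cars |>
  bin_by_interval(speed, breaks = c(0, 10, 20, 30)) |>
  summarize(mean_dist = mean(dist))
```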

Value

Grouped data frame, similar to those returned by dplyr::group_by(). An additional column .bin indicates the bin number for each group. Use dplyr::summarize() to calculate values within each group, or other dplyr operations that work on groups.
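Because the result is a grouped data frame, the usual dplyr grouped verbs should apply. The following is a minimal sketch using the built-in cars data. The summary column names (n, median_dist, overall_mean_dist) are made up for illustration.

```r
library(dplyr)
library(regressinator)

binned <- cars |>
  bin_by_interval(speed, breaks = 5)

# n() and median() are evaluated separately within each bin.
per_bin <- binned |>
  summarize(n = n(), median_dist = median(dist))

# Remove the grouping when a later step should see all rows at once.
overall <- binned |>
  ungroup() |>
  summarize(overall_mean_dist = mean(dist))
```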

Details

bin_by_interval() breaks the numerical range of col into equal-width intervals, or into the intervals specified by breaks. bin_by_quantile() instead places the cut points at quantiles of col, so each interval contains roughly the same number of observations.
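To make the difference between the two schemes concrete, here is a rough base R sketch of equal-width and quantile binning. It is not the regressinator implementation. It only assumes that cut() and quantile() approximate the behavior described above, and the names x, n_bins, cut_points, interval_bin, and quantile_bin are illustrative.

```r
# Sketch only: base R approximations, not the regressinator source.
x <- cars$speed
n_bins <- 5

# Equal-width bins: a single number for `breaks` tells cut() to split
# the range of x into that many intervals of equal width.
interval_bin <- cut(x, breaks = n_bins, labels = FALSE)

# Quantile bins: place cut points at evenly spaced quantiles. unique()
# drops repeated cut points, which can occur when x has many ties and
# leaves fewer bins than requested.
cut_points <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1)))
quantile_bin <- cut(x, breaks = cut_points, include.lowest = TRUE,
                    labels = FALSE)

# Compare how many observations land in each bin under the two schemes.
table(interval_bin)
table(quantile_bin)
```

Comparing the two tables shows the trade-off: equal-width bins can hold very different numbers of observations, while quantile bins have unequal widths but more even counts.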

Examples

suppressMessages(library(dplyr))
library(regressinator)
cars |>
  bin_by_interval(speed, breaks = 5) |>
  summarize(mean_speed = mean(speed),
            mean_dist = mean(dist))
#> # A tibble: 5 × 3
#>    .bin mean_speed mean_dist
#>   <int>      <dbl>     <dbl>
#> 1     1        6        10.8
#> 2     2       10.9      21.9
#> 3     3       14.2      39.5
#> 4     4       18.7      52.1
#> 5     5       23.7      82.9

cars |>
  bin_by_quantile(speed, breaks = 5) |>
  summarize(mean_speed = mean(speed),
            mean_dist = mean(dist))
#> # A tibble: 5 × 3
#>    .bin mean_speed mean_dist
#>   <int>      <dbl>     <dbl>
#> 1     1       8.27      17  
#> 2     2      13         35.7
#> 3     3      16         36.8
#> 4     4      19.1       55  
#> 5     5      23.7       82.9