* fix: Error if hint argument is provided to read_group
* fix: Verify compatible group and group_keys settings
* docs: Add clarifying comments on validation
* refactor: use into() rather than String::from for consistency
Because `bitset.and()` allocates a new bitset regardless of the resulting
cardinality, we were allocating more bitsets than necessary. This change
checks whether the allocation is actually needed before performing the
intersection (a sketch of the idea follows the benchmark output below).
It improves `read_group` performance by ~2X.
```
segment_read_group_pre_computed_groups_no_predicates_cardinality/2000
time: [57.917 ms 58.286 ms 58.700 ms]
thrpt: [34.072 Kelem/s 34.313 Kelem/s 34.532 Kelem/s]
change:
time: [-59.703% -59.357% -59.057%] (p = 0.00 < 0.05)
thrpt: [+144.24% +146.05% +148.16%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
```
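The gist of the optimisation is sketched below. This is a minimal sketch,
assuming the `croaring` crate's `Bitmap` type as the row-id bitset; the real
code operates on whatever bitset type the segment stores, so treat the names
and the exact check as illustrative rather than the actual implementation.
```rust
// Minimal sketch (assumption: `croaring::Bitmap` as the row-id bitset).
// The idea: `and_cardinality` computes |a ∩ b| without materialising the
// result, so we can skip the allocating `and()` when the intersection
// would not change the bitset we already hold.
use croaring::Bitmap;

fn intersect_if_needed(dst: &mut Bitmap, other: &Bitmap) {
    // Cheap check: no new bitset is allocated here.
    if dst.and_cardinality(other) == dst.cardinality() {
        // Every row id in `dst` is also in `other`, so the intersection
        // is `dst` itself -- skip the allocation entirely.
        return;
    }
    // Only pay for the allocation made by `and()` when the result differs.
    *dst = dst.and(other);
}

fn main() {
    let mut rows = Bitmap::of(&[1, 2, 3, 4]);
    intersect_if_needed(&mut rows, &Bitmap::of(&[1, 2, 3, 4, 5])); // no allocation
    intersect_if_needed(&mut rows, &Bitmap::of(&[2, 3]));          // allocates
    assert_eq!(rows.cardinality(), 2);
}
```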
This commit adds benchmarks to track the performance of `read_group`
when aggregating across columns that support pre-computed bit-sets of
row_ids for each distinct column value. Currently this is limited to the
RLE columns, and only makes sense when grouping by low-cardinality
columns.
The benchmarks are in three groups (a sketch of how a group might be
parameterised follows this list):
* one group fixes the number of rows in the segment but varies the
  cardinality (that is, how many groups the query produces).
* another group fixes the cardinality and the number of rows but varies
  the number of columns that need to be grouped to produce that fixed
  cardinality.
* a final group fixes the number of columns being grouped and the
  cardinality, and instead varies the number of rows in the segment.
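For illustration, the first group might be parameterised with Criterion
roughly as follows. This is a hedged sketch: `group_and_sum` is a toy
stand-in for the segment's `read_group` path, and the row count and
cardinality values are assumptions, not the real harness.
```rust
use std::collections::HashMap;

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

// Toy stand-in for the segment's read_group path: group a fixed number of
// rows by a low-cardinality column and sum a value column. The real
// benchmarks exercise the segment's pre-computed RLE row-id bitsets instead.
fn group_and_sum(groups: &[u32], values: &[u64]) -> HashMap<u32, u64> {
    let mut out = HashMap::new();
    for (g, v) in groups.iter().zip(values) {
        *out.entry(*g).or_insert(0) += v;
    }
    out
}

fn read_group_cardinality(c: &mut Criterion) {
    let mut group =
        c.benchmark_group("segment_read_group_pre_computed_groups_no_predicates_cardinality");

    const ROWS: usize = 500_000; // fixed row count for this group

    // The parameter is the cardinality of the grouping column, i.e. how
    // many groups the query produces.
    for cardinality in [20_u32, 200, 2_000, 20_000] {
        let groups: Vec<u32> = (0..ROWS as u32).map(|i| i % cardinality).collect();
        let values: Vec<u64> = (0..ROWS as u64).collect();

        // Report throughput as groups produced per second, matching the
        // Kelem/s figures in the benchmark output.
        group.throughput(Throughput::Elements(cardinality as u64));
        group.bench_with_input(
            BenchmarkId::from_parameter(cardinality),
            &cardinality,
            |b, _| b.iter(|| group_and_sum(&groups, &values)),
        );
    }
    group.finish();
}

criterion_group!(benches, read_group_cardinality);
criterion_main!(benches);
```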
Some initial results from my development box are as follows:
```
time: [51.099 ms 51.119 ms 51.140 ms]
thrpt: [39.108 Kelem/s 39.125 Kelem/s 39.140 Kelem/s]
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_group_cols/1
time: [93.162 us 93.219 us 93.280 us]
thrpt: [10.720 Kelem/s 10.727 Kelem/s 10.734 Kelem/s]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_group_cols/2
time: [571.72 us 572.31 us 572.98 us]
thrpt: [3.4905 Kelem/s 3.4946 Kelem/s 3.4982 Kelem/s]
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
Benchmarking segment_read_group_pre_computed_groups_no_predicates_group_cols/3: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.9s, enable flat sampling, or reduce sample count to 50.
segment_read_group_pre_computed_groups_no_predicates_group_cols/3
time: [1.7292 ms 1.7313 ms 1.7340 ms]
thrpt: [1.7301 Kelem/s 1.7328 Kelem/s 1.7349 Kelem/s]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
6 (6.00%) high mild
1 (1.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/250000
time: [562.29 us 565.19 us 568.80 us]
thrpt: [439.52 Melem/s 442.33 Melem/s 444.61 Melem/s]
Found 18 outliers among 100 measurements (18.00%)
6 (6.00%) high mild
12 (12.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/500000
time: [561.32 us 561.85 us 562.47 us]
thrpt: [888.93 Melem/s 889.92 Melem/s 890.76 Melem/s]
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/750000
time: [573.75 us 574.27 us 574.85 us]
thrpt: [1.3047 Gelem/s 1.3060 Gelem/s 1.3072 Gelem/s]
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/1000000
time: [586.36 us 586.74 us 587.19 us]
thrpt: [1.7030 Gelem/s 1.7043 Gelem/s 1.7054 Gelem/s]
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
```