Commit Graph

1343 Commits (1446b5fcfc2e13ac3ea9d48622e76476d466e513)

Author SHA1 Message Date
Dom 1446b5fcfc style: enforce comment wrapping
Adds a rustfmt config file so it automatically wraps comments to 80 chars (the
default).

This is enforced as part of the CI pipeline.
2020-12-11 18:15:23 +00:00
Andrew Lamb d47acfa3b5
fix: better read_group input validation checking: group and hints fields (#539)
* fix: Error if hint argument is provided to read_group

* fix: Verify compatible group and group_keys settings

* docs: Add clarifying comments on validation

* refactor: use into() rather than String::from for consistency
2020-12-11 11:33:21 -05:00
Andrew Lamb ea6b2f6bc8
refactor: remove minor code duplication (#555) 2020-12-11 11:18:00 -05:00
Carol (Nichols || Goulding) 9dca302d3a
Merge pull request #545 from influxdata/cn+er/feat/segment-rle-group-final 2020-12-10 15:47:12 -05:00
Carol (Nichols || Goulding) fdf82be70b docs: Improving the column_name_and_column description 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) be8c266d3a refactor: Remove one whole lifetime from ReadGroupResult 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) b620c37ecb refactor: Remove one use of the input lifetime in ReadGroupResult 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) 383cd7cf38 refactor: Simplify lifetimes by returning col names from seg store 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) 383b601e10 fix: Some of the slice lifetimes aren't needed
Some are, though
2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) 2abb9abfbc refactor: Elide some more lifetimes 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) 4dbc77b441 fix: Change lifetimes on ReadGroupResults too 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) f98f45e49f fix: Correct and clarify lifetimes around the segment store 2020-12-10 15:22:45 -05:00
Dom 0032d03656
Merge pull request #550 from influxdata/dom/deprecate-mem-qe
chore: deprecate mem_qe
2020-12-10 19:00:02 +00:00
Dom c2156c6271
Merge branch 'main' into dom/deprecate-mem-qe 2020-12-10 18:47:55 +00:00
Dom 8ba15ae35f
Merge pull request #536 from brandonsov/brandonsov/add-bucket-location-to-object-store-errors
fix: Report bucket/location when relevant with object store errors
2020-12-10 18:23:27 +00:00
Dom c9a101ecae
Merge branch 'main' into brandonsov/add-bucket-location-to-object-store-errors 2020-12-10 18:14:27 +00:00
Dom d19a56eee0
Merge branch 'main' into dom/deprecate-mem-qe 2020-12-10 18:06:17 +00:00
Dom b9968c19bf
Merge pull request #549 from influxdata/dom/panic-handler-panic
fix: never uninstall panic handler
2020-12-10 18:05:18 +00:00
Dom 8ad0274cf1 chore: deprecate mem_qe 2020-12-10 18:02:20 +00:00
Brandon Sov 568065d63f style: rename location_string to location_copy 2020-12-10 09:41:24 -08:00
Brandon Sov 6247a01144 test: update typos 2020-12-10 09:24:42 -08:00
Dom 6513e2b056 fix: never uninstall panic handler
Fixes #548
2020-12-10 17:16:10 +00:00
Andrew Lamb 50ba529cb8
test: Adds tests for read_group for None aggregates (#538) 2020-12-10 11:31:15 -05:00
Edd Robinson 088a576eb0
Merge pull request #527 from influxdata/er/feat/segment-rle-group-final
feat: grouped aggregates for low-cardinality columns
2020-12-10 15:56:07 +00:00
Edd Robinson 7e04a6eaab refactor: address more PR feedback 2020-12-10 15:15:34 +00:00
Edd Robinson 90b112c652 refactor: address PR feedback 2020-12-10 15:15:34 +00:00
Edd Robinson 138031d5b1 test: add test case for multiple aggregates 2020-12-10 15:15:34 +00:00
Edd Robinson 1d1414e9cf refactor: use for loop 2020-12-10 15:15:34 +00:00
Edd Robinson 34faf72968 refactor: commented code 2020-12-10 15:15:34 +00:00
Edd Robinson 5e138bcded refactor: return groups as vectors 2020-12-10 15:15:34 +00:00
Edd Robinson cc80a73768 perf: replace map with vector 2020-12-10 15:15:34 +00:00
Edd Robinson e1b57aaec4 perf: copy as needed 2020-12-10 15:15:34 +00:00
Edd Robinson 99003b0a6a perf: check intersection cardinality before allocating
Because `bitset.and()` allocates a new bitset regardless of the resulting
cardinality we will be allocating more bitsets than necessary. This
change checks if we actually want to make the allocation.

It improves `read_group` performance by ~2X.

```
segment_read_group_pre_computed_groups_no_predicates_cardinality/2000
                        time:   [57.917 ms 58.286 ms 58.700 ms]
                        thrpt:  [34.072 Kelem/s 34.313 Kelem/s 34.532 Kelem/s]
                 change:
                        time:   [-59.703% -59.357% -59.057%] (p = 0.00 < 0.05)
                        thrpt:  [+144.24% +146.05% +148.16%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
```
2020-12-10 15:15:34 +00:00
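
The allocation check described in commit 99003b0a6a can be illustrated with a minimal sketch. The `BitSet` type, its `and_cardinality` method, and the `intersect_if_non_empty` helper are hypothetical stand-ins written for this example, not the types used in the repository; the sketch only assumes that the intersection count can be computed without allocating.

```rust
/// Hypothetical fixed-size bitset, standing in for whatever bitset type the
/// real code uses.
#[derive(Debug, Clone, PartialEq)]
struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    /// Builds a bitset with the given row ids set. `capacity` is the number
    /// of addressable bits; every row id must be smaller than it.
    fn from_rows(rows: &[u32], capacity: usize) -> Self {
        let mut words = vec![0u64; (capacity + 63) / 64];
        for &r in rows {
            words[r as usize / 64] |= 1u64 << (r % 64);
        }
        Self { words }
    }

    /// Returns the intersection as a newly allocated bitset, even when the
    /// result is empty.
    fn and(&self, other: &Self) -> Self {
        let words = self
            .words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| a & b)
            .collect();
        Self { words }
    }

    /// Counts the bits set in both bitsets without allocating.
    fn and_cardinality(&self, other: &Self) -> u64 {
        self.words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| u64::from((a & b).count_ones()))
            .sum()
    }
}

/// Allocates the intersection only when it would be non-empty.
fn intersect_if_non_empty(a: &BitSet, b: &BitSet) -> Option<BitSet> {
    if a.and_cardinality(b) == 0 {
        return None; // nothing in common, so skip the allocation
    }
    Some(a.and(b))
}
```

The cheap count makes a second pass over the words when the intersection is non-empty, which only pays off if empty intersections are common, as they may be when combining groups from several columns.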
Edd Robinson fe27690ca8 test: add benchmarks for specific read_group path
This commit adds benchmarks to track the performance of `read_group`
when aggregating across columns that support pre-computed bit-sets of
row_ids for each distinct column value. Currently this is limited to the
RLE columns, and only makes sense when grouping by low-cardinality
columns.

The benchmarks are in three groups:

* one group fixes the number of rows in the segment but varies the
  cardinality (that is, how many groups the query produces).
* another group fixes the cardinality and the number of rows but varies
  the number of columns needed to be grouped to produce the fixed
  cardinality.
* a final group fixes the number of columns being grouped, the
  cardinality, and instead varies the number of rows in the segment.

Some initial results from my development box are as follows:

```
                        time:   [51.099 ms 51.119 ms 51.140 ms]
                        thrpt:  [39.108 Kelem/s 39.125 Kelem/s 39.140 Kelem/s]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

segment_read_group_pre_computed_groups_no_predicates_group_cols/1
                        time:   [93.162 us 93.219 us 93.280 us]
                        thrpt:  [10.720 Kelem/s 10.727 Kelem/s 10.734 Kelem/s]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_group_cols/2
                        time:   [571.72 us 572.31 us 572.98 us]
                        thrpt:  [3.4905 Kelem/s 3.4946 Kelem/s 3.4982 Kelem/s]
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
Benchmarking segment_read_group_pre_computed_groups_no_predicates_group_cols/3: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.9s, enable flat sampling, or reduce sample count to 50.
segment_read_group_pre_computed_groups_no_predicates_group_cols/3
                        time:   [1.7292 ms 1.7313 ms 1.7340 ms]
                        thrpt:  [1.7301 Kelem/s 1.7328 Kelem/s 1.7349 Kelem/s]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

segment_read_group_pre_computed_groups_no_predicates_rows/250000
                        time:   [562.29 us 565.19 us 568.80 us]
                        thrpt:  [439.52 Melem/s 442.33 Melem/s 444.61 Melem/s]
Found 18 outliers among 100 measurements (18.00%)
  6 (6.00%) high mild
  12 (12.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/500000
                        time:   [561.32 us 561.85 us 562.47 us]
                        thrpt:  [888.93 Melem/s 889.92 Melem/s 890.76 Melem/s]
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/750000
                        time:   [573.75 us 574.27 us 574.85 us]
                        thrpt:  [1.3047 Gelem/s 1.3060 Gelem/s 1.3072 Gelem/s]
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/1000000
                        time:   [586.36 us 586.74 us 587.19 us]
                        thrpt:  [1.7030 Gelem/s 1.7043 Gelem/s 1.7054 Gelem/s]
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
```
2020-12-10 15:15:34 +00:00
Edd Robinson 596e20ac92 feat: add from String implementation 2020-12-10 15:15:34 +00:00
Edd Robinson e400fb71bb feat: add from conversion for String 2020-12-10 15:15:34 +00:00
Edd Robinson 10552eb51b refactor: create collection of ReadGroupResult type 2020-12-10 15:15:34 +00:00
Edd Robinson 8c45170a15 feat: read group aggregates on RLE columns 2020-12-10 15:15:34 +00:00
Edd Robinson 8fd211798a refactor: aggregate sum can return a Scalar 2020-12-10 15:15:34 +00:00
Edd Robinson 6d2b69d4a3 feat: add column properties
Column properties can be used to determine what abilities a column has
at runtime, which will vary depending on the encoding used.
2020-12-10 15:15:34 +00:00
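
The idea behind commit 6d2b69d4a3, asking a column at runtime what its encoding supports, might look roughly like the sketch below. All names here (`ColumnProperties`, `Encoding`, `Column`, `can_use_pre_computed_groups`) are hypothetical and chosen for illustration; the only detail taken from the log is that RLE columns support pre-computed row ids per distinct value.

```rust
/// Hypothetical set of capabilities a column reports at runtime.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct ColumnProperties {
    /// True when the encoding keeps a pre-computed set of row ids for each
    /// distinct value.
    has_pre_computed_row_ids: bool,
}

/// Hypothetical encodings; the real code may have more or different ones.
enum Encoding {
    Rle,
    Plain,
}

struct Column {
    encoding: Encoding,
}

impl Column {
    /// Reports what this column can do, which depends on its encoding.
    fn properties(&self) -> ColumnProperties {
        match self.encoding {
            Encoding::Rle => ColumnProperties {
                has_pre_computed_row_ids: true,
            },
            Encoding::Plain => ColumnProperties::default(),
        }
    }
}

/// A grouping fast path is only usable when every grouped column supports it.
fn can_use_pre_computed_groups(cols: &[Column]) -> bool {
    cols.iter()
        .all(|c| c.properties().has_pre_computed_row_ids)
}
```

A caller could use such a check to choose between a fast path over pre-computed row ids and a general fallback, without matching on the encoding at every call site.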
Edd Robinson e4b8fb3387 refactor: use Cow for group row ids 2020-12-10 15:15:34 +00:00
Edd Robinson f7f87164b4 refactor: initial read_group skeleton 2020-12-10 15:15:34 +00:00
Edd Robinson c199d59c04 refactor: improve aggregate support 2020-12-10 15:15:34 +00:00
Edd Robinson c259a461c1 feat: extend dictionary column API
Add methods for getting distinct row ids for values and for getting
logical values.
2020-12-10 15:15:34 +00:00
Dom 756e7de867
Merge pull request #542 from ming535/ming
chore: some minor comments and rename
2020-12-10 10:18:18 +00:00
huming a5a3cd149d chore: some minor comments and rename 2020-12-10 10:48:57 +08:00
Brandon Sov 146bf59d8d test: simplify test error matching 2020-12-09 11:36:49 -08:00
Brandon Sov d179fe68d3 refactor: replace bucket_name clones with references 2020-12-09 11:03:19 -08:00
Brandon Sov af8569378f test: move common variable and function to general test usage 2020-12-09 11:01:51 -08:00
Brandon Sov 625542c310 fix: Update s3 error function to correct pattern 2020-12-09 10:14:50 -08:00