Commit Graph

1408 Commits (c46cf6fdcf1223c265958a929de049925fdffed9)

Author SHA1 Message Date
Andrew Lamb a6d2c13888
chore: Update arrow + other dependencies (#540)
* chore: Update arrow + other dependencies

* chore: Update write_buffer and query crate
2020-12-15 08:46:27 -05:00
Andrew Lamb 1740e26ec3
fix: do not produce gRPC series frames for fields that only contain null values (#558)
* test: add test for field columns with only nulls

* fix: do not produce series for null fields, tests for same

* fix: remove unneeded test printlns
2020-12-15 08:28:23 -05:00
Dom d34e09dab1
Merge pull request #561 from influxdata/dom/rustfmt-wrapping-unmangle
style: unmangle wrapped diagrams
2020-12-14 14:09:11 +00:00
Dom df82e8ced7
Merge branch 'main' into dom/rustfmt-wrapping-unmangle 2020-12-14 13:58:56 +00:00
Dom 4c35253fd5 style: unmangle wrapped diagrams
Adds #[rustfmt::skip] to comment blocks containing diagrams to skip wrapping.
2020-12-14 13:14:36 +00:00
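The `#[rustfmt::skip]` change in 4c35253fd5 can be illustrated with a minimal sketch. The function and diagram below are hypothetical and are not taken from the repository; the sketch assumes a rustfmt configuration that wraps comments, and places the attribute on the item so that the item is left as written. Exactly which comments rustfmt leaves alone can depend on the rustfmt version and on where the attribute sits, so treat the placement as an assumption to verify locally.

```rust
// Hypothetical example: an item carrying a diagram in its comments.
// The attribute asks rustfmt to leave this item unformatted, so the
// diagram lines inside it are not re-wrapped.
#[rustfmt::skip]
pub fn midpoint(lo: u64, hi: u64) -> u64 {
    //  lo                  mid                  hi
    //  |--------------------|--------------------|
    //
    // Assumes lo <= hi; written this way to avoid overflow in lo + hi.
    lo + (hi - lo) / 2
}
```

The attribute changes formatting only; the compiled code is the same with or without it.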
Dom 193b68ee79
Merge pull request #543 from influxdata/dom/opentelemetry
feat(tracing): integrate Jaeger tracing sink
2020-12-14 13:02:16 +00:00
Dom 41f5099691 refactor: compile out trace! level for release builds
Configures the IOx tracing to compile out trace!() level events in the release
binary. This effectively gives contributors three levels of output:

* Important to the user (info & friends)
* Not important for regular running, but needed to debug
* Only useful to devs in a specific part of the system, never seen by user

Documents this behaviour (and general usage guidelines) for contributors.
2020-12-14 12:06:53 +00:00
Dom 667b2595d9 refactor: use expect for tracing unwrap 2020-12-14 12:06:53 +00:00
Dom 21110dc233 style: prefer is_ok()
Co-authored-by: Edd Robinson <me@edd.io>
2020-12-14 12:06:53 +00:00
Dom 2d29b985b4 chore(deps): remove env_logger from ingest
Already using tracing!
2020-12-14 12:06:53 +00:00
Dom 60ee7e1dbb chore(deps): remove unused env_logger 2020-12-14 12:06:53 +00:00
Dom 80da024212 docs(tracing): add IOx tracing usage doc
Describes the components involved in, and usage of the tracing system in IOx.
2020-12-14 12:06:53 +00:00
Dom 9d7389dec2 feat(tracing): add Jaeger tracing sink
Adds telemetry / tracing with support for a Jaeger backend, and changes the
logger from env_logger to a tracing subscriber to collect the log entries.

Events are batched and then emitted asynchronously via UDP to the Jaeger
collector using the tokio runtime. There's a bunch of settings (env
vars) related to batch sizes and flush frequency etc - they're all using
their default values at the moment (if it ain't broke...) See the docs
for more info:

    https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/sdk-environment-variables.md#opentelemetry-environment-variable-specification

This is only part 1 of telemetry - it does NOT propagate traces across RPC
boundaries as we're still defining how all this should work. I've created #541
to track this.

Closes #202 and closes #203.
2020-12-14 12:06:52 +00:00
Dom 7e351ba609
Merge pull request #559 from influxdata/dom/rustfmt-wrapping
style: wrap comments
2020-12-11 18:36:31 +00:00
Dom 6f473984d0 style: wrap comments
Runs rustfmt with the new config.
2020-12-11 18:22:26 +00:00
Dom 1446b5fcfc style: enforce comment wrapping
Adds a rustfmt config file so it automatically wraps comments to 80 chars (the
default).

This is enforced as part of the CI pipeline.
2020-12-11 18:15:23 +00:00
Andrew Lamb d47acfa3b5
fix: better read_group input validation checking: group and hints fields (#539)
* fix: Error if hint argument is provided to read_group

* fix: Verify compatible group and group_keys settings

* docs: Add clarifying comments on validation

* refactor: use into() rather than String::from for consistency
2020-12-11 11:33:21 -05:00
Andrew Lamb ea6b2f6bc8
refactor: remove minor code duplication (#555) 2020-12-11 11:18:00 -05:00
Carol (Nichols || Goulding) 9dca302d3a
Merge pull request #545 from influxdata/cn+er/feat/segment-rle-group-final 2020-12-10 15:47:12 -05:00
Carol (Nichols || Goulding) fdf82be70b docs: Improving the column_name_and_column description 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) be8c266d3a refactor: Remove one whole lifetime from ReadGroupResult 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) b620c37ecb refactor: Remove one use of the input lifetime in ReadGroupResult 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) 383cd7cf38 refactor: Simplify lifetimes by returning col names from seg store 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) 383b601e10 fix: Some of the slice lifetimes aren't needed
Some are, though
2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) 2abb9abfbc refactor: Elide some more lifetimes 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) 4dbc77b441 fix: Change lifetimes on ReadGroupResults too 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) f98f45e49f fix: Correct and clarify lifetimes around the segment store 2020-12-10 15:22:45 -05:00
Dom 0032d03656
Merge pull request #550 from influxdata/dom/deprecate-mem-qe
chore: deprecate mem_qe
2020-12-10 19:00:02 +00:00
Dom c2156c6271
Merge branch 'main' into dom/deprecate-mem-qe 2020-12-10 18:47:55 +00:00
Dom 8ba15ae35f
Merge pull request #536 from brandonsov/brandonsov/add-bucket-location-to-object-store-errors
fix: Report bucket/location when relevant with object store errors
2020-12-10 18:23:27 +00:00
Dom c9a101ecae
Merge branch 'main' into brandonsov/add-bucket-location-to-object-store-errors 2020-12-10 18:14:27 +00:00
Dom d19a56eee0
Merge branch 'main' into dom/deprecate-mem-qe 2020-12-10 18:06:17 +00:00
Dom b9968c19bf
Merge pull request #549 from influxdata/dom/panic-handler-panic
fix: never uninstall panic handler
2020-12-10 18:05:18 +00:00
Dom 8ad0274cf1 chore: deprecate mem_qe 2020-12-10 18:02:20 +00:00
Brandon Sov 568065d63f style: rename location_string to location_copy 2020-12-10 09:41:24 -08:00
Brandon Sov 6247a01144 test: update typos 2020-12-10 09:24:42 -08:00
Dom 6513e2b056 fix: never uninstall panic handler
Fixes #548
2020-12-10 17:16:10 +00:00
Andrew Lamb 50ba529cb8
test: Adds tests for read_group for None aggregates (#538) 2020-12-10 11:31:15 -05:00
Edd Robinson 088a576eb0
Merge pull request #527 from influxdata/er/feat/segment-rle-group-final
feat: grouped aggregates for low-cardinality columns
2020-12-10 15:56:07 +00:00
Edd Robinson 7e04a6eaab refactor: address more PR feedback 2020-12-10 15:15:34 +00:00
Edd Robinson 90b112c652 refactor: address PR feedback 2020-12-10 15:15:34 +00:00
Edd Robinson 138031d5b1 test: add test case for multiple aggregates 2020-12-10 15:15:34 +00:00
Edd Robinson 1d1414e9cf refactor: use for loop 2020-12-10 15:15:34 +00:00
Edd Robinson 34faf72968 refactor: commented code 2020-12-10 15:15:34 +00:00
Edd Robinson 5e138bcded refactor: return groups as vectors 2020-12-10 15:15:34 +00:00
Edd Robinson cc80a73768 perf: replace map with vector 2020-12-10 15:15:34 +00:00
Edd Robinson e1b57aaec4 perf: copy as needed 2020-12-10 15:15:34 +00:00
Edd Robinson 99003b0a6a perf: check intersection cardinality before allocating
Because `bitset.and()` allocates a new bitset regardless of the resulting
cardinality we will be allocating more bitsets than necessary. This
change checks if we actually want to make the allocation.

It improves `read_group` performance by ~2X.

```
segment_read_group_pre_computed_groups_no_predicates_cardinality/2000
                        time:   [57.917 ms 58.286 ms 58.700 ms]
                        thrpt:  [34.072 Kelem/s 34.313 Kelem/s 34.532 Kelem/s]
                 change:
                        time:   [-59.703% -59.357% -59.057%] (p = 0.00 < 0.05)
                        thrpt:  [+144.24% +146.05% +148.16%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
```
2020-12-10 15:15:34 +00:00
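The idea in 99003b0a6a (check the intersection cardinality before allocating) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `Bitset`, `and_cardinality`, and `intersect_if_non_empty` are hypothetical names, and the word-array bitset stands in for whatever bitset library the repository uses.

```rust
/// A tiny word-array bitset used only for illustration (hypothetical type).
#[derive(Debug, Clone)]
pub struct Bitset {
    words: Vec<u64>,
}

impl Bitset {
    /// Builds a bitset with one bit set per row id.
    pub fn from_row_ids(row_ids: &[u32]) -> Self {
        let mut words: Vec<u64> = Vec::new();
        for &id in row_ids {
            let word = (id / 64) as usize;
            if word >= words.len() {
                words.resize(word + 1, 0);
            }
            words[word] |= 1u64 << (id % 64);
        }
        Self { words }
    }

    /// Counts the row ids present in both sets without allocating.
    pub fn and_cardinality(&self, other: &Self) -> u64 {
        self.words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| (a & b).count_ones() as u64)
            .sum()
    }

    /// Allocates a new bitset holding the intersection.
    pub fn and(&self, other: &Self) -> Self {
        let words = self
            .words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| a & b)
            .collect();
        Self { words }
    }
}

/// Returns the intersection only when it is non-empty, so that empty
/// group combinations never pay for an allocation.
pub fn intersect_if_non_empty(a: &Bitset, b: &Bitset) -> Option<Bitset> {
    if a.and_cardinality(b) == 0 {
        return None;
    }
    Some(a.and(b))
}
```

The check walks the words twice when the intersection is non-empty, so it presumably pays off mainly when many candidate group combinations turn out to be empty.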
Edd Robinson fe27690ca8 test: add benchmarks for specific read_group path
This commit adds benchmarks to track the performance of `read_group`
when aggregating across columns that support pre-computed bit-sets of
row_ids for each distinct column value. Currently this is limited to the
RLE columns, and only makes sense when grouping by low-cardinality
columns.

The benchmarks are in three groups:

* one group fixes the number of rows in the segment but varies the
  cardinality (that is, how many groups the query produces).
* another group fixes the cardinality and the number of rows but varies
  the number of columns needed to be grouped to produce the fixed
  cardinality.
* a final group fixes the number of columns being grouped, the
  cardinality, and instead varies the number of rows in the segment.

Some initial results from my development box are as follows:

```
                        time:   [51.099 ms 51.119 ms 51.140 ms]
                        thrpt:  [39.108 Kelem/s 39.125 Kelem/s 39.140 Kelem/s]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

segment_read_group_pre_computed_groups_no_predicates_group_cols/1
                        time:   [93.162 us 93.219 us 93.280 us]
                        thrpt:  [10.720 Kelem/s 10.727 Kelem/s 10.734 Kelem/s]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_group_cols/2
                        time:   [571.72 us 572.31 us 572.98 us]
                        thrpt:  [3.4905 Kelem/s 3.4946 Kelem/s 3.4982 Kelem/s]
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
Benchmarking segment_read_group_pre_computed_groups_no_predicates_group_cols/3: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.9s, enable flat sampling, or reduce sample count to 50.
segment_read_group_pre_computed_groups_no_predicates_group_cols/3
                        time:   [1.7292 ms 1.7313 ms 1.7340 ms]
                        thrpt:  [1.7301 Kelem/s 1.7328 Kelem/s 1.7349 Kelem/s]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

segment_read_group_pre_computed_groups_no_predicates_rows/250000
                        time:   [562.29 us 565.19 us 568.80 us]
                        thrpt:  [439.52 Melem/s 442.33 Melem/s 444.61 Melem/s]
Found 18 outliers among 100 measurements (18.00%)
  6 (6.00%) high mild
  12 (12.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/500000
                        time:   [561.32 us 561.85 us 562.47 us]
                        thrpt:  [888.93 Melem/s 889.92 Melem/s 890.76 Melem/s]
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/750000
                        time:   [573.75 us 574.27 us 574.85 us]
                        thrpt:  [1.3047 Gelem/s 1.3060 Gelem/s 1.3072 Gelem/s]
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/1000000
                        time:   [586.36 us 586.74 us 587.19 us]
                        thrpt:  [1.7030 Gelem/s 1.7043 Gelem/s 1.7054 Gelem/s]
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
```
2020-12-10 15:15:34 +00:00
Edd Robinson 596e20ac92 feat: add from String implementation 2020-12-10 15:15:34 +00:00