Commit Graph

1462 Commits (1d972e01c8929c7f5d408b791b92778f09b78129)

Author SHA1 Message Date
Edd Robinson e1b57aaec4 perf: copy as needed 2020-12-10 15:15:34 +00:00
Edd Robinson 99003b0a6a perf: check intersection cardinality before allocating
Because `bitset.and()` allocates a new bitset regardless of the resulting
cardinality we will be allocating more bitsets than necessary. This
change checks if we actually want to make the allocation.

It improves `read_group` performance by ~2X.
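
The idea above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `Bitset` type, `and_cardinality`, and `intersect_non_empty` are hypothetical names, and a plain `Vec<u64>` stands in for whatever bitmap implementation the real code uses.

```rust
/// A minimal bitset used only for illustration (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Bitset {
    words: Vec<u64>,
}

impl Bitset {
    /// Builds a bitset holding `n_bits` bits with the given bits set.
    /// Panics if any entry of `bits` is out of range.
    pub fn from_bits(n_bits: usize, bits: &[usize]) -> Self {
        let mut words = vec![0u64; (n_bits + 63) / 64];
        for &b in bits {
            words[b / 64] |= 1u64 << (b % 64);
        }
        Self { words }
    }

    /// Intersection. Always allocates a new bitset, even when the result
    /// turns out to be empty.
    pub fn and(&self, other: &Self) -> Self {
        let words = self
            .words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| a & b)
            .collect();
        Self { words }
    }

    /// Cardinality of the intersection, computed without allocating.
    /// Assumes both bitsets have the same capacity.
    pub fn and_cardinality(&self, other: &Self) -> u64 {
        self.words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| u64::from((a & b).count_ones()))
            .sum()
    }
}

/// Returns the intersection only when it is non-empty, so the allocation in
/// `and()` is skipped for groups that would contain no rows.
pub fn intersect_non_empty(a: &Bitset, b: &Bitset) -> Option<Bitset> {
    if a.and_cardinality(b) == 0 {
        return None;
    }
    Some(a.and(b))
}
```

The check costs one extra pass over the words, so it only pays off when empty intersections are common, which appears to be the case the benchmark below exercises.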

```
segment_read_group_pre_computed_groups_no_predicates_cardinality/2000
                        time:   [57.917 ms 58.286 ms 58.700 ms]
                        thrpt:  [34.072 Kelem/s 34.313 Kelem/s 34.532 Kelem/s]
                 change:
                        time:   [-59.703% -59.357% -59.057%] (p = 0.00 < 0.05)
                        thrpt:  [+144.24% +146.05% +148.16%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
```
2020-12-10 15:15:34 +00:00
Edd Robinson fe27690ca8 test: add benchmarks for specific read_group path
This commit adds benchmarks to track the performance of `read_group`
when aggregating across columns that support pre-computed bit-sets of
row_ids for each distinct column value. Currently this is limited to the
RLE columns, and only makes sense when grouping by low-cardinality
columns.
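
To illustrate what "pre-computed sets of row_ids for each distinct column value" can mean for a run-length encoded column, here is a small sketch. The `RleColumn` type and its methods are hypothetical, and a `Vec<u32>` is used in place of a bit-set to keep the example short.

```rust
use std::collections::BTreeMap;

/// Hypothetical run-length encoded column: each run is (value, run length).
pub struct RleColumn {
    runs: Vec<(String, u32)>,
}

impl RleColumn {
    /// Encodes a slice of values into runs of identical adjacent values.
    pub fn from_values(values: &[&str]) -> Self {
        let mut runs: Vec<(String, u32)> = Vec::new();
        for &v in values {
            match runs.last_mut() {
                // Extend the current run when the value repeats.
                Some((last, len)) if last.as_str() == v => *len += 1,
                // Otherwise start a new run.
                _ => runs.push((v.to_string(), 1)),
            }
        }
        Self { runs }
    }

    /// Row ids grouped by distinct value. Because the encoding already
    /// stores runs, this walks the runs rather than every row.
    pub fn row_ids_by_value(&self) -> BTreeMap<&str, Vec<u32>> {
        let mut out: BTreeMap<&str, Vec<u32>> = BTreeMap::new();
        let mut next_row = 0u32;
        for (value, len) in &self.runs {
            out.entry(value.as_str())
                .or_default()
                .extend(next_row..next_row + len);
            next_row += len;
        }
        out
    }
}
```

With a low-cardinality column the map stays small, which is why grouping on such columns is the case these benchmarks target.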

The benchmarks are in three groups:

* one group fixes the number of rows in the segment but varies the
  cardinality (that is, how many groups the query produces).
* another group fixes the cardinality and the number of rows but varies
  the number of columns needed to be grouped to produce the fixed
  cardinality.
* a final group fixes the number of columns being grouped, the
  cardinality, and instead varies the number of rows in the segment.

Some initial results from my development box are as follows:

```
                        time:   [51.099 ms 51.119 ms 51.140 ms]
                        thrpt:  [39.108 Kelem/s 39.125 Kelem/s 39.140 Kelem/s]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

segment_read_group_pre_computed_groups_no_predicates_group_cols/1
                        time:   [93.162 us 93.219 us 93.280 us]
                        thrpt:  [10.720 Kelem/s 10.727 Kelem/s 10.734 Kelem/s]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_group_cols/2
                        time:   [571.72 us 572.31 us 572.98 us]
                        thrpt:  [3.4905 Kelem/s 3.4946 Kelem/s 3.4982 Kelem/s]
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
Benchmarking
segment_read_group_pre_computed_groups_no_predicates_group_cols/3:
Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to
increase target time to 8.9s, enable flat sampling, or reduce sample
count to 50.
segment_read_group_pre_computed_groups_no_predicates_group_cols/3
                        time:   [1.7292 ms 1.7313 ms 1.7340 ms]
                        thrpt:  [1.7301 Kelem/s 1.7328 Kelem/s 1.7349 Kelem/s]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

segment_read_group_pre_computed_groups_no_predicates_rows/250000
                        time:   [562.29 us 565.19 us 568.80 us]
                        thrpt:  [439.52 Melem/s 442.33 Melem/s 444.61 Melem/s]
Found 18 outliers among 100 measurements (18.00%)
  6 (6.00%) high mild
  12 (12.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/500000
                        time:   [561.32 us 561.85 us 562.47 us]
                        thrpt:  [888.93 Melem/s 889.92 Melem/s 890.76 Melem/s]
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/750000
                        time:   [573.75 us 574.27 us 574.85 us]
                        thrpt:  [1.3047 Gelem/s 1.3060 Gelem/s 1.3072 Gelem/s]
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/1000000
                        time:   [586.36 us 586.74 us 587.19 us]
                        thrpt:  [1.7030 Gelem/s 1.7043 Gelem/s 1.7054 Gelem/s]
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
```
2020-12-10 15:15:34 +00:00
Edd Robinson 596e20ac92 feat: add from String implementation 2020-12-10 15:15:34 +00:00
Edd Robinson e400fb71bb feat: add from conversion for String 2020-12-10 15:15:34 +00:00
Edd Robinson 10552eb51b refactor: create collection of ReadGroupResult type 2020-12-10 15:15:34 +00:00
Edd Robinson 8c45170a15 feat: read group aggregates on RLE columns 2020-12-10 15:15:34 +00:00
Edd Robinson 8fd211798a refactor: aggregate sum can return a Scalar 2020-12-10 15:15:34 +00:00
Edd Robinson 6d2b69d4a3 feat: add column properties
Column properties can be used to determine what abilities a column has
at runtime, which will vary depending on the encoding used.
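
A minimal sketch of how such runtime properties might look, assuming a single capability flag. `ColumnProperties`, `Encoding`, `Column`, and `group_strategy` are hypothetical names chosen for illustration and are not taken from the commit.

```rust
/// Hypothetical set of capabilities a column reports at runtime.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ColumnProperties {
    /// True when the encoding can hand back row ids per distinct value
    /// without scanning every row.
    pub has_pre_computed_row_ids: bool,
}

/// Hypothetical encodings; the capability depends on which one is used.
pub enum Encoding {
    Rle,
    Plain,
}

pub struct Column {
    encoding: Encoding,
}

impl Column {
    pub fn new(encoding: Encoding) -> Self {
        Self { encoding }
    }

    /// Reports what this column can do, derived from its encoding.
    pub fn properties(&self) -> ColumnProperties {
        match self.encoding {
            Encoding::Rle => ColumnProperties { has_pre_computed_row_ids: true },
            Encoding::Plain => ColumnProperties { has_pre_computed_row_ids: false },
        }
    }
}

/// Callers can branch on the properties instead of on the encoding itself.
pub fn group_strategy(col: &Column) -> &'static str {
    if col.properties().has_pre_computed_row_ids {
        "pre-computed row ids"
    } else {
        "row-by-row scan"
    }
}
```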
2020-12-10 15:15:34 +00:00
Edd Robinson e4b8fb3387 refactor: use Cow for group row ids 2020-12-10 15:15:34 +00:00
Edd Robinson f7f87164b4 refactor: initial read_group skeleton 2020-12-10 15:15:34 +00:00
Edd Robinson c199d59c04 refactor: improve aggregate support 2020-12-10 15:15:34 +00:00
Edd Robinson c259a461c1 feat: extend dictionary column API
Add methods for getting distinct row ids for values and for getting
logical values.
2020-12-10 15:15:34 +00:00
Dom 756e7de867
Merge pull request #542 from ming535/ming
chore: some minor comments and rename
2020-12-10 10:18:18 +00:00
huming a5a3cd149d chore: some minor comments and rename 2020-12-10 10:48:57 +08:00
Brandon Sov 146bf59d8d test: simplify test error matching 2020-12-09 11:36:49 -08:00
Brandon Sov d179fe68d3 refactor: replace bucket_name clones with references 2020-12-09 11:03:19 -08:00
Brandon Sov af8569378f test: move common variable and function to general test usage 2020-12-09 11:01:51 -08:00
Brandon Sov 625542c310 fix: Update s3 error function to correct pattern 2020-12-09 10:14:50 -08:00
Brandon Sov 4be47b1ccc fix: Move functions to the conditional compilation flag to pass linter 2020-12-08 23:42:41 -08:00
Brandon Sov 62c14de2bc fix: Update pattern match to detect String 2020-12-08 23:42:33 -08:00
Brandon Sov 989d0ecad8 refactor: set valid format for default s3 bucket name example 2020-12-08 23:42:27 -08:00
Brandon Sov 1a4b2eac26 fix: Report bucket/location when relevant with object store errors 2020-12-08 22:29:28 -08:00
Paul Dix fa3ecbd4ed
feat: Implement write buffer to Parquet snapshotting (#526)
* feat: Implement write buffer to Parquet snapshotting

This introduces snapshot to the server packages to manage snapshotting. It also introduces a new trait for representing a Partition. There is a very crude API wired up in http_routes for testing purposes. Follow-on work will bring the server package into http_routes and rework the snapshot API.
2020-12-08 14:20:43 -05:00
Edd Robinson 91bc7fbdd1
Merge pull request #525 from influxdata/er/chore/bench-debug
chore: add debug symbols to benchmarks
2020-12-04 20:05:14 +00:00
Edd Robinson f3af86ccb4 chore: add debug symbols to benchmarks 2020-12-04 16:36:05 +00:00
Dom 4346ad62cb
Merge pull request #521 from influxdata/dom/org-bucket-types
fix: unambiguous bucket/org to DB mappings
2020-12-04 11:44:41 +00:00
Dom ceea61a211
Merge branch 'main' into dom/org-bucket-types 2020-12-04 11:33:36 +00:00
Andrew Lamb 4ec75a4f22
fix: Fix gRPC panic when multiple field selections are provided (#523)
* fix: do not assert when multiple fields are selected

* fix: clippy

* fix: write unit test, fix bug

* fix: tweak comments
2020-12-03 12:31:02 -05:00
Dom ffbeb4dbcc docs: fix RangeInclusive 2020-12-03 16:10:16 +00:00
Dom 87573256a7 chore: fmt 2020-12-03 16:10:16 +00:00
Dom d96ed66c32 refactor: clearer lifetime for org&bucket mapping
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-12-03 16:10:16 +00:00
Dom 13f391e2b9 refactor: ignore destructured fields
I temporarily forgot I can do this.

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-12-03 16:10:16 +00:00
Dom 234df612ec refactor: avoid clones for errors
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-12-03 16:10:16 +00:00
Dom 59f9665438 test: cover org_and_bucket_to_database 2020-12-03 16:10:16 +00:00
Dom aa1c95401e refactor: DB names 1..=64
Co-authored-by: Edd Robinson <me@edd.io>
2020-12-03 16:10:15 +00:00
Dom b03de0e7ef refactor: remove needless lifetimes 2020-12-03 16:10:15 +00:00
Dom f90a95fd80 fix: unambiguous bucket/org to DB mappings
Previously the $ORG and $BUCKET were joined as:

	$ORG + "_" + $BUCKET

Which is fine unless either $ORG or $BUCKET includes a "_", such as:

	$ORG = "org_a"
	$BUCKET = "bucket"

	and

	$ORG = "org"
	$BUCKET = "a_bucket"

This change continues to join $ORG and $BUCKET with an underscore, but
disallows underscores in either $ORG or $BUCKET. It appears these values
are non-zero u64s in the gRPC protocol converted to their base-10 string
representations for the DB name, so this seems safe to enforce.
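
The ambiguity and the fix can be sketched as below. The function name `org_and_bucket_to_database` appears in a later commit title in this log, but the signature and error type here are assumptions; `naive_join` is a hypothetical helper showing the old behaviour.

```rust
/// Old behaviour (hypothetical helper): join with an underscore, no checks.
pub fn naive_join(org: &str, bucket: &str) -> String {
    format!("{}_{}", org, bucket)
}

/// Sketch of the fixed mapping: still joins with an underscore, but rejects
/// inputs that themselves contain one, so the result maps back to exactly
/// one (org, bucket) pair. The `String` error type is an assumption.
pub fn org_and_bucket_to_database(org: &str, bucket: &str) -> Result<String, String> {
    if org.contains('_') || bucket.contains('_') {
        return Err(format!(
            "org {:?} and bucket {:?} must not contain '_'",
            org, bucket
        ));
    }
    Ok(format!("{}_{}", org, bucket))
}
```

With the naive join, both pairs from the example above produce `org_a_bucket`; with the check in place both are rejected, while numeric identifiers such as `12` and `34` still map to `12_34`.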

In addition, this change introduces a `DatabaseName` type to avoid
passing bare strings around, and allow consuming code to ensure only
valid database names are provided at compile time. This type works with
both owned & borrowed content so doesn't force a string copy where we
can avoid it, and derefs to `str` to make it easier to use with existing
code.
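
One way such a type could be shaped is sketched here, assuming a `Cow<str>` wrapper with validation in the constructor. The constructor name, the `String` error type, and measuring the `1..=64` bound (taken from the "DB names 1..=64" commit title) in bytes are all assumptions, not the actual implementation.

```rust
use std::borrow::Cow;
use std::ops::Deref;

/// Sketch of a validated database name that can hold borrowed or owned data.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DatabaseName<'a>(Cow<'a, str>);

impl<'a> DatabaseName<'a> {
    /// Accepts `&str` (no copy) or `String` (takes ownership).
    /// Assumption: valid names are 1 to 64 bytes long.
    pub fn new<T: Into<Cow<'a, str>>>(name: T) -> Result<Self, String> {
        let name = name.into();
        if !(1..=64).contains(&name.len()) {
            return Err(format!(
                "database name length {} is outside 1..=64",
                name.len()
            ));
        }
        Ok(Self(name))
    }
}

/// Dereferencing to `str` lets existing `&str`-based code keep working.
impl<'a> Deref for DatabaseName<'a> {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}
```

Because the only way to obtain a `DatabaseName` is through `new`, any function that takes one can rely on the name having been validated already.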

I've been minimally invasive in pushing the `DatabaseName` through the
existing code and figured I'd see what the sentiment is first.
Candidates for conversion from `str` to `DatabaseName` that seem to make
sense to me include:

	- `DatabaseStore` trait
	- `RemoteServer` trait
	- Others? Basically anywhere other than the "edge" API inputs

Fixes #436 (thanks @zeebo)
2020-12-03 16:10:15 +00:00
Andrew Lamb 8c0e14e039
refactor: rename src/server/rpc/storage.rs to src/server/rpc/service.rs (#513)
* refactor: rename src/server/rpc/storage.rs to src/server/rpc/service.rs

* refactor: update references
2020-12-03 09:59:00 -05:00
Dom 592c5c3679
Merge pull request #522 from influxdata/dom/ci-reduce-size
ci: remove IOx pre-building in rust build container
2020-12-03 13:25:08 +00:00
Dom 3589aec136
Merge branch 'main' into dom/ci-reduce-size 2020-12-03 13:14:52 +00:00
Edd Robinson 54ae680780
Merge pull request #520 from influxdata/er/refactor/read-filter-result
refactor: encapsulate results from segment/table into nicer types
2020-12-03 12:51:20 +00:00
Dom 7136e5853a ci: remove IOx pre-building in rust build container
Stops adding the IOx source code and performing a cargo build/test/clippy each
night. Previously this build would compile the IOx source & dependencies,
populating the incremental build cache and allowing builds that used the same
dependencies to complete quicker. This build caching was moved to
per-dependency-set caching in #496, and this pre-build is no longer used.

This should reduce the build image size substantially, making the whole CI
process a bit faster.
2020-12-03 11:58:13 +00:00
Edd Robinson 254dfc14d8
refactor: apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2020-12-03 11:47:41 +00:00
Edd Robinson 4f32778596 refactor: implement ReadFilterResults type
The `ReadFilterResults` type encapsulates results from multiple
segments. It implements `Display` to allow visualisation of results from
segments in a `select` call.
2020-12-03 11:23:12 +00:00
Edd Robinson 7ad0b4ad9a refactor: encapsulate read filter results in type
This commit also adds `Display` and `Debug` implementations for
`ReadFilterResult`. These can be used for visualising the contents of
the result of a `read_filter` call on a segment.

The former trait elides the column names.
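
As a rough illustration of the split between the two traits, the sketch below prints rows only for `Display` and prepends a header of column names for `Debug`. The field layout (`Vec<(String, Vec<i64>)>`) and the comma-separated output are assumptions made for the example; only the type name comes from the commit.

```rust
use std::fmt;

/// Hypothetical layout: (column name, column values), all of equal length.
pub struct ReadFilterResult {
    columns: Vec<(String, Vec<i64>)>,
}

impl ReadFilterResult {
    pub fn new(columns: Vec<(String, Vec<i64>)>) -> Self {
        Self { columns }
    }

    /// Writes one comma-separated line per row.
    fn fmt_rows(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let rows = self.columns.first().map_or(0, |(_, v)| v.len());
        for row in 0..rows {
            let cells: Vec<String> = self
                .columns
                .iter()
                .map(|(_, v)| v[row].to_string())
                .collect();
            writeln!(f, "{}", cells.join(","))?;
        }
        Ok(())
    }
}

/// `Display` elides the column names and prints values only.
impl fmt::Display for ReadFilterResult {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.fmt_rows(f)
    }
}

/// `Debug` prints a header line of column names, then the same rows.
impl fmt::Debug for ReadFilterResult {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let names: Vec<&str> = self.columns.iter().map(|(n, _)| n.as_str()).collect();
        writeln!(f, "{}", names.join(","))?;
        self.fmt_rows(f)
    }
}
```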
2020-12-03 11:23:09 +00:00
Edd Robinson a088f33c35
Merge pull request #519 from influxdata/er/refactor/time-predicate
refactor: avoid requiring time predicate in Segment
2020-12-03 10:06:29 +00:00
Edd Robinson 05c420cc9e
Merge branch 'main' into er/refactor/time-predicate 2020-12-02 19:13:12 +00:00
Edd Robinson 381c3038aa
refactor: update segment_store/src/segment.rs
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2020-12-02 19:13:00 +00:00
Andrew Lamb 8cb8276819
fix: Update gRPC definitions so tag_key=_field requests work in IOx (#517)
* fix: Update gRPC definitions so tag_key=_field requests work in IOx

* docs: Update src/server/rpc.rs

* fix: fixup test

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: consistent type annotations

* fix: refactor redundant test code into test_helpers

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-12-02 13:58:48 -05:00