Closes#528
This patch adds support for Microsfot Azure Blob storage. The
implementations requires an account, a key and container name. They can
be configured via the environment variables `AZURE_STORAGE_ACCOUNT`,
`AZURE_STORAGE_MASTER_KEY` and `AZURE_STORAGE_CONTAINER`.
This adds a new function list_with_delimiter to the object store. This commit contains just the implementation for S3, leaving the others to be completed in follow on commits.
This has a fixed delimiter to ensure a directory structure is created. This delimiter should be dependent on platform and which object store is used. For any of the cloud object stores or in memory, the delimiter should be /. For the future disk based implementation it should be dependendent on if you're running on Windows or Linux.
I didn't use Stream for the return type because I found it difficult to work with and I don't think it actually added anything useful. The return ListResult struct has the next token and I prefer that the caller explicitly makes calls that go over the network so they're more aware of what's going on, where a Stream abstracts that away so it's hidden behind the scenes. We can easilsy add a Stream based version on top of this existing API if we want.
* feat: Create configuration system, port IOx to use it
* docs: Apply suggestions from code review
Co-authored-by: Paul Dix <paul@influxdata.com>
* fix: fix test for setting values
Co-authored-by: Paul Dix <paul@influxdata.com>
This adds benchmarks to the data_types crate for ReplicatedWrite. This is the first in a series to test benchmarking Flatbuffers vs. JSON for the WAL Segment format.
* refactor: Update docs, remove unused field
* refactor: rename partition -> chunk
* feat: Introduce new partition, which is a holder for Chunks
* refactor: Remove use of wal from mutable database
* refactor: cleanups, remove last direct use of chunks
* fix: delete old benchmarks
* fix: clippy sacrifice
* docs: tidy up comments
* refactor: remove unused error types
* chore: remove commented out tests
This moves the HTTP API over to Routerify, which has the basic route parsing logic that will enable the API design for IOx.
I had a little trouble with the error handling in Routerify so I ended up creating a macro for constructing error responses in the HTTP API. I'm not sure what I think of this pattern so I'm interested in what others think. Another option would be to have two functions for each API endpoint. One which is x_handler with a Routerify function signature. Then another which is just x that has the Result<Response<Body>, ApplicationError> return type, which would make using the ? operator work in those functions. That would eliminate the need for the return_err macro.
I'm happy to refactor to that if people prefer it.
This commit swaps out the std library `HashMap` for the implementation
provided by the `hashbrown` crate. Not only does this allow us to use
the raw entry API, but it increases performance through the use of a
faster non-crytographically safe hashing function. We do not need an
expensive hash function for this code path.
Benchmark improvements are roughly 20-40%.
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20_columns_2_rows_500000
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20_columns_2_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20_columns_2_rows_500000: Collecting 100 samples in estimated 6.5961 s (400 iterations)
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20_columns_2_rows_500000: Analyzing
segment_read_group_all_time_vary_cardinality/cardinality_20_columns_2_rows_500000
time: [16.502 ms 16.527 ms 16.558 ms]
thrpt: [1.2079 Kelem/s 1.2101 Kelem/s 1.2120 Kelem/s]
change:
time: [-40.808% -40.616% -40.428%] (p = 0.00 < 0.05)
thrpt: [+67.863% +68.394% +68.942%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_200_columns_2_rows_500000
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_200_columns_2_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_200_columns_2_rows_500000: Collecting 100 samples in estimated 5.0698 s (300 iterations)
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_200_columns_2_rows_500000: Analyzing
segment_read_group_all_time_vary_cardinality/cardinality_200_columns_2_rows_500000
time: [16.531 ms 16.542 ms 16.555 ms]
thrpt: [12.081 Kelem/s 12.090 Kelem/s 12.099 Kelem/s]
change:
time: [-43.304% -43.047% -42.810%] (p = 0.00 < 0.05)
thrpt: [+74.856% +75.582% +76.378%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_2000_columns_2_rows_500000
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_2000_columns_2_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_2000_columns_2_rows_500000: Collecting 100 samples in estimated 5.2590 s (300 iterations)
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_2000_columns_2_rows_500000: Analyzing
segment_read_group_all_time_vary_cardinality/cardinality_2000_columns_2_rows_500000
time: [17.497 ms 17.568 ms 17.648 ms]
thrpt: [113.33 Kelem/s 113.84 Kelem/s 114.30 Kelem/s]
change:
time: [-38.468% -38.188% -37.880%] (p = 0.00 < 0.05)
thrpt: [+60.978% +61.782% +62.518%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
12 (12.00%) high severe
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20000_columns_3_rows_500000
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20000_columns_3_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20000_columns_3_rows_500000: Collecting 100 samples in estimated 7.0471 s (300 iterations)
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20000_columns_3_rows_500000: Analyzing
segment_read_group_all_time_vary_cardinality/cardinality_20000_columns_3_rows_500000
time: [23.305 ms 23.320 ms 23.336 ms]
thrpt: [857.05 Kelem/s 857.64 Kelem/s 858.20 Kelem/s]
change:
time: [-35.933% -35.778% -35.648%] (p = 0.00 < 0.05)
thrpt: [+55.396% +55.711% +56.087%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_2_rows_500000
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_2_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_2_rows_500000: Collecting 100 samples in estimated 6.8058 s (300 iterations)
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_2_rows_500000: Analyzing
segment_read_group_all_time_vary_columns/cardinality_20000_columns_2_rows_500000
time: [22.475 ms 22.540 ms 22.622 ms]
thrpt: [884.10 Kelem/s 887.31 Kelem/s 889.87 Kelem/s]
change:
time: [-34.249% -34.051% -33.768%] (p = 0.00 < 0.05)
thrpt: [+50.984% +51.633% +52.089%]
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) high mild
9 (9.00%) high severe
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_3_rows_500000
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_3_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_3_rows_500000: Collecting 100 samples in estimated 7.0631 s (300 iterations)
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_3_rows_500000: Analyzing
segment_read_group_all_time_vary_columns/cardinality_20000_columns_3_rows_500000
time: [23.683 ms 23.724 ms 23.779 ms]
thrpt: [841.08 Kelem/s 843.02 Kelem/s 844.49 Kelem/s]
change:
time: [-34.575% -34.419% -34.241%] (p = 0.00 < 0.05)
thrpt: [+52.070% +52.482% +52.847%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) high mild
3 (3.00%) high severe
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_4_rows_500000
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_4_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_4_rows_500000: Collecting 100 samples in estimated 5.1007 s (200 iterations)
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_4_rows_500000: Analyzing
segment_read_group_all_time_vary_columns/cardinality_20000_columns_4_rows_500000
time: [25.379 ms 25.456 ms 25.545 ms]
thrpt: [782.93 Kelem/s 785.67 Kelem/s 788.06 Kelem/s]
change:
time: [-37.254% -36.988% -36.701%] (p = 0.00 < 0.05)
thrpt: [+57.981% +58.699% +59.373%]
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) high mild
8 (8.00%) high severe
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_250000
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_250000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_250000: Collecting 100 samples in estimated 5.7756 s (400 iterations)
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_250000: Analyzing
segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_250000
time: [14.404 ms 14.411 ms 14.419 ms]
thrpt: [1.3870 Melem/s 1.3878 Melem/s 1.3885 Melem/s]
change:
time: [-28.007% -27.893% -27.798%] (p = 0.00 < 0.05)
thrpt: [+38.500% +38.683% +38.903%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_500000
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_500000: Collecting 100 samples in estimated 6.9256 s (300 iterations)
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_500000: Analyzing
segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_500000
time: [23.191 ms 23.299 ms 23.419 ms]
thrpt: [854.02 Kelem/s 858.42 Kelem/s 862.40 Kelem/s]
change:
time: [-32.647% -32.302% -31.912%] (p = 0.00 < 0.05)
thrpt: [+46.868% +47.715% +48.471%]
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
11 (11.00%) high severe
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_750000
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_750000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_750000: Collecting 100 samples in estimated 6.1544 s (200 iterations)
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_750000: Analyzing
segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_750000
time: [30.813 ms 30.859 ms 30.916 ms]
thrpt: [646.92 Kelem/s 648.10 Kelem/s 649.07 Kelem/s]
change:
time: [-37.155% -36.779% -36.436%] (p = 0.00 < 0.05)
thrpt: [+57.322% +58.174% +59.121%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_1000000
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_1000000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_1000000: Collecting 100 samples in estimated 7.8548 s (200 iterations)
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_1000000: Analyzing
segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_1000000
time: [39.303 ms 39.349 ms 39.405 ms]
thrpt: [507.55 Kelem/s 508.27 Kelem/s 508.86 Kelem/s]
change:
time: [-36.857% -36.699% -36.576%] (p = 0.00 < 0.05)
thrpt: [+57.669% +57.975% +58.371%]
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
8 (8.00%) high mild
6 (6.00%) high severe
This commit provides functionality on top of the `GroupKey` type (a
vector of materialised values), which allows them to be comparable by
implementing `Ord`.
Then, using the `permutation` crate, it is possible sort all rows in a
result set based on the group keys, which will be useful for testing.
Adds telemetry / tracing with support for a Jaeger backend, and changes the
logger from env_logger to a tracing subscriber to collect the log entries.
Events are batched and then emitted asynchronosuly via UDP to the Jaeger
collector using the tokio runtime. There's a bunch of settings (env
vars) related to batch sizes and flush frequency etc - they're all using
their default values at the moment (if it ain't broke...) See the docs
for more info:
https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/sdk-environment-variables.md#opentelemetry-environment-variable-specification
This is only part 1 of telemetry - it does NOT propagate traces across RPC
boundaries as we're still defining how all this should work. I've created #541
to track this.
Closes#202 and closes#203.
This commit adds benchmarks to track the performance of `read_group`
when aggregating across columns that support pre-computed bit-sets of
row_ids for each distinct column value. Currently this is limited to the
RLE columns, and only makes sense when grouping by low-cardinality
columns.
The benchmarks are in three groups:
* one group fixes the number of rows in the segment but varies the
cardinality (that is, how many groups the query produces).
* another groups fixes the cardinality and the number of rows but varies
the number of columns needed to be grouped to produce the fixed
cardinality.
* a final group fixes the number of columns being grouped, the
cardinality, and instead varies the number of rows in the segment.
Some initial results from my development box are as follows:
```
time: [51.099 ms 51.119 ms 51.140 ms]
thrpt: [39.108 Kelem/s 39.125 Kelem/s 39.140
Kelem/s]
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_group_cols/1
time: [93.162 us 93.219 us 93.280 us]
thrpt: [10.720 Kelem/s 10.727 Kelem/s 10.734
Kelem/s]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_group_cols/2
time: [571.72 us 572.31 us 572.98 us]
thrpt: [3.4905 Kelem/s 3.4946 Kelem/s 3.4982
Kelem/s]
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
Benchmarking
segment_read_group_pre_computed_groups_no_predicates_group_cols/3:
Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to
increase target time to 8.9s, enable flat sampling, or reduce sample
count to 50.
segment_read_group_pre_computed_groups_no_predicates_group_cols/3
time: [1.7292 ms 1.7313 ms 1.7340 ms]
thrpt: [1.7301 Kelem/s 1.7328 Kelem/s 1.7349
Kelem/s]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
6 (6.00%) high mild
1 (1.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/250000
time: [562.29 us 565.19 us 568.80 us]
thrpt: [439.52 Melem/s 442.33 Melem/s 444.61
Melem/s]
Found 18 outliers among 100 measurements (18.00%)
6 (6.00%) high mild
12 (12.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/500000
time: [561.32 us 561.85 us 562.47 us]
thrpt: [888.93 Melem/s 889.92 Melem/s 890.76
Melem/s]
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/750000
time: [573.75 us 574.27 us 574.85 us]
thrpt: [1.3047 Gelem/s 1.3060 Gelem/s 1.3072
Gelem/s]
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/1000000
time: [586.36 us 586.74 us 587.19 us]
thrpt: [1.7030 Gelem/s 1.7043 Gelem/s 1.7054
Gelem/s]
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
```
* feat: Implement write buffer to Parquet snapshotting
This introduces snapshot to the server packages to manage snapshotting. It also introduces a new trait for representing a Partition. There is a very crude API wired up in http_routes for testing purposes. Follow on work will bring the server package into http_routes and rework the snapshot API.
* chore(server): add logs for dropped WAL segments
Added logging for dropped writes and old segments in rollover scenarios
Also including a dep on tracing and dev-dep on test_helpers
Refs: #466
* chore(server): Add more context to logs
Minor cleanup around remove_oldest_segment usage
Suggestions from @alamb's review
This splits the cluster package out into server and buffer modules. The WAL buffer is in-memory and split into segments. Follow on commits will implement it in the server and add persistence to object storage.
* feat: Port enough of Window and Duration to implement window_bounds
* fix: clippy
* fix: Add a few more source links
* fix: Eust --> Rust in comments :(
* fix: add comments about remainder, and add test demonstraitng behavior
* fix: Apply suggestions from code review