Edd Robinson
a70a55cb3d
test: update benchmarks
2021-09-24 15:00:17 +01:00
Edd Robinson
5c7459f488
feat: validate predicates on satisfies_predicate
2021-09-24 14:52:19 +01:00
Edd Robinson
a69e46efc6
feat: validate predicates on column_values
2021-09-24 14:52:19 +01:00
Edd Robinson
f618aa1b76
feat: validate predicates on column_names
2021-09-24 14:52:19 +01:00
Edd Robinson
c107434d20
feat: validate predicates on read_aggregate
2021-09-24 14:52:19 +01:00
Edd Robinson
621b26166c
feat: validate predicates on read_filter
2021-09-24 14:52:16 +01:00
Edd Robinson
053186ab29
feat: add ability to validate that a predicate is compatible with the schema
2021-09-24 13:05:46 +01:00
dependabot[bot]
876bb10cf8
chore(deps): bump rand_distr from 0.4.1 to 0.4.2
...
Bumps [rand_distr](https://github.com/rust-random/rand) from 0.4.1 to 0.4.2.
- [Release notes](https://github.com/rust-random/rand/releases)
- [Changelog](https://github.com/rust-random/rand/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-random/rand/compare/rand_distr-0.4.1...rand_distr-0.4.2)
---
updated-dependencies:
- dependency-name: rand_distr
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2021-09-20 08:39:39 +00:00
Edd Robinson
e51dd0365a
refactor: PR feedback
...
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2021-09-16 10:01:44 +01:00
Edd Robinson
f7228ddd60
test: add test for byte trimmed floats
2021-09-16 10:01:44 +01:00
Edd Robinson
d387108dab
fix: float byte trimmer filter range
2021-09-16 10:01:44 +01:00
Edd Robinson
0250bd1337
fix: ensure range filter works with null
2021-09-16 10:01:44 +01:00
Edd Robinson
1a70865a03
fix: ensure float byte trimmed predicate pushdown works for unencodable values
2021-09-16 10:01:44 +01:00
Edd Robinson
483508e3c6
feat: add rle method for identifying all non-null row IDs
2021-09-16 10:01:44 +01:00
Edd Robinson
d04a0d1137
feat: add method for identifying all non-null row IDs
2021-09-16 10:01:44 +01:00
Edd Robinson
70b0ba44b3
test: failing filter test
2021-09-16 10:01:44 +01:00
Raphael Taylor-Davies
c66095cad1
feat: remove metrics crate (#2552)
2021-09-15 19:43:33 +00:00
Raphael Taylor-Davies
b8f7319704
feat: migrate read buffer metrics to metric crate (#2510)
...
* feat: migrate read buffer metrics to metric crate
* chore: review feedback
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-09-10 19:51:43 +00:00
Marco Neumann
79ad48ac3a
chore: rename "labels" to "attributes"
2021-08-31 11:31:15 +02:00
Edd Robinson
8dd0576070
refactor: address PR feedback
2021-08-27 12:36:19 +01:00
Edd Robinson
6c49ac5bd4
refactor: update read_buffer/src/chunk.rs
2021-08-27 12:30:20 +01:00
Edd Robinson
6c7f8d6630
feat: add delete to crate Read Buffer API
2021-08-27 12:30:20 +01:00
Edd Robinson
dbbfd2a9f8
feat: add delete support to row_group
2021-08-27 12:30:20 +01:00
Edd Robinson
95548dcec9
feat: add relative complement to RowIDs(bitmap)
2021-08-27 12:30:20 +01:00
Edd Robinson
69329b0b38
Merge branch 'main' into er/refactor/read_buffer/rle_entries
2021-08-25 12:08:44 +01:00
Edd Robinson
11e88877f4
fix: correct size estimation of RLE encoding
2021-08-25 12:03:04 +01:00
Edd Robinson
d18e835b4f
refactor: remove next_id generation
2021-08-25 11:31:51 +01:00
Edd Robinson
833a410e4a
refactor: replace btreeset with vec
...
Benchmarks are roughly the same for most workloads, although a few high-cardinality RLE `loc_Start` cases are 2-3x slower on the PR branch.
critcmp master_string pr_string
group master_string pr_string
----- ------------- ---------
_select/enc_"plain encoder"/rows_100000/loc_End/card_100 1.12 43.9±0.41µs 2.1 GElem/sec 1.00 39.4±0.40µs 2.4 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_End/card_1000 1.00 32.9±0.43µs 2.8 GElem/sec 1.00 33.0±0.48µs 2.8 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_End/card_10000 1.00 32.1±0.37µs 2.9 GElem/sec 1.00 32.2±0.43µs 2.9 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Middle/card_100 1.02 40.2±0.79µs 2.3 GElem/sec 1.00 39.5±0.56µs 2.4 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Middle/card_1000 1.00 33.0±0.42µs 2.8 GElem/sec 1.00 33.0±0.38µs 2.8 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Middle/card_10000 1.00 32.3±0.41µs 2.9 GElem/sec 1.00 32.4±0.53µs 2.9 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Start/card_100 1.04 41.2±1.45µs 2.3 GElem/sec 1.00 39.5±0.54µs 2.4 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Start/card_1000 1.01 33.4±0.87µs 2.8 GElem/sec 1.00 32.9±0.43µs 2.8 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Start/card_10000 1.01 32.5±0.44µs 2.9 GElem/sec 1.00 32.3±0.51µs 2.9 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_End/card_1000 1.00 382.0±3.43µs 2.4 GElem/sec 1.00 382.0±4.04µs 2.4 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_End/card_10000 1.00 376.7±4.67µs 2.5 GElem/sec 1.00 377.2±12.83µs 2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_End/card_100000 1.00 374.4±3.08µs 2.5 GElem/sec 1.00 375.0±4.09µs 2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Middle/card_1000 1.00 382.4±4.68µs 2.4 GElem/sec 1.00 382.8±4.61µs 2.4 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Middle/card_10000 1.00 375.8±3.55µs 2.5 GElem/sec 1.00 376.0±4.17µs 2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Middle/card_100000 1.00 374.7±3.76µs 2.5 GElem/sec 1.00 375.1±4.44µs 2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Start/card_1000 1.00 382.1±3.80µs 2.4 GElem/sec 1.00 382.2±3.44µs 2.4 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Start/card_10000 1.00 376.5±4.85µs 2.5 GElem/sec 1.00 376.5±4.76µs 2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Start/card_100000 1.00 375.0±3.41µs 2.5 GElem/sec 1.00 375.3±4.28µs 2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_End/card_10000 1.00 3.7±0.02ms 2.5 GElem/sec 1.01 3.8±0.06ms 2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_End/card_100000 1.00 3.7±0.01ms 2.5 GElem/sec 1.01 3.8±0.06ms 2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_End/card_1000000 1.00 3.7±0.01ms 2.5 GElem/sec 1.01 3.8±0.10ms 2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Middle/card_10000 1.00 3.8±0.03ms 2.5 GElem/sec 1.00 3.8±0.04ms 2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Middle/card_100000 1.00 3.8±0.03ms 2.5 GElem/sec 1.07 4.0±0.73ms 2.3 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Middle/card_1000000 1.02 3.8±0.06ms 2.4 GElem/sec 1.00 3.8±0.03ms 2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Start/card_10000 1.00 3.8±0.03ms 2.5 GElem/sec 1.00 3.8±0.03ms 2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Start/card_100000 1.00 3.8±0.04ms 2.5 GElem/sec 1.00 3.8±0.04ms 2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Start/card_1000000 1.00 3.8±0.05ms 2.5 GElem/sec 1.00 3.8±0.03ms 2.5 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_End/card_100 1.00 2.9±0.03µs 32.0 GElem/sec 1.01 2.9±0.09µs 31.6 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_End/card_1000 1.06 1002.0±13.75ns 93.0 GElem/sec 1.00 948.3±9.63ns 98.2 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_End/card_10000 1.02 4.6±0.05µs 20.3 GElem/sec 1.00 4.5±0.17µs 20.7 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Middle/card_100 1.00 3.0±0.03µs 31.5 GElem/sec 1.00 2.9±0.04µs 31.6 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Middle/card_1000 1.04 788.9±12.39ns 118.1 GElem/sec 1.00 755.7±20.50ns 123.2 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Middle/card_10000 1.00 2.8±0.43µs 33.5 GElem/sec 1.02 2.8±0.03µs 32.8 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Start/card_100 1.00 2.9±0.04µs 32.3 GElem/sec 1.02 2.9±0.10µs 31.7 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Start/card_1000 1.03 597.4±14.85ns 155.9 GElem/sec 1.00 581.1±13.60ns 160.3 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Start/card_10000 1.42 606.6±13.37ns 153.5 GElem/sec 1.00 426.0±6.32ns 218.6 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_End/card_1000 1.00 3.3±0.03µs 280.9 GElem/sec 1.03 3.4±0.47µs 273.5 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_End/card_10000 1.00 4.6±0.09µs 200.6 GElem/sec 1.03 4.8±0.06µs 194.8 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_End/card_100000 1.01 41.5±0.44µs 22.4 GElem/sec 1.00 41.1±0.57µs 22.6 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Middle/card_1000 1.02 3.1±0.04µs 296.8 GElem/sec 1.00 3.1±0.05µs 301.8 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Middle/card_10000 1.00 2.8±0.05µs 332.6 GElem/sec 1.12 3.1±0.46µs 297.2 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Middle/card_100000 1.10 23.7±0.30µs 39.2 GElem/sec 1.00 21.5±0.25µs 43.3 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Start/card_1000 1.00 2.9±0.03µs 321.1 GElem/sec 1.00 2.9±0.04µs 320.5 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Start/card_10000 1.00 623.6±7.76ns 1493.6 GElem/sec 1.06 661.5±44.34ns 1408.0 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Start/card_100000 1.00 954.4±18.68ns 975.9 GElem/sec 2.94 2.8±0.89µs 331.9 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_End/card_10000 1.01 7.0±0.09µs 1335.5 GElem/sec 1.00 6.9±0.10µs 1353.8 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_End/card_100000 1.06 42.8±0.78µs 217.6 GElem/sec 1.00 40.4±0.49µs 230.7 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_End/card_1000000 1.00 397.9±6.26µs 23.4 GElem/sec 1.09 433.3±5.78µs 21.5 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Middle/card_10000 1.03 5.2±0.05µs 1779.4 GElem/sec 1.00 5.1±0.17µs 1840.2 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Middle/card_100000 1.00 20.3±0.21µs 458.9 GElem/sec 1.15 23.4±0.30µs 397.9 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Middle/card_1000000 1.18 211.4±3.28µs 44.1 GElem/sec 1.00 178.5±2.56µs 52.2 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Start/card_10000 1.00 3.0±0.04µs 3091.2 GElem/sec 1.00 3.0±0.08µs 3079.4 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Start/card_100000 1.00 785.1±10.39ns 11862.8 GElem/sec 2.48 1948.8±44.72ns 4778.9 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Start/card_1000000 1.00 6.5±0.07µs 1433.0 GElem/sec 2.07 13.5±0.16µs 692.3 GElem/sec
2021-08-25 11:19:58 +01:00
Edd Robinson
f3c57c47fa
Merge branch 'main' into er/refactor/read_buffer/table_arg
2021-08-25 10:30:12 +01:00
Marco Neumann
fac79a2ae7
refactor: simplify RLE allocation code
...
Co-authored-by: Edd Robinson <me@edd.io>
2021-08-25 10:54:18 +02:00
Marco Neumann
2ad9843e5f
feat: make `RLE` a bit smaller by capacity-based allocation
...
For some demo data this reduced the overall chunk size from 195049367 bytes to 191088095 bytes.
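
To put the two figures above in proportion, here is a small worked calculation. The helper name `reduction` is hypothetical and exists only for this illustration; the byte counts are the ones quoted in the commit message.

```rust
/// Returns (bytes saved, percentage saved) for a before/after size pair.
/// Hypothetical helper for illustration; not part of the read buffer crate.
fn reduction(before: u64, after: u64) -> (u64, f64) {
    let saved = before - after;
    let pct = saved as f64 / before as f64 * 100.0;
    (saved, pct)
}

fn main() {
    // Chunk sizes quoted in the commit message.
    let (saved, pct) = reduction(195_049_367, 191_088_095);
    // prints "saved 3961272 bytes (2.03%)"
    println!("saved {saved} bytes ({pct:.2}%)");
}
```

That is roughly 3.96 MB, or about 2% of the original chunk size, for this particular demo data set.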
2021-08-25 10:22:43 +02:00
Edd Robinson
5648817285
refactor: remove redundant argument
2021-08-24 22:26:17 +01:00
Edd Robinson
49fc23bd7e
perf: remove redundant clone
2021-08-23 14:27:06 +01:00
Edd Robinson
74c767337b
perf: ensure RLE strings minimally sized
2021-08-23 14:26:53 +01:00
Edd Robinson
e2130b075b
refactor: account for string cap in RLE size
2021-08-23 14:22:49 +01:00
Edd Robinson
47747a602d
refactor: remove cruft
2021-08-23 14:19:19 +01:00
Edd Robinson
891bb4f03a
refactor: shrink strings
2021-08-23 14:17:53 +01:00
Edd Robinson
1ed086ab55
refactor: use capacity for dictionary encoding
2021-08-23 14:15:37 +01:00
Marco Neumann
b2682c0b0e
fix: shrink RUB-string-RLE keys capacity to fit
...
We were underestimating the size of a RUB string-RLE column depending on
how the data came into existence. A well-placed debug assert proved
that.
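
The mismatch described above can be sketched with a plain `String`: a size estimate based on `len()` undercounts whenever the allocation's `capacity()` is larger, and `shrink_to_fit()` narrows that gap. This is a minimal illustration only; `string_size` is a hypothetical helper, not the read buffer's actual sizing code.

```rust
use std::mem::size_of;

/// Estimated footprint of a string, counting the allocated capacity rather
/// than the length. Hypothetical helper for illustration only.
fn string_size(s: &String) -> usize {
    size_of::<String>() + s.capacity()
}

fn main() {
    // A key built through a larger buffer keeps its spare capacity.
    let mut key = String::with_capacity(64);
    key.push_str("west");
    let before = string_size(&key);

    // Release the unused capacity; the standard library only guarantees
    // that the capacity stays at least as large as the length.
    key.shrink_to_fit();
    let after = string_size(&key);

    assert!(after <= before);
    assert!(key.capacity() >= key.len());
    println!("before: {before} bytes, after: {after} bytes");
}
```

Counting capacity in the estimate and shrinking keys to fit are complementary: the first makes the reported size honest, the second makes the real size smaller.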
2021-08-23 13:18:46 +02:00
kodiakhq[bot]
fc6a7ea532
Merge branch 'main' into er/refactor/read_buffer/bitmap_size
2021-08-19 14:20:38 +00:00
Raphael Taylor-Davies
98627944e7
refactor: make packers a dev-dependency of read buffer (#2345)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-19 11:09:34 +00:00
Edd Robinson
b9f09fce49
feat: improve bitset size estimation
2021-08-17 22:54:22 +01:00
Edd Robinson
1daa30cc7d
fix: include enum in sizing
2021-08-17 22:54:22 +01:00
Edd Robinson
c795fc7f9d
feat: add metric to track total row groups
2021-08-17 12:55:11 +01:00
Edd Robinson
eee4e10fd1
refactor: rename statistic to required_bytes
2021-08-13 11:57:46 +01:00
Edd Robinson
efde3a8f5a
feat: expose required bytes metric
2021-08-13 11:57:46 +01:00
Edd Robinson
de702ec820
refactor: make allocated bytes explicit Read Buffer metric
2021-08-13 11:57:46 +01:00
Edd Robinson
311d36d776
refactor: include capacity in Read Buffer chunk size
2021-08-13 11:57:46 +01:00
Edd Robinson
03592aaf94
refactor: ignore bitmap size from required bytes
...
Bitmaps are a performance optimisation; they're not required for the RLE compression and so it seems reasonable to ignore them when assessing the compression performance of RLE.
2021-08-13 11:57:46 +01:00
Edd Robinson
fa8da19c45
refactor: expose enc size API into column
2021-08-13 11:57:46 +01:00