408 Commits (8b458e2c2b9ef352044937945a00e6deb875cb02)

Author SHA1 Message Date
Edd Robinson 8b458e2c2b feat: add API for validating a predicate against a chunk 2021-09-29 14:42:42 +01:00
Edd Robinson 0a3ca90809 refactor: only validate same types 2021-09-29 14:39:43 +01:00
Edd Robinson a70a55cb3d test: update benchmarks 2021-09-24 15:00:17 +01:00
Edd Robinson 5c7459f488 feat: validate predicates on satisfies_predicate 2021-09-24 14:52:19 +01:00
Edd Robinson a69e46efc6 feat: validate predicates on column_values 2021-09-24 14:52:19 +01:00
Edd Robinson f618aa1b76 feat: validate predicates on column_names 2021-09-24 14:52:19 +01:00
Edd Robinson c107434d20 feat: validate predicates on read_aggregate 2021-09-24 14:52:19 +01:00
Edd Robinson 621b26166c feat: validate predicates on read_filter 2021-09-24 14:52:16 +01:00
Edd Robinson 053186ab29 feat: add ability to validate predicate compatible with schema 2021-09-24 13:05:46 +01:00
dependabot[bot] 876bb10cf8
chore(deps): bump rand_distr from 0.4.1 to 0.4.2
Bumps [rand_distr](https://github.com/rust-random/rand) from 0.4.1 to 0.4.2.
- [Release notes](https://github.com/rust-random/rand/releases)
- [Changelog](https://github.com/rust-random/rand/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-random/rand/compare/rand_distr-0.4.1...rand_distr-0.4.2)

---
updated-dependencies:
- dependency-name: rand_distr
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-09-20 08:39:39 +00:00
Edd Robinson e51dd0365a refactor: PR feedback
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2021-09-16 10:01:44 +01:00
Edd Robinson f7228ddd60 test: add test for byte trimmed floats 2021-09-16 10:01:44 +01:00
Edd Robinson d387108dab fix: float byte trimmer filter range 2021-09-16 10:01:44 +01:00
Edd Robinson 0250bd1337 fix: ensure range filter works with null 2021-09-16 10:01:44 +01:00
Edd Robinson 1a70865a03 fix: ensure float byte trimmed predicate pushdown works for unencodable values 2021-09-16 10:01:44 +01:00
Edd Robinson 483508e3c6 feat: add rle method for identifying all non-null row IDs 2021-09-16 10:01:44 +01:00
Edd Robinson d04a0d1137 feat: add method for identifying all non-null row IDs 2021-09-16 10:01:44 +01:00
Edd Robinson 70b0ba44b3 test: failing filter test 2021-09-16 10:01:44 +01:00
Raphael Taylor-Davies c66095cad1
feat: remove metrics crate (#2552) 2021-09-15 19:43:33 +00:00
Raphael Taylor-Davies b8f7319704
feat: migrate read buffer metrics to metric crate (#2510)
* feat: migrate read buffer metrics to metric crate

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-09-10 19:51:43 +00:00
Marco Neumann 79ad48ac3a chore: rename "labels" to "attributes" 2021-08-31 11:31:15 +02:00
Edd Robinson 8dd0576070 refactor: address PR feedback 2021-08-27 12:36:19 +01:00
Edd Robinson 6c49ac5bd4 refactor: update read_buffer/src/chunk.rs 2021-08-27 12:30:20 +01:00
Edd Robinson 6c7f8d6630 feat: add delete to crate Read Buffer API 2021-08-27 12:30:20 +01:00
Edd Robinson dbbfd2a9f8 feat: add delete support to row_group 2021-08-27 12:30:20 +01:00
Edd Robinson 95548dcec9 feat: add relative complement to RowIDs(bitmap) 2021-08-27 12:30:20 +01:00
Edd Robinson 69329b0b38
Merge branch 'main' into er/refactor/read_buffer/rle_entries 2021-08-25 12:08:44 +01:00
Edd Robinson 11e88877f4 fix: correct size estimation of RLE encoding 2021-08-25 12:03:04 +01:00
Edd Robinson d18e835b4f refactor: remove next_id generation 2021-08-25 11:31:51 +01:00
Edd Robinson 833a410e4a refactor: replace btreeset with vec
Benchmarks are roughly the same depending on the workload

 critcmp master_string pr_string
group                                                                master_string                             pr_string
-----                                                                -------------                             ---------
_select/enc_"plain encoder"/rows_100000/loc_End/card_100             1.12     43.9±0.41µs  2.1 GElem/sec       1.00     39.4±0.40µs  2.4 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_End/card_1000            1.00     32.9±0.43µs  2.8 GElem/sec       1.00     33.0±0.48µs  2.8 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_End/card_10000           1.00     32.1±0.37µs  2.9 GElem/sec       1.00     32.2±0.43µs  2.9 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Middle/card_100          1.02     40.2±0.79µs  2.3 GElem/sec       1.00     39.5±0.56µs  2.4 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Middle/card_1000         1.00     33.0±0.42µs  2.8 GElem/sec       1.00     33.0±0.38µs  2.8 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Middle/card_10000        1.00     32.3±0.41µs  2.9 GElem/sec       1.00     32.4±0.53µs  2.9 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Start/card_100           1.04     41.2±1.45µs  2.3 GElem/sec       1.00     39.5±0.54µs  2.4 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Start/card_1000          1.01     33.4±0.87µs  2.8 GElem/sec       1.00     32.9±0.43µs  2.8 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Start/card_10000         1.01     32.5±0.44µs  2.9 GElem/sec       1.00     32.3±0.51µs  2.9 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_End/card_1000           1.00    382.0±3.43µs  2.4 GElem/sec       1.00    382.0±4.04µs  2.4 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_End/card_10000          1.00    376.7±4.67µs  2.5 GElem/sec       1.00   377.2±12.83µs  2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_End/card_100000         1.00    374.4±3.08µs  2.5 GElem/sec       1.00    375.0±4.09µs  2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Middle/card_1000        1.00    382.4±4.68µs  2.4 GElem/sec       1.00    382.8±4.61µs  2.4 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Middle/card_10000       1.00    375.8±3.55µs  2.5 GElem/sec       1.00    376.0±4.17µs  2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Middle/card_100000      1.00    374.7±3.76µs  2.5 GElem/sec       1.00    375.1±4.44µs  2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Start/card_1000         1.00    382.1±3.80µs  2.4 GElem/sec       1.00    382.2±3.44µs  2.4 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Start/card_10000        1.00    376.5±4.85µs  2.5 GElem/sec       1.00    376.5±4.76µs  2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Start/card_100000       1.00    375.0±3.41µs  2.5 GElem/sec       1.00    375.3±4.28µs  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_End/card_10000         1.00      3.7±0.02ms  2.5 GElem/sec       1.01      3.8±0.06ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_End/card_100000        1.00      3.7±0.01ms  2.5 GElem/sec       1.01      3.8±0.06ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_End/card_1000000       1.00      3.7±0.01ms  2.5 GElem/sec       1.01      3.8±0.10ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Middle/card_10000      1.00      3.8±0.03ms  2.5 GElem/sec       1.00      3.8±0.04ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Middle/card_100000     1.00      3.8±0.03ms  2.5 GElem/sec       1.07      4.0±0.73ms  2.3 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Middle/card_1000000    1.02      3.8±0.06ms  2.4 GElem/sec       1.00      3.8±0.03ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Start/card_10000       1.00      3.8±0.03ms  2.5 GElem/sec       1.00      3.8±0.03ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Start/card_100000      1.00      3.8±0.04ms  2.5 GElem/sec       1.00      3.8±0.04ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Start/card_1000000     1.00      3.8±0.05ms  2.5 GElem/sec       1.00      3.8±0.03ms  2.5 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_End/card_100                1.00      2.9±0.03µs 32.0 GElem/sec       1.01      2.9±0.09µs 31.6 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_End/card_1000               1.06  1002.0±13.75ns 93.0 GElem/sec       1.00    948.3±9.63ns 98.2 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_End/card_10000              1.02      4.6±0.05µs 20.3 GElem/sec       1.00      4.5±0.17µs 20.7 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Middle/card_100             1.00      3.0±0.03µs 31.5 GElem/sec       1.00      2.9±0.04µs 31.6 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Middle/card_1000            1.04   788.9±12.39ns 118.1 GElem/sec      1.00   755.7±20.50ns 123.2 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Middle/card_10000           1.00      2.8±0.43µs 33.5 GElem/sec       1.02      2.8±0.03µs 32.8 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Start/card_100              1.00      2.9±0.04µs 32.3 GElem/sec       1.02      2.9±0.10µs 31.7 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Start/card_1000             1.03   597.4±14.85ns 155.9 GElem/sec      1.00   581.1±13.60ns 160.3 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Start/card_10000            1.42   606.6±13.37ns 153.5 GElem/sec      1.00    426.0±6.32ns 218.6 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_End/card_1000              1.00      3.3±0.03µs 280.9 GElem/sec      1.03      3.4±0.47µs 273.5 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_End/card_10000             1.00      4.6±0.09µs 200.6 GElem/sec      1.03      4.8±0.06µs 194.8 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_End/card_100000            1.01     41.5±0.44µs 22.4 GElem/sec       1.00     41.1±0.57µs 22.6 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Middle/card_1000           1.02      3.1±0.04µs 296.8 GElem/sec      1.00      3.1±0.05µs 301.8 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Middle/card_10000          1.00      2.8±0.05µs 332.6 GElem/sec      1.12      3.1±0.46µs 297.2 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Middle/card_100000         1.10     23.7±0.30µs 39.2 GElem/sec       1.00     21.5±0.25µs 43.3 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Start/card_1000            1.00      2.9±0.03µs 321.1 GElem/sec      1.00      2.9±0.04µs 320.5 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Start/card_10000           1.00    623.6±7.76ns 1493.6 GElem/sec     1.06   661.5±44.34ns 1408.0 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Start/card_100000          1.00   954.4±18.68ns 975.9 GElem/sec      2.94      2.8±0.89µs 331.9 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_End/card_10000            1.01      7.0±0.09µs 1335.5 GElem/sec     1.00      6.9±0.10µs 1353.8 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_End/card_100000           1.06     42.8±0.78µs 217.6 GElem/sec      1.00     40.4±0.49µs 230.7 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_End/card_1000000          1.00    397.9±6.26µs 23.4 GElem/sec       1.09    433.3±5.78µs 21.5 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Middle/card_10000         1.03      5.2±0.05µs 1779.4 GElem/sec     1.00      5.1±0.17µs 1840.2 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Middle/card_100000        1.00     20.3±0.21µs 458.9 GElem/sec      1.15     23.4±0.30µs 397.9 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Middle/card_1000000       1.18    211.4±3.28µs 44.1 GElem/sec       1.00    178.5±2.56µs 52.2 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Start/card_10000          1.00      3.0±0.04µs 3091.2 GElem/sec     1.00      3.0±0.08µs 3079.4 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Start/card_100000         1.00   785.1±10.39ns 11862.8 GElem/sec    2.48  1948.8±44.72ns 4778.9 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Start/card_1000000        1.00      6.5±0.07µs 1433.0 GElem/sec     2.07     13.5±0.16µs 692.3 GElem/sec
2021-08-25 11:19:58 +01:00
Edd Robinson f3c57c47fa
Merge branch 'main' into er/refactor/read_buffer/table_arg 2021-08-25 10:30:12 +01:00
Marco Neumann fac79a2ae7
refactor: simplify RLE allocation code
Co-authored-by: Edd Robinson <me@edd.io>
2021-08-25 10:54:18 +02:00
Marco Neumann 2ad9843e5f feat: make `RLE` a bit smaller by capacity-based allocation
For some demo data this reduced the overall chunk size from

195049367 bytes
to
191088095 bytes
2021-08-25 10:22:43 +02:00
Edd Robinson 5648817285 refactor: remove redundant argument 2021-08-24 22:26:17 +01:00
Edd Robinson 49fc23bd7e perf: remove redundant clone 2021-08-23 14:27:06 +01:00
Edd Robinson 74c767337b perf: ensure RLE strings minimally sized 2021-08-23 14:26:53 +01:00
Edd Robinson e2130b075b refactor: account for string cap in RLE size 2021-08-23 14:22:49 +01:00
Edd Robinson 47747a602d refactor: remove cruft 2021-08-23 14:19:19 +01:00
Edd Robinson 891bb4f03a refactor: shrink strings 2021-08-23 14:17:53 +01:00
Edd Robinson 1ed086ab55 refactor: use capacity for dictionary encoding 2021-08-23 14:15:37 +01:00
Marco Neumann b2682c0b0e fix: shrink RUB-string-RLE keys capacity to fit
We were underestimating the size of a RUB string-RLE column depending on
how the data came into existence. A well-placed debug assert proved
that.
2021-08-23 13:18:46 +02:00
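The underestimation described in the commit above comes from the gap between a string's length and its allocated capacity. A minimal sketch of that gap follows. It is not the read buffer's actual code: the function names `size_by_len`, `size_by_capacity` and `shrink_keys` are hypothetical, and only standard-library calls are used.

```rust
use std::mem::size_of;

/// Hypothetical estimate that counts only the bytes each key actually uses.
/// This under-reports memory when a key was built with spare capacity.
fn size_by_len(keys: &[String]) -> usize {
    keys.iter().map(|k| size_of::<String>() + k.len()).sum()
}

/// Hypothetical estimate that counts the bytes each key has allocated.
fn size_by_capacity(keys: &[String]) -> usize {
    keys.iter().map(|k| size_of::<String>() + k.capacity()).sum()
}

/// Asks the allocator to release spare capacity for every key.
/// `String::shrink_to_fit` only guarantees capacity >= length afterwards.
fn shrink_keys(keys: &mut [String]) {
    for k in keys.iter_mut() {
        k.shrink_to_fit();
    }
}
```

Calling `shrink_keys` before measuring brings the two estimates as close together as the allocator allows, which is one plausible way to make a length-based estimate match real usage.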
kodiakhq[bot] fc6a7ea532
Merge branch 'main' into er/refactor/read_buffer/bitmap_size 2021-08-19 14:20:38 +00:00
Raphael Taylor-Davies 98627944e7
refactor: make packers a dev-dependency of read buffer (#2345)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-19 11:09:34 +00:00
Edd Robinson b9f09fce49 feat: improve bitset size estimation 2021-08-17 22:54:22 +01:00
Edd Robinson 1daa30cc7d fix: include enum in sizing 2021-08-17 22:54:22 +01:00
Edd Robinson c795fc7f9d feat: add metric to track total row groups 2021-08-17 12:55:11 +01:00
Edd Robinson eee4e10fd1 refactor: rename statistic to required_bytes 2021-08-13 11:57:46 +01:00
Edd Robinson efde3a8f5a feat: expose required bytes metric 2021-08-13 11:57:46 +01:00
Edd Robinson de702ec820 refactor: make allocated bytes explicit Read Buffer metric 2021-08-13 11:57:46 +01:00
Edd Robinson 311d36d776 refactor: include capacity in Read Buffer chunk size 2021-08-13 11:57:46 +01:00