387 Commits (7e7855f727296224c1525d09d4560cdcff069311)

Author SHA1 Message Date
Edd Robinson 8dd0576070 refactor: address PR feedback 2021-08-27 12:36:19 +01:00
Edd Robinson 6c49ac5bd4 refactor: update read_buffer/src/chunk.rs 2021-08-27 12:30:20 +01:00
Edd Robinson 6c7f8d6630 feat: add delete to crate Read Buffer API 2021-08-27 12:30:20 +01:00
Edd Robinson dbbfd2a9f8 feat: add delete support to row_group: 2021-08-27 12:30:20 +01:00
Edd Robinson 95548dcec9 feat: add relative complement to RowIDs(bitmap) 2021-08-27 12:30:20 +01:00
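
The relative complement named in the commit above is the set of row IDs in one set that are absent from another, which is the operation a delete needs ("all rows minus the deleted rows"). A minimal sketch, assuming a plain `BTreeSet<u32>` stands in for the read buffer's bitmap type; `relative_complement` is a hypothetical helper, not the crate's API:

```rust
use std::collections::BTreeSet;

/// Relative complement `a \ b`: every row ID in `a` that is not in `b`.
/// Hypothetical stand-in for a bitmap-backed RowIDs type.
fn relative_complement(a: &BTreeSet<u32>, b: &BTreeSet<u32>) -> BTreeSet<u32> {
    // `difference` lazily yields the elements of `a` that are missing from `b`.
    a.difference(b).copied().collect()
}
```

IDs that appear only in the second set are ignored, so removing a row that does not exist is a no-op.
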
Edd Robinson 69329b0b38
Merge branch 'main' into er/refactor/read_buffer/rle_entries 2021-08-25 12:08:44 +01:00
Edd Robinson 11e88877f4 fix: correct size estimation of RLE encoding 2021-08-25 12:03:04 +01:00
Edd Robinson d18e835b4f refactor: remove next_id generation 2021-08-25 11:31:51 +01:00
Edd Robinson 833a410e4a refactor: replace btreeset for vec
Benchmarks are roughly the same, with some variation depending on the workload:

 critcmp master_string pr_string
group                                                                master_string                             pr_string
-----                                                                -------------                             ---------
_select/enc_"plain encoder"/rows_100000/loc_End/card_100             1.12     43.9±0.41µs  2.1 GElem/sec       1.00     39.4±0.40µs  2.4 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_End/card_1000            1.00     32.9±0.43µs  2.8 GElem/sec       1.00     33.0±0.48µs  2.8 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_End/card_10000           1.00     32.1±0.37µs  2.9 GElem/sec       1.00     32.2±0.43µs  2.9 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Middle/card_100          1.02     40.2±0.79µs  2.3 GElem/sec       1.00     39.5±0.56µs  2.4 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Middle/card_1000         1.00     33.0±0.42µs  2.8 GElem/sec       1.00     33.0±0.38µs  2.8 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Middle/card_10000        1.00     32.3±0.41µs  2.9 GElem/sec       1.00     32.4±0.53µs  2.9 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Start/card_100           1.04     41.2±1.45µs  2.3 GElem/sec       1.00     39.5±0.54µs  2.4 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Start/card_1000          1.01     33.4±0.87µs  2.8 GElem/sec       1.00     32.9±0.43µs  2.8 GElem/sec
_select/enc_"plain encoder"/rows_100000/loc_Start/card_10000         1.01     32.5±0.44µs  2.9 GElem/sec       1.00     32.3±0.51µs  2.9 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_End/card_1000           1.00    382.0±3.43µs  2.4 GElem/sec       1.00    382.0±4.04µs  2.4 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_End/card_10000          1.00    376.7±4.67µs  2.5 GElem/sec       1.00   377.2±12.83µs  2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_End/card_100000         1.00    374.4±3.08µs  2.5 GElem/sec       1.00    375.0±4.09µs  2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Middle/card_1000        1.00    382.4±4.68µs  2.4 GElem/sec       1.00    382.8±4.61µs  2.4 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Middle/card_10000       1.00    375.8±3.55µs  2.5 GElem/sec       1.00    376.0±4.17µs  2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Middle/card_100000      1.00    374.7±3.76µs  2.5 GElem/sec       1.00    375.1±4.44µs  2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Start/card_1000         1.00    382.1±3.80µs  2.4 GElem/sec       1.00    382.2±3.44µs  2.4 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Start/card_10000        1.00    376.5±4.85µs  2.5 GElem/sec       1.00    376.5±4.76µs  2.5 GElem/sec
_select/enc_"plain encoder"/rows_1000000/loc_Start/card_100000       1.00    375.0±3.41µs  2.5 GElem/sec       1.00    375.3±4.28µs  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_End/card_10000         1.00      3.7±0.02ms  2.5 GElem/sec       1.01      3.8±0.06ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_End/card_100000        1.00      3.7±0.01ms  2.5 GElem/sec       1.01      3.8±0.06ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_End/card_1000000       1.00      3.7±0.01ms  2.5 GElem/sec       1.01      3.8±0.10ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Middle/card_10000      1.00      3.8±0.03ms  2.5 GElem/sec       1.00      3.8±0.04ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Middle/card_100000     1.00      3.8±0.03ms  2.5 GElem/sec       1.07      4.0±0.73ms  2.3 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Middle/card_1000000    1.02      3.8±0.06ms  2.4 GElem/sec       1.00      3.8±0.03ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Start/card_10000       1.00      3.8±0.03ms  2.5 GElem/sec       1.00      3.8±0.03ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Start/card_100000      1.00      3.8±0.04ms  2.5 GElem/sec       1.00      3.8±0.04ms  2.5 GElem/sec
_select/enc_"plain encoder"/rows_10000000/loc_Start/card_1000000     1.00      3.8±0.05ms  2.5 GElem/sec       1.00      3.8±0.03ms  2.5 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_End/card_100                1.00      2.9±0.03µs 32.0 GElem/sec       1.01      2.9±0.09µs 31.6 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_End/card_1000               1.06  1002.0±13.75ns 93.0 GElem/sec       1.00    948.3±9.63ns 98.2 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_End/card_10000              1.02      4.6±0.05µs 20.3 GElem/sec       1.00      4.5±0.17µs 20.7 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Middle/card_100             1.00      3.0±0.03µs 31.5 GElem/sec       1.00      2.9±0.04µs 31.6 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Middle/card_1000            1.04   788.9±12.39ns 118.1 GElem/sec      1.00   755.7±20.50ns 123.2 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Middle/card_10000           1.00      2.8±0.43µs 33.5 GElem/sec       1.02      2.8±0.03µs 32.8 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Start/card_100              1.00      2.9±0.04µs 32.3 GElem/sec       1.02      2.9±0.10µs 31.7 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Start/card_1000             1.03   597.4±14.85ns 155.9 GElem/sec      1.00   581.1±13.60ns 160.3 GElem/sec
select/enc_"RLE encoder"/rows_100000/loc_Start/card_10000            1.42   606.6±13.37ns 153.5 GElem/sec      1.00    426.0±6.32ns 218.6 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_End/card_1000              1.00      3.3±0.03µs 280.9 GElem/sec      1.03      3.4±0.47µs 273.5 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_End/card_10000             1.00      4.6±0.09µs 200.6 GElem/sec      1.03      4.8±0.06µs 194.8 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_End/card_100000            1.01     41.5±0.44µs 22.4 GElem/sec       1.00     41.1±0.57µs 22.6 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Middle/card_1000           1.02      3.1±0.04µs 296.8 GElem/sec      1.00      3.1±0.05µs 301.8 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Middle/card_10000          1.00      2.8±0.05µs 332.6 GElem/sec      1.12      3.1±0.46µs 297.2 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Middle/card_100000         1.10     23.7±0.30µs 39.2 GElem/sec       1.00     21.5±0.25µs 43.3 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Start/card_1000            1.00      2.9±0.03µs 321.1 GElem/sec      1.00      2.9±0.04µs 320.5 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Start/card_10000           1.00    623.6±7.76ns 1493.6 GElem/sec     1.06   661.5±44.34ns 1408.0 GElem/sec
select/enc_"RLE encoder"/rows_1000000/loc_Start/card_100000          1.00   954.4±18.68ns 975.9 GElem/sec      2.94      2.8±0.89µs 331.9 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_End/card_10000            1.01      7.0±0.09µs 1335.5 GElem/sec     1.00      6.9±0.10µs 1353.8 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_End/card_100000           1.06     42.8±0.78µs 217.6 GElem/sec      1.00     40.4±0.49µs 230.7 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_End/card_1000000          1.00    397.9±6.26µs 23.4 GElem/sec       1.09    433.3±5.78µs 21.5 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Middle/card_10000         1.03      5.2±0.05µs 1779.4 GElem/sec     1.00      5.1±0.17µs 1840.2 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Middle/card_100000        1.00     20.3±0.21µs 458.9 GElem/sec      1.15     23.4±0.30µs 397.9 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Middle/card_1000000       1.18    211.4±3.28µs 44.1 GElem/sec       1.00    178.5±2.56µs 52.2 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Start/card_10000          1.00      3.0±0.04µs 3091.2 GElem/sec     1.00      3.0±0.08µs 3079.4 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Start/card_100000         1.00   785.1±10.39ns 11862.8 GElem/sec    2.48  1948.8±44.72ns 4778.9 GElem/sec
select/enc_"RLE encoder"/rows_10000000/loc_Start/card_1000000        1.00      6.5±0.07µs 1433.0 GElem/sec     2.07     13.5±0.16µs 692.3 GElem/sec
2021-08-25 11:19:58 +01:00
Edd Robinson f3c57c47fa
Merge branch 'main' into er/refactor/read_buffer/table_arg 2021-08-25 10:30:12 +01:00
Marco Neumann fac79a2ae7
refactor: simplify RLE allocation code
Co-authored-by: Edd Robinson <me@edd.io>
2021-08-25 10:54:18 +02:00
Marco Neumann 2ad9843e5f feat: make `RLE` a bit smaller by capacity-based allocation
For some demo data this reduced the overall chunk size from

195049367 bytes
to
191088095 bytes
2021-08-25 10:22:43 +02:00
Edd Robinson 5648817285 refactor: remove redundant argument 2021-08-24 22:26:17 +01:00
Edd Robinson 49fc23bd7e perf: remove redundant clone 2021-08-23 14:27:06 +01:00
Edd Robinson 74c767337b perf: ensure RLE strings minimally sized 2021-08-23 14:26:53 +01:00
Edd Robinson e2130b075b refactor: account for string cap in RLE size 2021-08-23 14:22:49 +01:00
Edd Robinson 47747a602d refactor: remove cruft 2021-08-23 14:19:19 +01:00
Edd Robinson 891bb4f03a refactor: shrink strings 2021-08-23 14:17:53 +01:00
Edd Robinson 1ed086ab55 refactor: use capacity for dictionary encoding 2021-08-23 14:15:37 +01:00
Marco Neumann b2682c0b0e fix: shrink RUB-string-RLE keys capacity to fit
We were underestimating the size of a RUB string-RLE column depending on
how the data came into existence. A well-placed debug assert proved
that.
2021-08-23 13:18:46 +02:00
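
Several of the commits above turn on the difference between the bytes a string needs (`len`) and the bytes its heap buffer holds (`capacity`). A minimal sketch of that distinction, assuming the keys are plain `String` values; the function names are hypothetical and only echo the metric names in the commit messages:

```rust
use std::mem::size_of;

/// Bytes the keys need: the `String` header plus `len()` bytes of content each.
fn required_bytes(keys: &[String]) -> usize {
    keys.iter().map(|s| size_of::<String>() + s.len()).sum()
}

/// Bytes the keys hold on to: the `String` header plus `capacity()` bytes each.
/// Counting `len()` here instead would underestimate the real footprint.
fn allocated_bytes(keys: &[String]) -> usize {
    keys.iter().map(|s| size_of::<String>() + s.capacity()).sum()
}

/// Release spare capacity so the allocated size moves towards the required size.
/// The allocator may still keep some excess, so the two are not guaranteed equal.
fn shrink_keys(keys: &mut Vec<String>) {
    for s in keys.iter_mut() {
        s.shrink_to_fit();
    }
    keys.shrink_to_fit();
}
```

A string built with `String::with_capacity(64)` and holding four bytes reports a capacity of at least 64, so a size estimate based on length alone can be far too low for data that arrived that way.
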
kodiakhq[bot] fc6a7ea532
Merge branch 'main' into er/refactor/read_buffer/bitmap_size 2021-08-19 14:20:38 +00:00
Raphael Taylor-Davies 98627944e7
refactor: make packers a dev-dependency of read buffer (#2345)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-19 11:09:34 +00:00
Edd Robinson b9f09fce49 feat: improve bitset size estimation 2021-08-17 22:54:22 +01:00
Edd Robinson 1daa30cc7d fix: include enum in sizing 2021-08-17 22:54:22 +01:00
Edd Robinson c795fc7f9d feat: add metric to track total row groups 2021-08-17 12:55:11 +01:00
Edd Robinson eee4e10fd1 refactor: rename statistic to required_bytes 2021-08-13 11:57:46 +01:00
Edd Robinson efde3a8f5a feat: expose required bytes metric 2021-08-13 11:57:46 +01:00
Edd Robinson de702ec820 refactor: make allocated bytes explicit Read Buffer metric 2021-08-13 11:57:46 +01:00
Edd Robinson 311d36d776 refactor: include capacity in Read Buffer chunk size 2021-08-13 11:57:46 +01:00
Edd Robinson 03592aaf94 refactor: ignore bitmap size from required bytes
Bitmaps are a performance optimisation; they're not required for the RLE compression and so it seems reasonable to ignore them when assessing the compression performance of RLE.
2021-08-13 11:57:46 +01:00
Edd Robinson fa8da19c45 refactor: expose enc size API into column 2021-08-13 11:57:46 +01:00
Edd Robinson e0bce4c2f2 refactor: always use same Arrow sizing call 2021-08-13 11:57:46 +01:00
Edd Robinson e78aebdf19
refactor: update read_buffer/src/column/encoding/scalar/fixed.rs
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-08-12 15:57:01 +01:00
Edd Robinson 0e8b0edfc9 feat: add buffer-based sizing for numerical encodings 2021-08-12 15:05:47 +01:00
Edd Robinson 11349fa30d feat: add allocated size to bool 2021-08-12 15:05:47 +01:00
Edd Robinson b4f8e854f6 feat: size rle string encoding by allocated buffers 2021-08-12 15:05:47 +01:00
Edd Robinson 78d3749af5 feat: size dictionary encoding by allocated space 2021-08-12 15:05:47 +01:00
Dom 3de6b44e23
build: use new rustdoc lint name (#2261)
* fix: nocache feature code rot

The MBChunk::snapshot code when using the "nocache" option no longer
compiles - this commit updates it to match the not(nocache) code.

* build: use updated broken_intra_doc_links name

The broken_intra_doc_links lint was renamed to
rustdoc::broken_intra_doc_links

https://doc.rust-lang.org/rustdoc/lints.html
2021-08-11 19:48:51 +00:00
kodiakhq[bot] 304901bf40
Merge branch 'main' into er/refactor/logs 2021-08-10 21:31:49 +00:00
Andrew Lamb 8626e9980b
docs: Add/update doccomments in the read_buffer (#2245)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-10 21:26:02 +00:00
Edd Robinson 5d5ed7d0db refactor: remove logging 2021-08-10 22:16:01 +01:00
Edd Robinson f8870968b9 refactor: reduce logging when creating RUB chunk 2021-08-10 22:11:10 +01:00
Andrew Lamb 126598a2e8
fix(read_buffer): Improve statistics update to handle nulls and prevent `panic`s (#2246)
* fix(read_buffer): Improve statistics update to handle nulls

* fix: clippy

* refactor: only compile test helpers with cfg(test)

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-10 16:58:20 +00:00
kodiakhq[bot] 0297aae17e
Merge branch 'main' into cn/1.54 2021-07-30 17:01:37 +00:00
Andrew Lamb 248ae08343
fix(read_buffer): Avoid panic when creating stats for entirely null columns (#2159) 2021-07-30 14:59:18 +00:00
Carol (Nichols || Goulding) 9d15798288 fix: Address or allow Clippy warnings new with Rust 1.54 2021-07-30 09:59:59 -04:00
Carol (Nichols || Goulding) 11b7755325 refactor: Remove first/last write times from RUB chunks 2021-07-28 11:22:22 -04:00
Andrew Lamb 5fb3e00f2a
fix: Properly record total_count and null_count in statistics (#2103)
* fix: Properly record total_count and null_count in statistics

* fix: fix statistics calculation in mutable_buffer

* refactor: expose null counts in read_buffer

* refactor: expose null_count in parquet_file

* fix: update server crate tests

* fix: update query_tests tests

* docs: tweak comments

* refactor: Use storage_stats rather than adding `null_count`

* refactor: rename test data field for clarity

* fix: fixup merge conflicts

* refactor: rename initial_non_null_count to initial_total_count

* refactor: calculate null_count as row_count - to_add
2021-07-26 18:13:36 +00:00
Carol (Nichols || Goulding) 05782eb980 refactor: Move first/last write times up to read buffer Chunk rather than MetaData 2021-07-22 12:27:46 -04:00
Carol (Nichols || Goulding) 37f24ebfc7 feat: Record first/last write times for creation of read_buffer::Chunk 2021-07-22 11:35:23 -04:00