Edd Robinson
22f8a8a4a1
refactor: define scalar encoding in terms of trait
2021-05-27 14:35:34 +01:00
Edd Robinson
c84d50447c
feat: define a Transcoder trait
2021-05-27 14:35:34 +01:00
Edd Robinson
a81ada6140
feat: add transcoder trait
2021-05-27 14:35:34 +01:00
Raphael Taylor-Davies
4fcc04e6c9
chore: enable arrow prettyprint feature (#1566)
2021-05-27 10:28:14 +00:00
kodiakhq[bot]
db96286ed7
Merge branch 'main' into er/refactor/scalar_comp
2021-05-24 17:02:14 +00:00
Andrew Lamb
14ba25f86d
chore: Update datafusion and use released version of arrow crates (#1546)
...
* chore: Update datafusion and use released version of arrow crate
* fix: Update for change in API
2021-05-24 15:37:22 +00:00
Edd Robinson
eace6c9201
fix: ensure scalars compare correctly
2021-05-24 16:19:28 +01:00
Nga Tran
784ef88fcd
chore: merge main to branch and add more tests that expose a wrong result bug on unsigned int
2021-05-21 12:38:06 -04:00
Edd Robinson
a65c729b01
fix: support converse binary expressions
2021-05-21 15:41:52 +01:00
Edd Robinson
d5f02cb6c5
refactor: address PR feedback
2021-05-21 09:40:26 +01:00
Edd Robinson
d57e3ae73e
refactor: move scalar encodings
2021-05-20 22:58:30 +01:00
Edd Robinson
0ec2499f60
refactor: teach scalar RLE to return different type
2021-05-20 22:50:44 +01:00
Nga Tran
e44a3a87db
feat: now predicate is actually pushed down to RUB but there are bugs and it is not working yet
2021-05-20 16:56:15 -04:00
Edd Robinson
4cb76e367b
refactor: fix change to Chunk API
2021-05-20 11:11:18 +01:00
Edd Robinson
663a38862d
refactor: address PR feedback
2021-05-20 10:49:49 +01:00
Edd Robinson
76caef89b1
refactor: apply suggestions from code review
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-05-20 10:49:49 +01:00
Edd Robinson
c901fe1023
perf: improve values_as_dictionary with predicates
2021-05-20 10:49:49 +01:00
Edd Robinson
723ff2553b
feat: teach read_filter to return dictionaries
2021-05-20 10:49:49 +01:00
Edd Robinson
3de6f3f8bd
feat: teach string encoding to produce Dictionary values
2021-05-20 10:49:49 +01:00
Edd Robinson
634ceb886b
feat: add Dictionary Values type
2021-05-20 10:49:49 +01:00
Edd Robinson
b7b87c1c96
test: add read_filter benchmark
2021-05-20 10:49:49 +01:00
Edd Robinson
4e766d7085
refactor: reorganise benchmarks
2021-05-20 10:49:49 +01:00
Edd Robinson
c8e2c9224e
chore: rename benchmark
2021-05-20 10:49:49 +01:00
Raphael Taylor-Davies
37880ee89a
refactor: store chunk IDs only in catalog (#1521)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-20 04:07:14 +00:00
Edd Robinson
2963d63b5e
feat: implement byte trimming on nullable encodings
2021-05-17 14:32:55 +01:00
Edd Robinson
6a72274517
feat: extend implementations to more Arrow arrays
2021-05-17 14:32:55 +01:00
Edd Robinson
2b98bca9ca
test: allow from slice to be testable
2021-05-17 14:32:55 +01:00
Edd Robinson
b7ea53f5db
refactor: remove unnecessary from impls
2021-05-17 14:32:55 +01:00
Andrew Lamb
07db4932ee
refactor: rename data_types/src/chunk.rs -> data_types/src/chunk_metadata.rs (#1500)
2021-05-15 10:18:01 +00:00
Raphael Taylor-Davies
f9178dbb5f
feat: push metrics into catalog (#1488)
...
* feat: push metrics into catalog
* chore: minor cleanup
* fix: include db labels in chunk metric domains
* chore: fmt
* fix: don't allow dropping moving chunks
* chore: further tweaks
* chore: review feedback
* feat: use new_unregistered() for metric instruments instead of default
* chore: use &[KeyValue] instead of &Vec<KeyValue>
* refactor: make GaugeValue non default constructible
2021-05-14 17:37:39 +00:00
Dom
db6c7728c7
refactor: use 10% target reduction for RLE
...
Comments say 10% but const was 30% - a 10% computed size reduction
sounds sensible!
2021-05-14 15:08:54 +01:00
Dom
874d7a1118
test: run rle_rows test
...
The rle_rows test was missing a #[test] annotation preventing it from
running.
2021-05-14 14:41:17 +01:00
Edd Robinson
0d21d9e2e0
refactor: implement from_iter, reduce code!
2021-05-14 13:32:02 +01:00
Edd Robinson
ac4fa1e527
refactor: update read_buffer/src/column/encoding/scalar/rle.rs
...
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2021-05-14 13:24:55 +01:00
Edd Robinson
1fa08d0de5
test: add test for float encoding rules
2021-05-14 13:24:53 +01:00
Edd Robinson
1ac949e7ea
feat: implement predicate pushdown on RLE
2021-05-14 13:23:42 +01:00
Edd Robinson
0cf445991e
refactor: all read buffer tests passing
2021-05-14 13:14:12 +01:00
Edd Robinson
7525f6e9e3
feat: teach read buffer to create RLE float columns
2021-05-14 13:14:10 +01:00
Edd Robinson
9a666fac00
feat: implement RLE methods for materialising
2021-05-14 13:05:02 +01:00
Edd Robinson
c55dce3af5
feat: implement stat methods
2021-05-14 13:05:02 +01:00
Edd Robinson
958219d63e
feat: skeleton scalar RLE
2021-05-14 13:05:02 +01:00
Edd Robinson
91fda41f8e
refactor: update read_buffer/src/column/boolean.rs
...
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2021-05-14 12:11:54 +01:00
Edd Robinson
d80e71ad86
feat: add new metric to track raw size
2021-05-14 10:34:54 +01:00
Edd Robinson
51c9c15026
refactor: include raw size in log message
2021-05-14 09:42:24 +01:00
Edd Robinson
966093deec
feat: expose size_raw via ReadBuffer API
2021-05-14 09:42:24 +01:00
Edd Robinson
984f505267
feat: implement raw column size on bool columns
2021-05-14 09:42:24 +01:00
Edd Robinson
1a20f3fb4a
feat: implement raw column size on float columns
2021-05-14 09:42:24 +01:00
Edd Robinson
301df03e72
feat: implement raw column size on integer columns
2021-05-14 09:42:24 +01:00
Edd Robinson
850db3f6c2
feat: implement raw size on string columns
2021-05-14 09:42:22 +01:00
Edd Robinson
1416097a35
Merge branch 'main' into er/feat/read_buffer/num_rle
2021-05-11 23:30:55 +01:00
Edd Robinson
aa83669740
refactor: move encodings to scalar module
2021-05-11 22:49:20 +01:00
Edd Robinson
482e4dab86
refactor: shuffle string encodings
2021-05-11 22:47:42 +01:00
Edd Robinson
f86e0641fd
refactor: clarify benchmark
2021-05-11 22:47:42 +01:00
Edd Robinson
f5fe270e43
refactor: move benchmark
2021-05-11 22:47:36 +01:00
Edd Robinson
696e4e0cfd
fix: ensure metrics not overwriting
2021-05-11 20:57:31 +01:00
Raphael Taylor-Davies
d1da954fe4
feat: don't store encoded strings twice in RLE dictionaries (#1469)
2021-05-11 15:22:25 +00:00
Edd Robinson
32abe2e777
feat: wire up stats to metrics
2021-05-11 13:38:32 +01:00
Edd Robinson
c4987028fb
feat: expose all column stats
2021-05-11 13:00:52 +01:00
Edd Robinson
88ed58aa8a
feat: column statistics for int/float
2021-05-11 13:00:52 +01:00
Edd Robinson
ef2eda04ef
feat: add string encoder statistics
2021-05-11 13:00:52 +01:00
Edd Robinson
3622a92c8b
feat: wire in rb column metrics
2021-05-11 13:00:52 +01:00
Marco Neumann
795f5bfcb7
refactor: make `StatValues::{min,max}` optional + handle NaNs
...
This will allow us to:
- handle all-NULL columns correctly
- be in-line with Parquet (where min/max are optional)
- handle NaNs at least somewhat sanely (they do not "poison" stats
anymore)
2021-05-10 17:12:25 +02:00
Edd Robinson
4a414fc8fb
fix: don't blow up on all null columns
2021-05-07 17:31:18 +01:00
Andrew Lamb
b5ea71f45f
feat: Expose the storage usage for each column in system.chunk_columns (#1441)
...
* feat: Expose the storage usage for each column in system.chunk_columns
* fix: fixup logical conflicts
* refactor: move coalesce logic into the read buffer
* fix: Update system_tables to not use coalesce
* fix: Improve comments
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-05-07 12:36:49 +00:00
Andrew Lamb
884baf7329
feat: add column_type and influxdb_column_type, remove row_count from system.columns (#1415)
...
* feat: add column_type and influxdb_column_type, remove row_count from system.columns
* fix: update tests
* fix: more test update
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: fmt
* fix: copy/paste type conversion to avoid cross dependency between data_types and internal_types
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-05-06 12:59:30 +00:00
Raphael Taylor-Davies
ca1c698fd0
chore: update hashbrown (#1430)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-05 22:32:46 +00:00
Raphael Taylor-Davies
411cf134e9
refactor: explode arrow_deps (#1425)
...
* refactor: explode arrow_deps
* chore: workaround doctest bug
2021-05-05 16:59:12 +00:00
Edd Robinson
b4b048127d
refactor: add column count to log line
2021-05-05 11:08:15 +01:00
Andrew Lamb
0788892413
feat: add row_count to system.chunks and Chunk management API (#1373)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-29 13:53:25 +00:00
Edd Robinson
96e1289c94
refactor: log time to create row group
2021-04-28 15:08:19 +00:00
Edd Robinson
c3b41649ee
refactor: use space saving compression
2021-04-28 15:08:19 +00:00
Edd Robinson
5e3d43d62f
fix: report row group rows correctly
2021-04-28 15:08:19 +00:00
Marco Neumann
eddc9319ff
docs: deny broken intradoc links
2021-04-27 13:22:28 +02:00
Raphael Taylor-Davies
20117de078
feat: string dictionary encoding (#1220) (#1262)
...
* feat: string dictionary encoding (#1220)
* chore: review comments
* chore: fix lint
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-27 09:36:58 +00:00
Dom
1826d86938
chore: fix observability dependency key
...
The observability_deps crate name was erroneously wrapped in double
quotes.
2021-04-26 12:22:21 +01:00
Edd Robinson
faec98eab9
refactor: remove time column from row group
2021-04-26 09:51:06 +00:00
Edd Robinson
15bde2f8fa
refactor: remove time field from Table meta
2021-04-26 09:51:06 +00:00
Edd Robinson
1f0e760f2f
refactor: use bool::then
2021-04-22 21:59:41 +00:00
Edd Robinson
2784f89e6e
refactor: sigh
2021-04-20 17:30:50 +00:00
Edd Robinson
d55bf73860
refactor: satisfy new clippy lints
2021-04-20 17:30:50 +00:00
Edd Robinson
0114afcdfa
refactor: quieten logs for new row group
2021-04-19 16:08:49 +00:00
Carol (Nichols || Goulding)
82c1d94ce1
refactor: Use Option.map where possible
2021-04-19 08:48:11 -04:00
Carol (Nichols || Goulding)
83250a93e6
refactor: Use vec macro
2021-04-19 08:48:11 -04:00
Carol (Nichols || Goulding)
c9772db01b
fix: Allow this upper case acronym; it could be confusing otherwise
2021-04-19 08:48:11 -04:00
Carol (Nichols || Goulding)
757933afc4
fix: use Self when possible
2021-04-19 08:48:11 -04:00
Carol (Nichols || Goulding)
716c3d41ab
refactor: Use flatten rather than if let Some
2021-04-19 08:44:52 -04:00
Marco Neumann
fd0da7e74a
chore: upgrade arrow and Rust
...
See https://github.com/apache/arrow/pull/10082 for upstream PR.
2021-04-19 14:00:04 +02:00
Edd Robinson
4b706141de
refactor: log new row groups added to RB
2021-04-19 10:25:57 +00:00
Edd Robinson
2b53f55fd5
refactor: implement Display on row group
2021-04-19 10:25:57 +00:00
Edd Robinson
98aa2a3543
refactor: implement Display on encodings
2021-04-19 10:25:57 +00:00
Edd Robinson
a9d2ffcc6f
refactor: impl Display on OwnedValue
2021-04-19 10:25:57 +00:00
Edd Robinson
22fbec194e
refactor: impl Display on LogicalDataType
2021-04-19 10:25:57 +00:00
Edd Robinson
d11a322e0c
refactor: import Display
2021-04-19 10:25:57 +00:00
Andrew Lamb
e226b5a820
feat: Use TimestampNanosecondArray for timestamps in IOx (#1230)
...
* refactor: Create Arrow arrays using iterators
* feat: use Timestamp64(TimeUnit::Nanosecond) for timestamps
* feat: add support for timestamp array
* fix: update more tests
* fix: remove unnecessary code
Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-16 15:55:33 +00:00
Edd Robinson
025f760479
refactor: change sync::RwLock to parking_lot
2021-04-14 19:18:03 +00:00
Edd Robinson
a3fc5e2474
refactor: change sync::RwLock to parking_lot
2021-04-14 19:18:03 +00:00
Edd Robinson
5bb34e9a97
refactor: use read_buffer column range for time range
2021-04-14 16:10:24 +00:00
Nga Tran
05bf28ce85
feat: Add 2 main functions, table_schema and table_names, for Parquet Chunk to lay a foundation for querying it
2021-04-13 18:23:55 -04:00
Edd Robinson
be369689f2
refactor: fix benchmarks
2021-04-08 18:20:37 +00:00
Edd Robinson
a34de76c49
refactor: wire read buffer tracker in
2021-04-08 18:20:37 +00:00