Commit Graph

409 Commits (8e9dd227534f57e607c7d6b89fad055bde6e02fe)

Author SHA1 Message Date
Edd Robinson ed98812c2a refactor: logging 2021-06-25 17:37:04 +01:00
Raphael Taylor-Davies 84e63bb1b9
feat: temporary log information on record batch data (#1807)
* feat: temporary log information on record batch data

* chore: add buffer size
2021-06-25 14:46:51 +00:00
Edd Robinson 7e3df17896 test: update benchmarks 2021-06-21 15:29:23 +01:00
Edd Robinson 2ff80162d3 refactor: one table per chunk 2021-06-21 15:08:38 +01:00
Edd Robinson 15416ca223 refactor: enable empty table init 2021-06-21 15:08:38 +01:00
Andrew Lamb ec43a87909
chore: Update itertools deps (#1750) 2021-06-17 17:56:44 +00:00
Edd Robinson ff19beb0ad refactor: export rb chunk as RBChunk 2021-06-11 18:33:10 +01:00
Raphael Taylor-Davies 1e7ef193a6
refactor: use field metadata to store influx types (#1642)
* refactor: use field metadata to store influx types

make SchemaBuilder non-consuming

* chore: remove unused variants

* chore: fix lints
2021-06-07 13:26:39 +00:00
Edd Robinson 418cc4cf0e
refactor: update read_buffer/src/column/encoding/scalar/transcoders.rs
Co-authored-by: Dom <dom@itsallbroken.com>
2021-06-03 14:02:16 +01:00
Edd Robinson 22c7592e3b refactor: DRY RLE check 2021-06-03 12:32:40 +01:00
Edd Robinson 9a45c0d05b feat: implement float byte trimming for arrow array 2021-06-03 12:32:40 +01:00
Edd Robinson 5d02a71e6f feat: implement byte trimming on float slice 2021-06-03 12:32:40 +01:00
Edd Robinson 32e5f8c715 feat: add float byte trimmer encoding 2021-06-03 12:32:40 +01:00
Edd Robinson 728476f2e1 refactor: add encoding name to float encodings 2021-06-03 12:32:40 +01:00
Edd Robinson fa729fd6b0 refactor: address PR feedback 2021-06-02 11:21:10 +01:00
Edd Robinson a5b554d2c3 feat: add RLE support to integer encodings 2021-06-02 10:57:17 +01:00
Edd Robinson 71598d9b3e refactor: move rle heuristics to rle module 2021-06-02 10:57:17 +01:00
Dom aca00a505f refactor: avoid copying encoding name for stats
Swaps the encoding name String type for a Cow in the column Statistics,
avoiding having to copy the encoding name where it is a static string
already.
2021-06-01 11:08:07 +01:00
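
The change described in the commit above can be illustrated with a minimal sketch. The `Statistics` struct, its fields, and the two constructor functions below are hypothetical stand-ins, not the actual read_buffer types; the only point is that a `Cow<'static, str>` lets a static encoding name be borrowed without copying, while a name built at runtime is still stored as an owned `String`.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified stand-in for the column statistics type.
#[derive(Debug, Clone, PartialEq)]
pub struct Statistics {
    /// Encoding name: borrowed when it is a static string, owned otherwise.
    pub enc_type: Cow<'static, str>,
    /// Illustrative size field (an assumption, not taken from the source).
    pub raw_bytes: usize,
}

/// The name is a string literal, so it is borrowed and nothing is allocated.
pub fn fixed_stats(raw_bytes: usize) -> Statistics {
    Statistics {
        enc_type: Cow::Borrowed("FIXED"),
        raw_bytes,
    }
}

/// The name is assembled at runtime, so it has to be an owned String.
pub fn rle_stats(inner: &str, raw_bytes: usize) -> Statistics {
    Statistics {
        enc_type: Cow::Owned(format!("RLE-{}", inner)),
        raw_bytes,
    }
}
```

Callers can read `enc_type` as a `&str` in both cases, so code that only displays or compares the name does not need to know which variant it holds.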
Andrew Lamb 00e735ef0d
chore: remove unused dependencies (#1583) 2021-05-29 10:31:57 +00:00
Raphael Taylor-Davies db432de137
feat: add distinct count to StatValues (#1568) 2021-05-28 17:41:34 +00:00
Edd Robinson e94be15296
refactor: update read_buffer/src/column/encoding/scalar/fixed.rs 2021-05-27 21:18:10 +01:00
Edd Robinson 26fe4167f7 refactor: erase some types 2021-05-27 14:57:40 +01:00
Edd Robinson bba387d6ff refactor: fix benchmarks 2021-05-27 14:35:34 +01:00
Edd Robinson 7ab27c4468 refactor: get build 2021-05-27 14:35:34 +01:00
Edd Robinson 04513e737d refactor: define encodings wrt scalar encoding trait 2021-05-27 14:35:34 +01:00
Edd Robinson 22f8a8a4a1 refactor: define scalar encoding in terms of trait 2021-05-27 14:35:34 +01:00
Edd Robinson c84d50447c feat: define a Transcoder trait 2021-05-27 14:35:34 +01:00
Edd Robinson a81ada6140 feat: add transcoder trait 2021-05-27 14:35:34 +01:00
Raphael Taylor-Davies 4fcc04e6c9
chore: enable arrow prettyprint feature (#1566) 2021-05-27 10:28:14 +00:00
kodiakhq[bot] db96286ed7
Merge branch 'main' into er/refactor/scalar_comp 2021-05-24 17:02:14 +00:00
Andrew Lamb 14ba25f86d
chore: Update datafusion and use released version of arrow crates (#1546)
* chore: Update datafusion and use released version of arrow crate

* fix: Update for change in API
2021-05-24 15:37:22 +00:00
Edd Robinson eace6c9201 fix: ensure scalars compare correctly 2021-05-24 16:19:28 +01:00
Nga Tran 784ef88fcd chore: merge main to branch and add more tests that expose a wrong result bug on unsigned int 2021-05-21 12:38:06 -04:00
Edd Robinson a65c729b01 fix: support converse binary expressions 2021-05-21 15:41:52 +01:00
Edd Robinson d5f02cb6c5 refactor: address PR feedback 2021-05-21 09:40:26 +01:00
Edd Robinson d57e3ae73e refactor: move scalar encodings 2021-05-20 22:58:30 +01:00
Edd Robinson 0ec2499f60 refactor: teach scalar RLE to return different type 2021-05-20 22:50:44 +01:00
Nga Tran e44a3a87db feat: predicate is now actually pushed down to RUB, but there are bugs and it is not working yet 2021-05-20 16:56:15 -04:00
Edd Robinson 4cb76e367b refactor: fix change to Chunk API 2021-05-20 11:11:18 +01:00
Edd Robinson 663a38862d refactor: address PR feedback 2021-05-20 10:49:49 +01:00
Edd Robinson 76caef89b1 refactor: apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-05-20 10:49:49 +01:00
Edd Robinson c901fe1023 perf: improve values_as_dictionary with predicates 2021-05-20 10:49:49 +01:00
Edd Robinson 723ff2553b feat: teach read_filter to return dictionaries 2021-05-20 10:49:49 +01:00
Edd Robinson 3de6f3f8bd feat: teach string encoding to produce Dictionary values 2021-05-20 10:49:49 +01:00
Edd Robinson 634ceb886b feat: add Dictionary Values type 2021-05-20 10:49:49 +01:00
Edd Robinson b7b87c1c96 test: add read_filter benchmark 2021-05-20 10:49:49 +01:00
Edd Robinson 4e766d7085 refactor: reorganise benchmarks 2021-05-20 10:49:49 +01:00
Edd Robinson c8e2c9224e chore: rename benchmark 2021-05-20 10:49:49 +01:00
Raphael Taylor-Davies 37880ee89a
refactor: store chunk IDs only in catalog (#1521)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-20 04:07:14 +00:00
Edd Robinson 2963d63b5e feat: implement byte trimming on nullable encodings 2021-05-17 14:32:55 +01:00
Edd Robinson 6a72274517 feat: extend implementations to more Arrow arrays 2021-05-17 14:32:55 +01:00
Edd Robinson 2b98bca9ca test: allow from slice to be testable 2021-05-17 14:32:55 +01:00
Edd Robinson b7ea53f5db refactor: remove unnecessary From impls 2021-05-17 14:32:55 +01:00
Andrew Lamb 07db4932ee
refactor: rename data_types/src/chunk.rs -> data_types/src/chunk_metadata.rs (#1500) 2021-05-15 10:18:01 +00:00
Raphael Taylor-Davies f9178dbb5f
feat: push metrics into catalog (#1488)
* feat: push metrics into catalog

* chore: minor cleanup

* fix: include db labels in chunk metric domains

* chore: fmt

* fix: don't allow dropping moving chunks

* chore: further tweaks

* chore: review feedback

* feat: use new_unregistered() for metric instruments instead of default

* chore: use &[KeyValue] instead of &Vec<KeyValue>

* refactor: make GauageValue non default constructible
2021-05-14 17:37:39 +00:00
Dom db6c7728c7 refactor: use 10% target reduction for RLE
Comments say 10% but const was 30% - a 10% computed size reduction
sounds sensible!
2021-05-14 15:08:54 +01:00
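
The commit above only says that the target size reduction for choosing RLE is 10%. As a rough sketch of what such a heuristic could look like, the code below estimates the run-length encoded size and accepts it only if it is at least 10% smaller than the fixed-width size. The constant name, the function names, and the assumed run layout (a `u32` run length plus one value per run) are illustrative assumptions, not the read_buffer implementation.

```rust
use std::mem::size_of;

/// Hypothetical constant: the fractional size reduction RLE must achieve.
pub const MIN_RLE_SIZE_REDUCTION: f64 = 0.1;

/// Counts runs of equal adjacent values.
pub fn run_count<T: PartialEq>(values: &[T]) -> usize {
    if values.is_empty() {
        return 0;
    }
    // Every position where neighbours differ starts a new run.
    1 + values.windows(2).filter(|w| w[0] != w[1]).count()
}

/// Returns true if the estimated RLE size is at least 10% smaller than
/// storing every value at fixed width.
pub fn should_rle<T: PartialEq>(values: &[T]) -> bool {
    if values.is_empty() {
        return false;
    }
    let fixed_size = values.len() * size_of::<T>();
    // Assumed layout: each run stores a u32 length and a single value.
    let rle_size = run_count(values) * (size_of::<u32>() + size_of::<T>());
    (rle_size as f64) <= (fixed_size as f64) * (1.0 - MIN_RLE_SIZE_REDUCTION)
}
```

Under these assumptions a column of 100 identical `f64` values is a clear win for RLE, while 100 distinct values would be larger after encoding and are rejected.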
Dom 874d7a1118 test: run rle_rows test
The rle_rows test was missing a #[test] annotation preventing it from
running.
2021-05-14 14:41:17 +01:00
Edd Robinson 0d21d9e2e0 refactor: implement from_iter, reduce code! 2021-05-14 13:32:02 +01:00
Edd Robinson ac4fa1e527 refactor: update read_buffer/src/column/encoding/scalar/rle.rs
Co-authored-by: Dom <dom@itsallbroken.com>

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2021-05-14 13:24:55 +01:00
Edd Robinson 1fa08d0de5 test: add test for float encoding rules 2021-05-14 13:24:53 +01:00
Edd Robinson 1ac949e7ea feat: implement predicate pushdown on RLE 2021-05-14 13:23:42 +01:00
Edd Robinson 0cf445991e refactor: all read buffer tests passing 2021-05-14 13:14:12 +01:00
Edd Robinson 7525f6e9e3 feat: teach read buffer to create RLE float columns 2021-05-14 13:14:10 +01:00
Edd Robinson 9a666fac00 feat: implement RLE methods for materialising 2021-05-14 13:05:02 +01:00
Edd Robinson c55dce3af5 feat: implement stat methods 2021-05-14 13:05:02 +01:00
Edd Robinson 958219d63e feat: skeleton scalar RLE 2021-05-14 13:05:02 +01:00
Edd Robinson 91fda41f8e refactor: update read_buffer/src/column/boolean.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2021-05-14 12:11:54 +01:00
Edd Robinson d80e71ad86 feat: add new metric to track raw size 2021-05-14 10:34:54 +01:00
Edd Robinson 51c9c15026 refactor: include raw size in log message 2021-05-14 09:42:24 +01:00
Edd Robinson 966093deec feat: expose size_raw via ReadBuffer API 2021-05-14 09:42:24 +01:00
Edd Robinson 984f505267 feat: implement raw column size on bool columns 2021-05-14 09:42:24 +01:00
Edd Robinson 1a20f3fb4a feat: implement raw column size on float columns 2021-05-14 09:42:24 +01:00
Edd Robinson 301df03e72 feat: implement raw column size on integer columns 2021-05-14 09:42:24 +01:00
Edd Robinson 850db3f6c2 feat: implement raw size on string columns 2021-05-14 09:42:22 +01:00
Edd Robinson 1416097a35
Merge branch 'main' into er/feat/read_buffer/num_rle 2021-05-11 23:30:55 +01:00
Edd Robinson aa83669740 refactor: move encodings to scalar module 2021-05-11 22:49:20 +01:00
Edd Robinson 482e4dab86 refactor: shuffle string encodings 2021-05-11 22:47:42 +01:00
Edd Robinson f86e0641fd refactor: clarify benchmark 2021-05-11 22:47:42 +01:00
Edd Robinson f5fe270e43 refactor: move benchmark 2021-05-11 22:47:36 +01:00
Edd Robinson 696e4e0cfd fix: ensure metrics not overwriting 2021-05-11 20:57:31 +01:00
Raphael Taylor-Davies d1da954fe4
feat: don't store encoded strings twice in RLE dictionaries (#1469) 2021-05-11 15:22:25 +00:00
Edd Robinson 32abe2e777 feat: wire up stats to metrics 2021-05-11 13:38:32 +01:00
Edd Robinson c4987028fb feat: expose all column stats 2021-05-11 13:00:52 +01:00
Edd Robinson 88ed58aa8a feat: column statistics for int/float 2021-05-11 13:00:52 +01:00
Edd Robinson ef2eda04ef feat: add string encoder statistics 2021-05-11 13:00:52 +01:00
Edd Robinson 3622a92c8b feat: wire in rb column metrics 2021-05-11 13:00:52 +01:00
Marco Neumann 795f5bfcb7 refactor: make `StatValues::{min,max}` optional + handle NaNs
This will allow us to:

- handle all-NULL columns correctly
- be in-line with Parquet (where min/max are optional)
- handle NaNs at least somewhat sanely (they do not "poison" stats
  anymore)
2021-05-10 17:12:25 +02:00
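
A minimal sketch of the behaviour the commit above describes, under stated assumptions: the `StatValues` struct here is a hypothetical `f64`-only simplification with an illustrative `update` method, not the actual data_types definition. Keeping `min` and `max` as `Option` means an all-NULL column reports no bounds at all, and skipping NaN explicitly keeps a NaN from becoming the stored bound.

```rust
/// Hypothetical, f64-only simplification of a statistics type.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct StatValues {
    /// None until a non-NULL, non-NaN value has been seen.
    pub min: Option<f64>,
    /// None until a non-NULL, non-NaN value has been seen.
    pub max: Option<f64>,
    /// Number of non-NULL values seen (an assumption for this sketch).
    pub count: u64,
}

impl StatValues {
    /// Folds one (possibly NULL) value into the statistics.
    pub fn update(&mut self, value: Option<f64>) {
        let v = match value {
            Some(v) => v,
            None => return, // NULLs leave min/max untouched
        };
        self.count += 1;
        if v.is_nan() {
            // NaN is counted but never stored as a bound, so it cannot
            // "poison" min/max for the values that follow.
            return;
        }
        self.min = Some(self.min.map_or(v, |m| m.min(v)));
        self.max = Some(self.max.map_or(v, |m| m.max(v)));
    }
}
```

The explicit NaN check matters mainly when NaN is the first value seen: without it, NaN would be stored as the initial bound.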
Edd Robinson 4a414fc8fb fix: don't blow up on all null columns 2021-05-07 17:31:18 +01:00
Andrew Lamb b5ea71f45f
feat: Expose the storage usage for each column in system.chunk_columns (#1441)
* feat: Expose the storage usage for each column in system.chunk_columns

* fix: fixup logical conflicts

* refactor: move coalesce logic into the read buffer

* fix: Update system_tables to not use coalesce

* fix: Improve comments

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-05-07 12:36:49 +00:00
Andrew Lamb 884baf7329
feat: add column_type and influxdb_column_type, remove row_count from system.columns (#1415)
* feat: add column_type and influxdb_column_type, remove row_count from system.columns

* fix: update tests

* fix: more test update

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: fmt

* fix: copy/paste type conversion to avoid cross dependency between data_types and internal_types

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-05-06 12:59:30 +00:00
Raphael Taylor-Davies ca1c698fd0
chore: update hashbrown (#1430)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-05 22:32:46 +00:00
Raphael Taylor-Davies 411cf134e9
refactor: explode arrow_deps (#1425)
* refactor: explode arrow_deps

* chore: workaround doctest bug
2021-05-05 16:59:12 +00:00
Edd Robinson b4b048127d refactor: add column count to log line 2021-05-05 11:08:15 +01:00
Andrew Lamb 0788892413
feat: add row_count to system.chunks and Chunk management API (#1373)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-29 13:53:25 +00:00
Edd Robinson 96e1289c94 refactor: log time to create row group 2021-04-28 15:08:19 +00:00
Edd Robinson c3b41649ee refactor: use space saving compression 2021-04-28 15:08:19 +00:00
Edd Robinson 5e3d43d62f fix: report row group rows correctly 2021-04-28 15:08:19 +00:00
Marco Neumann eddc9319ff docs: deny broken intradoc links 2021-04-27 13:22:28 +02:00
Raphael Taylor-Davies 20117de078
feat: string dictionary encoding (#1220) (#1262)
* feat: string dictionary encoding (#1220)

* chore: review comments

* chore: fix lint

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-27 09:36:58 +00:00
Dom 1826d86938 chore: fix observability dependency key
The observability_deps crate name was erroneously wrapped in double
quotes.
2021-04-26 12:22:21 +01:00
Edd Robinson faec98eab9 refactor: remove time column from row group 2021-04-26 09:51:06 +00:00
Edd Robinson 15bde2f8fa refactor: remove time field from Table meta 2021-04-26 09:51:06 +00:00
Edd Robinson 1f0e760f2f refactor: use bool::then 2021-04-22 21:59:41 +00:00
Edd Robinson 2784f89e6e refactor: sigh 2021-04-20 17:30:50 +00:00
Edd Robinson d55bf73860 refactor: satisfy new clippy lints 2021-04-20 17:30:50 +00:00
Edd Robinson 0114afcdfa refactor: quieten logs for new row group 2021-04-19 16:08:49 +00:00
Carol (Nichols || Goulding) 82c1d94ce1 refactor: Use Option.map where possible 2021-04-19 08:48:11 -04:00
Carol (Nichols || Goulding) 83250a93e6 refactor: Use vec macro 2021-04-19 08:48:11 -04:00
Carol (Nichols || Goulding) c9772db01b fix: Allow this upper case acronym; it could be confusing otherwise 2021-04-19 08:48:11 -04:00
Carol (Nichols || Goulding) 757933afc4 fix: use Self when possible 2021-04-19 08:48:11 -04:00
Carol (Nichols || Goulding) 716c3d41ab refactor: Use flatten rather than if let Some 2021-04-19 08:44:52 -04:00
Marco Neumann fd0da7e74a chore: upgrade arrow and Rust
See https://github.com/apache/arrow/pull/10082 for upstream PR.
2021-04-19 14:00:04 +02:00
Edd Robinson 4b706141de refactor: log new row groups added to RB 2021-04-19 10:25:57 +00:00
Edd Robinson 2b53f55fd5 refactor: implement Display on row group 2021-04-19 10:25:57 +00:00
Edd Robinson 98aa2a3543 refactor: implement Display on encodings 2021-04-19 10:25:57 +00:00
Edd Robinson a9d2ffcc6f refactor: impl Display on OwnedValue 2021-04-19 10:25:57 +00:00
Edd Robinson 22fbec194e refactor: impl Display on LogicalDataType 2021-04-19 10:25:57 +00:00
Edd Robinson d11a322e0c refactor: import Display 2021-04-19 10:25:57 +00:00
Andrew Lamb e226b5a820
feat: Use TimestampNanosecondArray for timestamps in IOx (#1230)
* refactor: Create Arrow arrays using iterators

* feat: use Timestamp64(TimeUnit::Nanosecond) for timestamps

* feat: add support for timestamp array

* fix: update more tests

* fix: remove unnecessary code

Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-16 15:55:33 +00:00
Edd Robinson 025f760479 refactor: change sync::RwLock to parking_lot 2021-04-14 19:18:03 +00:00
Edd Robinson a3fc5e2474 refactor: change sync::RwLock to parking_lot 2021-04-14 19:18:03 +00:00
Edd Robinson 5bb34e9a97 refactor: use read_buffer column range for time range 2021-04-14 16:10:24 +00:00
Nga Tran 05bf28ce85 feat: Add 2 main functions, table_schema and table_names, for Parquet Chunk to lay a foundation for querying it 2021-04-13 18:23:55 -04:00
Edd Robinson be369689f2 refactor: fix benchmarks 2021-04-08 18:20:37 +00:00
Edd Robinson a34de76c49 refactor: wire read buffer tracker in 2021-04-08 18:20:37 +00:00
Edd Robinson d429cf9aeb refactor: tighten up Read Buffer API 2021-04-08 10:24:19 +00:00
Edd Robinson dae9f12593 refactor: remove deprecated API 2021-04-08 10:24:19 +00:00
Edd Robinson 721a784ce0 refactor: concrete type to keep streams happy 2021-04-08 10:19:11 +00:00
Edd Robinson 17853266ce refactor: add helper method for all names 2021-04-08 10:19:11 +00:00
Edd Robinson 9e1068b7de refactor: db to return usize 2021-04-07 13:27:41 +00:00
Edd Robinson cab7d1d1fe refactor: chunk size to return usize 2021-04-07 13:27:41 +00:00
Edd Robinson 7dd5924059 refactor: table size to return usize 2021-04-07 13:27:41 +00:00
Edd Robinson b867599fce refactor: row group size to return usize 2021-04-07 13:27:41 +00:00
Edd Robinson 40abe24123 refactor: encoding size to return usize 2021-04-07 13:27:41 +00:00
Edd Robinson ea2c882635 refactor: tidy up 2021-04-07 10:46:08 +00:00
Edd Robinson b524b4411e test: move test for table_summaries into chunk 2021-04-07 10:46:08 +00:00
Edd Robinson a662d5e180 refactor: expose column_values at chunk level 2021-04-07 10:46:08 +00:00
Edd Robinson 7bee168752 refactor: chunk upsert does not need &mut 2021-04-07 10:46:08 +00:00
Edd Robinson e8ff86279f refactor: move column_names to chunk level 2021-04-07 10:46:08 +00:00
Edd Robinson 03b72cc80d feat: add chunk predicate check 2021-04-07 10:46:08 +00:00
Edd Robinson c2e0c80f8c refactor: expose read_filter at chunk level 2021-04-07 10:46:08 +00:00
Edd Robinson bc3560af8c test: add coverage for has_table 2021-04-07 10:46:08 +00:00
Edd Robinson 0c32cb48c7 feat: add read_filter schema method 2021-04-07 10:46:08 +00:00
Edd Robinson d39fe8abf8 refactor: add ability to upsert table 2021-04-07 10:46:08 +00:00
Edd Robinson 2129c2a725 refactor: make some methods crate-visible 2021-04-07 10:46:08 +00:00
Edd Robinson 5eea06c9b3 refactor: allow empty chunks 2021-04-07 10:46:08 +00:00
Andrew Lamb b61875e0b2
feat: Add TableSummary calculation to ReadBuffer (#1092) 2021-03-31 16:26:37 +00:00
Andrew Lamb 6a48001d13
refactor: Manage storage directly in the Catalog (#1057)
* refactor: Manage mutable buffer chunks directly

* fix: do not use mutable_buffer for listing table names
2021-03-29 17:55:07 +00:00
Andrew Lamb 895e808754
chore: Upgrade arrow deps (#1046)
* chore: Upgrade dependencies

* chore: upgrade query for new interfaces

* chore: update read_buffer
2021-03-25 13:35:08 +00:00
Edd Robinson 585213e51f refactor: return error for wrong group columns 2021-03-23 14:41:06 +00:00