Edd Robinson
03592aaf94
refactor: ignore bitmap size from required bytes
...
Bitmaps are a performance optimisation; they're not required for the RLE compression and so it seems reasonable to ignore them when assessing the compression performance of RLE.
2021-08-13 11:57:46 +01:00
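A minimal sketch of the idea in the commit above: report the bytes needed by the RLE runs separately from the auxiliary index. The `Rle` type, its fields, and the `Vec<Vec<u32>>` stand-in for bitmaps are all hypothetical and are not taken from the read_buffer code.

```rust
use std::mem::size_of;

/// Hypothetical RLE column. Names and layout are assumptions for illustration.
struct Rle {
    /// (run length, value) pairs: the data RLE actually needs.
    runs: Vec<(u32, i64)>,
    /// Row ids covered by each run: a stand-in for the bitmaps, which only
    /// speed up lookups and are not needed to decode the column.
    row_ids: Vec<Vec<u32>>,
}

impl Rle {
    fn from_values(values: &[i64]) -> Self {
        let mut runs: Vec<(u32, i64)> = Vec::new();
        let mut row_ids: Vec<Vec<u32>> = Vec::new();
        for (row, &v) in values.iter().enumerate() {
            // Extend the current run if the value repeats.
            let extended = match runs.last_mut() {
                Some((len, last)) if *last == v => {
                    *len += 1;
                    true
                }
                _ => false,
            };
            if extended {
                row_ids.last_mut().unwrap().push(row as u32);
            } else {
                runs.push((1, v));
                row_ids.push(vec![row as u32]);
            }
        }
        Self { runs, row_ids }
    }

    /// Bytes required by the compression itself; the index is ignored.
    fn required_bytes(&self) -> usize {
        self.runs.len() * size_of::<(u32, i64)>()
    }

    /// Bytes including the lookup index.
    fn total_bytes(&self) -> usize {
        let index: usize = self
            .row_ids
            .iter()
            .map(|ids| ids.len() * size_of::<u32>())
            .sum();
        self.required_bytes() + index
    }
}

fn main() {
    let col = Rle::from_values(&[1, 1, 1, 2, 2, 3]);
    println!("required: {} total: {}", col.required_bytes(), col.total_bytes());
}
```

Comparing `required_bytes` against the raw column size then measures the compression alone, without the cost of the lookup structure.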
Edd Robinson
fa8da19c45
refactor: expose enc size API into column
2021-08-13 11:57:46 +01:00
Edd Robinson
e0bce4c2f2
refactor: always use same Arrow sizing call
2021-08-13 11:57:46 +01:00
Edd Robinson
e78aebdf19
refactor: update read_buffer/src/column/encoding/scalar/fixed.rs
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-08-12 15:57:01 +01:00
Edd Robinson
0e8b0edfc9
feat: add buffer-based sizing for numerical encodings
2021-08-12 15:05:47 +01:00
Edd Robinson
11349fa30d
feat: add allocated size to bool
2021-08-12 15:05:47 +01:00
Edd Robinson
b4f8e854f6
feat: size rle string encoding by allocated buffers
2021-08-12 15:05:47 +01:00
Edd Robinson
78d3749af5
feat: size dictionary encoding by allocated space
2021-08-12 15:05:47 +01:00
Dom
3de6b44e23
build: use new rustdoc lint name ( #2261 )
...
* fix: nocache feature code rot
The MBChunk::snapshot code when using the "nocache" option no longer
compiles - this commit updates it to match the not(nocache) code.
* build: use updated broken_intra_doc_links name
The broken_intra_doc_links lint was renamed to
rustdoc::broken_intra_doc_links:
https://doc.rust-lang.org/rustdoc/lints.html
2021-08-11 19:48:51 +00:00
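As an illustration of the renamed lint from the commit above, the sketch below applies it to a small module. The module and function names are hypothetical. In a real crate the attribute would more likely be the crate-level form `#![deny(rustdoc::broken_intra_doc_links)]` at the top of `lib.rs`.

```rust
// Deny unresolved intra-doc links, using the lint's rustdoc-prefixed name.
#[deny(rustdoc::broken_intra_doc_links)]
pub mod sizing {
    /// Returns twice `n`. See also [`halve`].
    pub fn double(n: u64) -> u64 {
        n * 2
    }

    /// Returns half of `n`, rounding down. See also [`double`].
    pub fn halve(n: u64) -> u64 {
        n / 2
    }
}

fn main() {
    println!("{}", sizing::double(21));
}
```

The lint only takes effect when the docs are built with rustdoc; a normal compile accepts the attribute without checking any links.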
kodiakhq[bot]
304901bf40
Merge branch 'main' into er/refactor/logs
2021-08-10 21:31:49 +00:00
Andrew Lamb
8626e9980b
docs: Add/update doccomments in the read_buffer ( #2245 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-10 21:26:02 +00:00
Edd Robinson
5d5ed7d0db
refactor: remove logging
2021-08-10 22:16:01 +01:00
Edd Robinson
f8870968b9
refactor: reduce logging when creating RUB chunk
2021-08-10 22:11:10 +01:00
Andrew Lamb
126598a2e8
fix(read_buffer): Improve statistics update to handle nulls and prevent `panic`s ( #2246 )
...
* fix(read_buffer): Improve statistics update to handle nulls
* fix: clippy
* refactor: only compile test helpers with cfg(test)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-10 16:58:20 +00:00
kodiakhq[bot]
0297aae17e
Merge branch 'main' into cn/1.54
2021-07-30 17:01:37 +00:00
Andrew Lamb
248ae08343
fix(read_buffer): Avoid panic when creating stats for entirely null columns ( #2159 )
2021-07-30 14:59:18 +00:00
Carol (Nichols || Goulding)
9d15798288
fix: Address or allow Clippy warnings new with Rust 1.54
2021-07-30 09:59:59 -04:00
Carol (Nichols || Goulding)
11b7755325
refactor: Remove first/last write times from RUB chunks
2021-07-28 11:22:22 -04:00
Andrew Lamb
5fb3e00f2a
fix: Properly record total_count and null_count in statistics ( #2103 )
...
* fix: Properly record total_count and null_count in statistics
* fix: fix statistics calculation in mutable_buffer
* refactor: expose null counts in read_buffer
* refactor: expose null_count in parquet_file
* fix: update server crate tests
* fix: update query_tests tests
* docs: tweak comments
* refactor: Use storage_stats rather than adding `null_count`
* refactor: rename test data field for clarity
* fix: fixup merge conflicts
* refactor: rename initial_non_null_count to initial_total_count
* refactor: calculate null_count as row_count - to_add
2021-07-26 18:13:36 +00:00
Carol (Nichols || Goulding)
05782eb980
refactor: Move first/last write times up to read buffer Chunk rather than MetaData
2021-07-22 12:27:46 -04:00
Carol (Nichols || Goulding)
37f24ebfc7
feat: Record first/last write times for creation of read_buffer::Chunk
2021-07-22 11:35:23 -04:00
Carol (Nichols || Goulding)
4e6b79534b
feat: Require passing first/last write times for creation of Table
2021-07-22 11:35:23 -04:00
Carol (Nichols || Goulding)
b7bedeaaf3
feat: Require passing first/last write times for creation of Table MetaData
2021-07-22 11:35:23 -04:00
Carol (Nichols || Goulding)
8d1d877196
feat: Record first/last write times for RUB chunks
2021-07-22 11:35:22 -04:00
Carol (Nichols || Goulding)
16b07e5b31
refactor: Always use Table::with_row_group to ensure Tables are never empty
...
Remove Table::new that created an empty table.
2021-07-22 11:15:18 -04:00
Carol (Nichols || Goulding)
6feea3b2d5
feat: Require at least one RecordBatch to create a read_buffer::Chunk::new
...
In the signature only for the moment.
2021-07-22 11:15:18 -04:00
Carol (Nichols || Goulding)
bbb4462264
refactor: Extract a function for the RecordBatch to RowGroup transformation with logging
...
So that we can call it from RBChunk::new too.
2021-07-22 11:15:18 -04:00
Carol (Nichols || Goulding)
0a724878e6
refactor: Organize uses
2021-07-22 11:15:18 -04:00
Andrew Lamb
4da8a16c18
chore: update to arrow 5.0 and master datafusion ( #2049 )
...
* chore: update to arrow 5.0 and master datafusion
* fix: Update test for change in object size
2021-07-19 12:49:51 +00:00
Edd Robinson
54ad69ed86
fix: ensure correct table meta size used
2021-07-16 10:48:45 -04:00
Carol (Nichols || Goulding)
f3175ed291
test: use of different size values
2021-07-16 09:47:56 -04:00
Carol (Nichols || Goulding)
abe2fe7262
test: MetaData new with a row group vs default then update_with should have the same size
2021-07-16 09:47:56 -04:00
Andrew Lamb
3fd6430fb6
fix: rename `estimated_bytes` to `memory_bytes` and expose `object_store_bytes` in ChunkSummary and system.chunks ( #2017 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-15 16:00:24 +00:00
kodiakhq[bot]
833debd5b5
Merge branch 'main' into cn/exploration
2021-07-14 17:30:55 +00:00
Raphael Taylor-Davies
1d00fa2fd8
refactor: track memory metrics in catalog ( #1995 )
...
* refactor: track memory metrics in catalog
* chore: update comment
2021-07-14 16:23:00 +00:00
Carol (Nichols || Goulding)
8070065e2f
fix: Change RUB chunk table_summaries to table_summary
...
Because chunks now have only one table.
Connects to #1718 , #1613 , #1295
2021-07-14 11:18:02 -04:00
Andrew Lamb
97c727a2c2
fix: update read_buffer tests
2021-07-13 15:44:57 -04:00
Marco Neumann
2e391deb34
chore: update croaring to 0.5.0
...
Upstream changelog:
- CRoaring updated to 0.3.1
- `-march=native` is not a default for croaring-sys anymore
- Impl Default for `Bitmap` and `Treemap`
2021-07-13 15:15:41 +02:00
Andrew Lamb
d35b74c226
fix: Fix doc build warnings ( #1945 )
...
* fix: Fix doc build warnings
* refactor: add deny bare_urls to crates
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-13 08:03:42 +00:00
Edd Robinson
f811bf1e5e
refactor: log compaction activity
2021-07-08 12:48:41 +01:00
Andrew Lamb
7602bde850
chore: Update datafusion deps ( #1799 )
...
* chore: Update datafusion deps + rework code
* refactor: remove workaround as it has been contributed upstream
* fix: Update query/src/exec/split.rs
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-08 10:58:32 +00:00
Andrew Lamb
e6d995cbd8
chore: Update to Rust 1.53.0 ( #1922 )
...
* chore: Update to Rust 1.53.0
* fix: Update to latest clippy standards
* fix: bad refactor
* fix: Update escaping
* test: update test output
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-07 18:02:03 +00:00
Edd Robinson
2ec9151b32
Merge branch 'main' into er/fix/read_buffer/predicate
2021-07-06 13:35:04 +01:00
Raphael Taylor-Davies
b4534883fe
refactor: remove table name from upsert_table ( #1882 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-02 15:22:41 +00:00
Edd Robinson
8fc07cf4f0
fix: correctly evaluate exprs matching disjoint rows
2021-07-01 16:05:09 +01:00
Edd Robinson
2e430ac7f0
refactor: remove table name from read_filter schema
2021-06-30 09:50:53 +01:00
Edd Robinson
62f274cc1b
refactor: remove table name from column_values
2021-06-30 09:46:54 +01:00
Edd Robinson
5737c9d962
refactor: remove table name from column_names
2021-06-30 09:43:41 +01:00
Edd Robinson
86e2fe4138
refactor: satisfies predicate
2021-06-29 11:58:28 +01:00
Edd Robinson
ed98812c2a
refactor: logging
2021-06-25 17:37:04 +01:00
Raphael Taylor-Davies
84e63bb1b9
feat: temporary log information on record batch data ( #1807 )
...
* feat: temporary log information on record batch data
* chore: add buffer size
2021-06-25 14:46:51 +00:00
Edd Robinson
7e3df17896
test: update benchmarks
2021-06-21 15:29:23 +01:00
Edd Robinson
2ff80162d3
refactor: one table per chunk
2021-06-21 15:08:38 +01:00
Edd Robinson
15416ca223
refactor: enable empty table init
2021-06-21 15:08:38 +01:00
Andrew Lamb
ec43a87909
chore: Update itertools deps ( #1750 )
2021-06-17 17:56:44 +00:00
Edd Robinson
ff19beb0ad
refactor: export rb chunk as RBChunk
2021-06-11 18:33:10 +01:00
Raphael Taylor-Davies
1e7ef193a6
refactor: use field metadata to store influx types ( #1642 )
...
* refactor: use field metadata to store influx types
make SchemaBuilder non-consuming
* chore: remove unused variants
* chore: fix lints
2021-06-07 13:26:39 +00:00
Edd Robinson
418cc4cf0e
refactor: update read_buffer/src/column/encoding/scalar/transcoders.rs
...
Co-authored-by: Dom <dom@itsallbroken.com>
2021-06-03 14:02:16 +01:00
Edd Robinson
22c7592e3b
refactor: DRY RLE check
2021-06-03 12:32:40 +01:00
Edd Robinson
9a45c0d05b
feat: implement float byte trimming for arrow array
2021-06-03 12:32:40 +01:00
Edd Robinson
5d02a71e6f
feat: implement byte trimming on float slice
2021-06-03 12:32:40 +01:00
Edd Robinson
32e5f8c715
feat: add float byte trimmer encoding
2021-06-03 12:32:40 +01:00
Edd Robinson
728476f2e1
refactor: add encoding name to float encodings
2021-06-03 12:32:40 +01:00
Edd Robinson
fa729fd6b0
refactor: address PR feedback
2021-06-02 11:21:10 +01:00
Edd Robinson
a5b554d2c3
feat: add RLE support to integer encodings
2021-06-02 10:57:17 +01:00
Edd Robinson
71598d9b3e
refactor: move rle heuristics to rle module
2021-06-02 10:57:17 +01:00
Dom
aca00a505f
refactor: avoid copying encoding name for stats
...
Swaps the encoding name String type for a Cow in the column Statistics,
avoiding having to copy the encoding name where it is a static string
already.
2021-06-01 11:08:07 +01:00
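A rough sketch of the `Cow` change described in the commit above. The `Statistics` struct, the `enc_type` field, and the encoding names are assumptions for illustration, not the actual read_buffer definitions.

```rust
use std::borrow::Cow;

/// Hypothetical column statistics carrying the encoding name.
#[derive(Debug)]
struct Statistics {
    /// Borrowed when the name is a static string, owned when built at runtime.
    enc_type: Cow<'static, str>,
}

/// Static name: no allocation and no copy.
fn static_stats() -> Statistics {
    Statistics {
        enc_type: Cow::Borrowed("RLE"),
    }
}

/// Name built at runtime: allocates only in this case.
fn dynamic_stats(bits: u8) -> Statistics {
    Statistics {
        enc_type: Cow::Owned(format!("FIXED_{}", bits)),
    }
}

fn main() {
    let a = static_stats();
    let b = dynamic_stats(32);
    println!("{} {}", a.enc_type, b.enc_type);
}
```

Both variants dereference to `&str`, so code that reads the name does not need to care which one it holds.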
Andrew Lamb
00e735ef0d
chore: remove unused dependencies ( #1583 )
2021-05-29 10:31:57 +00:00
Raphael Taylor-Davies
db432de137
feat: add distinct count to StatValues ( #1568 )
2021-05-28 17:41:34 +00:00
Edd Robinson
e94be15296
refactor: update read_buffer/src/column/encoding/scalar/fixed.rs
2021-05-27 21:18:10 +01:00
Edd Robinson
26fe4167f7
refactor: erase some types
2021-05-27 14:57:40 +01:00
Edd Robinson
bba387d6ff
refactor: fix benchmarks
2021-05-27 14:35:34 +01:00
Edd Robinson
7ab27c4468
refactor: get build
2021-05-27 14:35:34 +01:00
Edd Robinson
04513e737d
refactor: define encodings wrt scalar encoding trait
2021-05-27 14:35:34 +01:00
Edd Robinson
22f8a8a4a1
refactor: define scalar encoding in terms of trait
2021-05-27 14:35:34 +01:00
Edd Robinson
c84d50447c
feat: define a Transcoder trait
2021-05-27 14:35:34 +01:00
Edd Robinson
a81ada6140
feat: add transcoder trait
2021-05-27 14:35:34 +01:00
Raphael Taylor-Davies
4fcc04e6c9
chore: enable arrow prettyprint feature ( #1566 )
2021-05-27 10:28:14 +00:00
kodiakhq[bot]
db96286ed7
Merge branch 'main' into er/refactor/scalar_comp
2021-05-24 17:02:14 +00:00
Andrew Lamb
14ba25f86d
chore: Update datafusion and use released version of arrow crates ( #1546 )
...
* chore: Update datafusion and use released version of arrow crate
* fix: Update for change in API
2021-05-24 15:37:22 +00:00
Edd Robinson
eace6c9201
fix: ensure scalars compare correctly
2021-05-24 16:19:28 +01:00
Nga Tran
784ef88fcd
chore: merge main to branch and add more tests that expose a wrong result bug on unsigned int
2021-05-21 12:38:06 -04:00
Edd Robinson
a65c729b01
fix: support converse binary expressions
2021-05-21 15:41:52 +01:00
Edd Robinson
d5f02cb6c5
refactor: address PR feedback
2021-05-21 09:40:26 +01:00
Edd Robinson
d57e3ae73e
refactor: move scalar encodings
2021-05-20 22:58:30 +01:00
Edd Robinson
0ec2499f60
refactor: teach scalar RLE to return different type
2021-05-20 22:50:44 +01:00
Nga Tran
e44a3a87db
feat: predicate is now actually pushed down to RUB, but there are bugs and it is not working yet
2021-05-20 16:56:15 -04:00
Edd Robinson
4cb76e367b
refactor: fix change to Chunk API
2021-05-20 11:11:18 +01:00
Edd Robinson
663a38862d
refactor: address PR feedback
2021-05-20 10:49:49 +01:00
Edd Robinson
76caef89b1
refactor: apply suggestions from code review
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-05-20 10:49:49 +01:00
Edd Robinson
c901fe1023
perf: improve values_as_dictionary with predicates
2021-05-20 10:49:49 +01:00
Edd Robinson
723ff2553b
feat: teach read_filter to return dictionaries
2021-05-20 10:49:49 +01:00
Edd Robinson
3de6f3f8bd
feat: teach string encoding to produce Dictionary values
2021-05-20 10:49:49 +01:00
Edd Robinson
634ceb886b
feat: add Dictionary Values type
2021-05-20 10:49:49 +01:00
Edd Robinson
b7b87c1c96
test: add read_filter benchmark
2021-05-20 10:49:49 +01:00
Edd Robinson
4e766d7085
refactor: reorganise benchmarks
2021-05-20 10:49:49 +01:00
Edd Robinson
c8e2c9224e
chore: rename benchmark
2021-05-20 10:49:49 +01:00
Raphael Taylor-Davies
37880ee89a
refactor: store chunk IDs only in catalog ( #1521 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-20 04:07:14 +00:00
Edd Robinson
2963d63b5e
feat: implement byte trimming on nullable encodings
2021-05-17 14:32:55 +01:00
Edd Robinson
6a72274517
feat: extend implementations to more Arrow arrays
2021-05-17 14:32:55 +01:00