Commit Graph

278 Commits (bf9a054961ad951764bd7e6672a8833ace6cef08)

Author SHA1 Message Date
Andrew Lamb 4800b36949 chore: Update IOx to a pre-release version of arrow and datafusion to test out performance improvement 2021-07-13 15:44:57 -04:00
Andrew Lamb 0164cabbf3
refactor: do not use DataFrame DataFusion API / stop optimizing twice (#1982)
* refactor: do not use DataFrame DataFusion API

* fix: update output to reflect not running optimizer twice

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-13 16:29:43 +00:00
Marco Neumann 2e391deb34 chore: update croaring to 0.5.0
Upstream changelog:

- CRoaring updated to 0.3.1
- `-march=native` is not a default for croaring-sys anymore
- Impl Default for `Bitmap` and `Treemap`
2021-07-13 15:15:41 +02:00
Andrew Lamb d35b74c226
fix: Fix doc build warnings (#1945)
* fix: Fix doc build warnings

* refactor: add deny bare_urls to crates

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-13 08:03:42 +00:00
kodiakhq[bot] f26f844ed2
Merge branch 'main' into ntran/use_sortkey 2021-07-12 18:12:47 +00:00
Carol (Nichols || Goulding) c681da1031 refactor: Define the TestChunk methods with macros 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 4e53a32928 refactor: Completely replace query::provider::overlap::TestChunk with query::test::TestChunk 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 1698edcc39 refactor: Implement query::provider::overlap::TestChunk in terms of query::test::TestChunk 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) dc0b97e121 refactor: Completely replace TestChunkMeta with TestChunk 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 96f9485792 refactor: Move a with_no_stats method to be entirely defined on TestChunk 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) b4c5a87088 refactor: Rename int field to i64 field to be more consistent 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 54f7ee8b8d refactor: Implement TestChunkMeta in terms of TestChunk
This is a temporary step to make sure TestChunk does everything
TestChunkMeta needs
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) ee545ce90e test: Make _with_stats methods able to optionally take max/min
Not used yet, but will be when this is unified with query/src/pruning.rs
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) b26aae1cb4 test: Add an arg to control whether to add a column summary at all
Always true for now, but there are some cases in query/src/pruning.rs
that don't add any column summaries that will use this with `false`.
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 6cd75bc688 test: Optionally take stats in add_schema_to_table
This gets rid of a lookup and construction of default stats that aren't
necessary
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) e05ca7f98b fix: Change a method name that says null to not say null
The comment and implementation seem to indicate this is creating
non-null data.
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 4406d8a219 test: Always initialize a TableSummary on TestChunk 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 22d4040c81 test: Always initialize a Schema for TestChunk 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 92cb5986f1 test: Initialize a schema on TestChunk to always exist 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 78f1c4fc80 test: Chunks can only have one table; no need to specify repeatedly
This lets us make the name required and always present on TestChunks,
and make the ID optional.
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 15aac65c2c fix: Arrange use statements so rustfmt can manage their order 2021-07-12 09:59:11 -04:00
Nga Tran 7b7a60993d feat: consider time as a special key 2021-07-09 18:54:22 -04:00
Nga Tran 8f4463664c feat: add super_key function 2021-07-09 15:37:04 -04:00
Marco Neumann bc958e2ff0 refactor: use Arcs to pass schemas around 2021-07-09 09:45:12 +02:00
Marco Neumann 09e611deb7 refactor: lift query schema generation up to caller
No longer scan chunks during query planning to determine the schema
(except for the lifetime jobs where we have a good reason to do so).
Instead, pass the schema down from whoever is triggering the query.
For real SQL queries, we then just use the table-wide schemas
introduced in #1913.

Apart from avoiding schema merges, we also no longer crash when no
chunks are left in the table (i.e. columns are present but all rows
are gone).

Fixes #1768.
Fixes #1884.
2021-07-09 09:24:21 +02:00
kodiakhq[bot] c8126784a8
Merge branch 'main' into ntran/avoid_sort_in_scan 2021-07-08 20:22:18 +00:00
Nga Tran 680394b50b refactor: run fmt 2021-07-08 16:21:42 -04:00
Nga Tran c5733ab4a7 refactor: remove redundant code 2021-07-08 16:11:42 -04:00
Nga Tran 6738cb272f refactor: remove duplicate test 2021-07-08 15:59:25 -04:00
Nga Tran da6249a4df fix: address reviewers' comments and also fix a bug they discovered 2021-07-08 15:54:54 -04:00
Andrew Lamb 33bc85ad18
feat: Infrastructure for persistence (#1925)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-08 11:14:38 +00:00
Andrew Lamb 7602bde850
chore: Update datafusion deps (#1799)
* chore: Update datafusion deps + rework code

* refactor: remove workaround as it has been contributed upstream

* fix: Update query/src/exec/split.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-08 10:58:32 +00:00
Nga Tran d3c4f8c249 fix: store sort key correctly in the schema. Update tests to reflect it 2021-07-07 15:55:23 -04:00
Andrew Lamb e6d995cbd8
chore: Update to Rust 1.53.0 (#1922)
* chore: Update to Rust 1.53.0

* fix: Update to latest clippy standards

* fix: bad refactor

* fix: Update escaping

* test: update test output

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-07 18:02:03 +00:00
Nga Tran 76789e5902 feat: store sort key in the chunk schema of RUB 2021-07-06 17:00:35 -04:00
Marco Neumann b6185982f7 refactor: make `ProviderBuilder` a build-time-checked builder
It's safer and also avoids cloning / copying state around.
2021-07-06 18:20:05 +02:00
Marco Neumann 4172d7946c refactor: make `SchemaMerger` self-consuming
The error handling in `merge` was incomplete, i.e. it could leave the
merger in a half-modified state in case of an error. That's generally a
bad idea and can lead to ugly bugs. Also, the "builder" pattern that is
used here usually consumes itself (and provides a clone impl), so it is
easier to reason about modifications. So this commit just changes it to
a self-consuming builder.

A nice side effect of the new pattern is also that it is build-time
checked and does not contain a runtime assert any longer.
2021-07-06 18:20:05 +02:00
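
The self-consuming builder pattern described in the commit above can be sketched roughly as follows. This is only an illustration under assumptions: `ToyMerger`, `ToyMergeError`, and the string-based column types are hypothetical stand-ins and are not the actual `SchemaMerger` API from the repository.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical, simplified merger: maps a column name to a type name.
#[derive(Debug, Clone, Default)]
pub struct ToyMerger {
    columns: BTreeMap<String, String>,
}

/// Hypothetical error: the same column was seen with two different types.
#[derive(Debug, PartialEq)]
pub struct ToyMergeError {
    pub column: String,
    pub existing: String,
    pub new: String,
}

impl ToyMerger {
    pub fn new() -> Self {
        Self::default()
    }

    /// Takes `self` by value. If an error occurs midway, the partially
    /// modified merger is dropped, so the caller can never observe it.
    pub fn merge(mut self, other: &[(&str, &str)]) -> Result<Self, ToyMergeError> {
        for &(name, ty) in other {
            match self.columns.entry(name.to_string()) {
                Entry::Occupied(e) => {
                    if e.get().as_str() != ty {
                        return Err(ToyMergeError {
                            column: name.to_string(),
                            existing: e.get().clone(),
                            new: ty.to_string(),
                        });
                    }
                }
                Entry::Vacant(v) => {
                    v.insert(ty.to_string());
                }
            }
        }
        Ok(self)
    }

    /// Consumes the merger and returns the merged columns.
    pub fn build(self) -> BTreeMap<String, String> {
        self.columns
    }
}
```

Because `merge` moves the merger, reusing a merger after a failed merge is a compile error rather than a runtime check; a caller who wants to keep the previous state has to `clone()` it explicitly before merging.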
Andrew Lamb 56c8c8d428
feat: Use separate executor for queries and compactions/moves (#1870)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:47:50 +00:00
Jacob Marble 0779b0d9bd
feat: add gRPC listener for new write protocol (#1842)
* feat: add gRPC listener for new write protocol

* chore: clippy happy

* chore: lint

* chore: cargo fmt --all

* chore: cargo clippy

* chore: protobuf-lint

* chore: more formatting

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:15:12 +00:00
kodiakhq[bot] e03a1a1def
Merge branch 'main' into ntran/dedup_less_concat 2021-07-01 15:59:22 +00:00
Nga Tran d0afc7a176 refactor: clean up and add a missing else case 2021-07-01 11:00:30 -04:00
Nga Tran 5cf623201d fix: deduplicate the last batch before sending it downstream 2021-07-01 10:45:23 -04:00
Andrew Lamb 7235c7b965
refactor: Remove vestigial execution counters (#1865)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 14:08:06 +00:00
Nga Tran ba919726b6 test: unit tests 2021-06-30 15:01:31 -04:00
Nga Tran 2a06b93b00 chore: Merge branch 'main' into ntran/dedup_less_concat 2021-06-30 11:37:15 -04:00
Nga Tran 1dbdabd66e fix: 2 values are also considered to be the same if at least one of them is invalid 2021-06-30 10:52:21 -04:00
Raphael Taylor-Davies 62d3305923
feat: optimize the dictionaries in the output of deduplicate node (#1827) (#1832)
* feat: optimize dedup dictionaries (#1827)

* fix: handle sliced null bitmasks

* chore: review feedback
2021-06-30 09:30:16 +00:00
Nga Tran e6a4e0d709 refactor: make the code clearer for schema even though they are the same 2021-06-29 17:46:30 -04:00
Nga Tran a249b90952 refactor: refactor and add temp info for debugging 2021-06-29 16:35:50 -04:00
Nga Tran 4611e5d584 chore: merge main to branch 2021-06-29 15:39:23 -04:00