Raphael Taylor-Davies
0946ffe916
refactor: reuse IOxExecutionContext ( #2373 )
...
* refactor: reuse IOxExecutionContext
* fix: orphaned comment
* chore: review feedback
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-23 15:47:15 +00:00
Dom
3de6b44e23
build: use new rustdoc lint name ( #2261 )
...
* fix: nocache feature code rot
The MBChunk::snapshot code when using the "nocache" option no longer
compiles - this commit updates it to match the not(nocache) code.
* build: use updated broken_intra_doc_links name
The broken_intra_doc_links lint was renamed to
rustdoc::broken_intra_doc_links
https://doc.rust-lang.org/rustdoc/lints.html
2021-08-11 19:48:51 +00:00
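As a minimal sketch of the lint rename described in commit 3de6b44e23 above: the lint is referenced under the `rustdoc::` tool namespace instead of by its bare name. The module and function names below (`docs_demo`, `len`, `is_empty`) are hypothetical and only exist to give the attribute something to attach to; the actual IOx crates may apply the lint at the crate root instead.

```rust
// Old spelling (now renamed):  #[deny(broken_intra_doc_links)]
// New spelling, scoped under the `rustdoc` tool namespace:
#[deny(rustdoc::broken_intra_doc_links)]
pub mod docs_demo {
    /// Returns the number of rows; see also [`is_empty`].
    pub fn len(rows: &[u8]) -> usize {
        rows.len()
    }

    /// Returns `true` when [`len`] reports zero rows.
    pub fn is_empty(rows: &[u8]) -> bool {
        len(rows) == 0
    }
}
```

With the attribute in place, `cargo doc` is expected to fail if either intra-doc link stops resolving, rather than only warning.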
Andrew Lamb
559db4529d
refactor: Move DatabaseStore out of query crate ( #2219 )
...
* refactor: Move DatabaseStore out of query crate
* fix: doc links
2021-08-09 12:06:25 +00:00
Carol (Nichols || Goulding)
9d15798288
fix: Address or allow Clippy warnings new with Rust 1.54
2021-07-30 09:59:59 -04:00
Nga Tran
e8828c22e4
refactor: address review comments
2021-07-29 13:38:42 -04:00
Nga Tran
0d05ac3961
feat: add sort option while building scan plan to avoid extra sort during compaction
2021-07-28 17:32:01 -04:00
Andrew Lamb
e6cbd4d217
feat: Use statistics for count(*) queries ( #2038 )
...
* feat: Use statistics for count(*) queries
* docs: fix mangled comment
* refactor: rewrite to use fold
* refactor: use sort_by_cached_key
* fix: set null count properly
* fix: fmt + clippy
2021-07-28 19:39:41 +00:00
Andrew Lamb
5fb3e00f2a
fix: Properly record total_count and null_count in statistics ( #2103 )
...
* fix: Properly record total_count and null_count in statistics
* fix: fix statistics calculation in mutable_buffer
* refactor: expose null counts in read_buffer
* refactor: expose null_count in parquet_file
* fix: update server crate tests
* fix: update query_tests tests
* docs: tweak comments
* refactor: Use storage_stats rather than adding `null_count`
* refactor: rename test data field for clarity
* fix: fixup merge conflicts
* refactor: rename initial_non_null_count to initial_total_count
* refactor: calculate null_count as row_count - to_add
2021-07-26 18:13:36 +00:00
Andrew Lamb
01c79f1a1a
fix: Print all timestamps using RFC3339 format ( #2098 )
...
* fix: Use IOx pretty printer rather than arrow pretty printer
* chore: update tests in the query crate
* chore: update influxdb_iox tests
* chore: Update end to end tests
* chore: update query_tests
* chore: update mutable_buffer tests
* refactor: update parquet_file tests
* refactor: update db tests
* chore: update kafka integration test output
* fix: merge conflict
2021-07-22 19:04:52 +00:00
Nga Tran
11ba4b5f6a
fix: fix unit_test setting to have the desired results
2021-07-22 14:22:08 -04:00
Nga Tran
b2063fb29f
test: fix the stats and discover a bug in compaction/split/deduplication
2021-07-21 17:40:48 -04:00
kodiakhq[bot]
18dd108ba6
Merge branch 'main' into ntran/dedup_compare_cols_order
2021-07-21 15:42:30 +00:00
Nga Tran
86add39175
refactor: address review comments
2021-07-21 11:41:21 -04:00
Nga Tran
d547c22e97
refactor: comments
2021-07-20 15:27:41 -04:00
Nga Tran
150e166813
refactor: fix comments
2021-07-20 15:16:24 -04:00
Nga Tran
fa6d216a85
refactor: cleanup
2021-07-20 15:11:02 -04:00
Nga Tran
b98888e8d6
feat: implement key_ranges function that uses new range identify algo
2021-07-20 14:58:54 -04:00
Andrew Lamb
2c20528c69
chore: use upstream versions of some workarounds ( #2057 )
...
* chore: use upstream versions of some workarounds
* docs: update docstring
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-20 08:53:46 +00:00
Nga Tran
1668420ded
feat: new algorithm to compute key ranges for deduplicating data
2021-07-19 18:04:25 -04:00
Andrew Lamb
1c16988a51
chore: Update datafusion references ( #2056 )
2021-07-19 18:09:06 +00:00
Andrew Lamb
4da8a16c18
chore: update to arrow 5.0 and master datafusion ( #2049 )
...
* chore: update to arrow 5.0 and master datafusion
* fix: Update test for change in object size
2021-07-19 12:49:51 +00:00
Raphael Taylor-Davies
5fc98c7c56
feat: add failure reporting to TaskTracker ( #2031 )
...
* feat: add failure reporting to TaskTracker
* chore: review feedback
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-19 09:17:20 +00:00
Andrew Lamb
d00d56027b
docs: add comment to trigger build ( #2039 )
2021-07-16 17:53:55 +00:00
Nga Tran
cfe0bfa88b
refactor: address review comments and add useful log info to catch resort
2021-07-15 15:39:12 -04:00
Nga Tran
0b1f2b1fd0
chore: merge main to branch
2021-07-14 16:17:14 -04:00
Nga Tran
ef271d1e1c
test: make the tests clearer
2021-07-14 15:42:30 -04:00
Nga Tran
b4d86dcb7d
fix: make the order of sort key deterministic
2021-07-14 14:50:19 -04:00
Nga Tran
9ffaf863fa
refactor: cleanup
2021-07-14 14:30:04 -04:00
Nga Tran
552e3fb691
fix: Padd stats compute deterministic order of sort key and update tests that got changed by the use of sort key
2021-07-14 14:06:41 -04:00
Edd Robinson
46ac15a77e
refactor: increase compaction batch size
2021-07-14 17:19:11 +01:00
Nga Tran
8fd0df04f2
feat: continue building and using sort_key if available
2021-07-13 16:25:58 -04:00
Andrew Lamb
4800b36949
chore: Update IOx to a pre-release version of arrow and datafusion to test out performance improvement
2021-07-13 15:44:57 -04:00
Andrew Lamb
0164cabbf3
refactor: do not use DataFrame DataFusion API / stop optimizing twice ( #1982 )
...
* refactor: do not use DataFrame DataFusion API
* fix: update output to reflect not running optimizer twice
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-13 16:29:43 +00:00
Marco Neumann
2e391deb34
chore: update croaring to 0.5.0
...
Upstream changelog:
- CRoaring updated to 0.3.1
- `-march=native` is not a default for croaring-sys anymore
- Impl Default for `Bitmap` and `Treemap`
2021-07-13 15:15:41 +02:00
Andrew Lamb
d35b74c226
fix: Fix doc build warnings ( #1945 )
...
* fix: Fix doc build warnings
* refactor: add deny bare_urls to crates
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-13 08:03:42 +00:00
Nga Tran
5418a1fe6b
refactor: remove unused comments
2021-07-12 18:14:38 -04:00
Nga Tran
23895e6673
feat: Using sort_key to avoid resorts
2021-07-12 18:08:45 -04:00
kodiakhq[bot]
f26f844ed2
Merge branch 'main' into ntran/use_sortkey
2021-07-12 18:12:47 +00:00
Carol (Nichols || Goulding)
c681da1031
refactor: Define the TestChunk methods with macros
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
4e53a32928
refactor: Completely replace query::provider::overlap::TestChunk with query::test::TestChunk
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
1698edcc39
refactor: Implement query::provider::overlap::TestChunk in terms of query::test::TestChunk
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
dc0b97e121
refactor: Completely replace TestChunkMeta with TestChunk
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
96f9485792
refactor: Move a with_no_stats method to be entirely defined on TestChunk
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
b4c5a87088
refactor: Rename int field to i64 field to be more consistent
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
54f7ee8b8d
refactor: Implement TestChunkMeta in terms of TestChunk
...
This is a temporary step to make sure TestChunk does everything
TestChunkMeta needs
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
ee545ce90e
test: Make _with_stats methods able to optionally take max/min
...
Not used yet, but will be when this is unified with query/src/pruning.rs
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
b26aae1cb4
test: Add an arg to control whether to add a column summary at all
...
Always true for now, but there are some cases in query/src/pruning.rs
that don't add any column summaries that will use this with `false`.
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
6cd75bc688
test: Optionally take stats in add_schema_to_table
...
This gets rid of a lookup and construction of default stats that aren't
necessary
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
e05ca7f98b
fix: Change a method name that says null to not say null
...
The comment and implementation seem to indicate this is creating
non-null data.
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
4406d8a219
test: Always initialize a TableSummary on TestChunk
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
22d4040c81
test: Always initialize a Schema for TestChunk
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
92cb5986f1
test: Initialize a schema on TestChunk to always exist
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
78f1c4fc80
test: Chunks can only have one table; no need to specify repeatedly
...
This lets us make the name required and always present on TestChunks,
and make the ID optional.
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding)
15aac65c2c
fix: Arrange use statements so rustfmt can manage their order
2021-07-12 09:59:11 -04:00
Nga Tran
7b7a60993d
feat: consider time as a special key
2021-07-09 18:54:22 -04:00
Nga Tran
8f4463664c
feat: add super_key function
2021-07-09 15:37:04 -04:00
Marco Neumann
bc958e2ff0
refactor: use Arcs to pass schemas around
2021-07-09 09:45:12 +02:00
Marco Neumann
09e611deb7
refactor: lift query schema generation up to caller
...
No longer scan chunks during query planning to determine the schema
(except for the lifetime jobs where we have a good reason to do so).
Instead pass the schema down from whoever is triggering the query.
For real SQL queries, we then just use the table-wide schemas
introduced in #1913 .
Apart from avoiding schema merges we now also don't crash any longer
when no chunks are left in the table (aka columns are present but all
rows are gone).
Fixes #1768 .
Fixes #1884 .
2021-07-09 09:24:21 +02:00
kodiakhq[bot]
c8126784a8
Merge branch 'main' into ntran/avoid_sort_in_scan
2021-07-08 20:22:18 +00:00
Nga Tran
680394b50b
refactor: run fmt
2021-07-08 16:21:42 -04:00
Nga Tran
c5733ab4a7
refactor: remove redundant code
2021-07-08 16:11:42 -04:00
Nga Tran
6738cb272f
refactor: remove duplicate test
2021-07-08 15:59:25 -04:00
Nga Tran
da6249a4df
fix: address reviewers' comments and also fix a bug they discovered
2021-07-08 15:54:54 -04:00
Andrew Lamb
33bc85ad18
feat: Infrastructure for persistence ( #1925 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-08 11:14:38 +00:00
Andrew Lamb
7602bde850
chore: Update datafusion deps ( #1799 )
...
* chore: Update datafusion deps + rework code
* refactor: remove workaround as it has been contributed upstream
* fix: Update query/src/exec/split.rs
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-08 10:58:32 +00:00
Nga Tran
d3c4f8c249
fix: store sort key correctly in the schema. Update tests to reflect it
2021-07-07 15:55:23 -04:00
Andrew Lamb
e6d995cbd8
chore: Update to Rust 1.53.0 ( #1922 )
...
* chore: Update to Rust 1.53.0
* fix: Update to latest clippy standards
* fix: bad refactor
* fix: Update escaping
* test: update test output
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-07 18:02:03 +00:00
Nga Tran
76789e5902
feat: store sort key into the chunk schema of RUB
2021-07-06 17:00:35 -04:00
Marco Neumann
b6185982f7
refactor: make `ProviderBuilder` a build-time-checked builder
...
It's safer and also avoids cloning / copying state around.
2021-07-06 18:20:05 +02:00
Marco Neumann
4172d7946c
refactor: make `SchemaMerger` self-consuming
...
The error handling in `merge` was incomplete, aka it could leave the
merger in a half-modified state in case of an error. That's generally a
bad idea and can lead to ugly bugs. Also the "builder" pattern that is
used here usually consumes itself (and provides a clone impl), so it is
easier to reason about modifications. So this commit just changes it to a
self-consuming builder.
A nice side effect of the new pattern is also that it is build-time
checked and does not contain a runtime assert any longer.
2021-07-06 18:20:05 +02:00
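The self-consuming builder pattern described in commit 4172d7946c above can be sketched roughly as follows. This is an illustration under stated assumptions, not the IOx `SchemaMerger` API: `Merger`, `MergeError`, and the representation of a schema as column name and type name strings are all hypothetical. The point it shows is that `merge` takes `self` by value, so a failed merge drops the partially modified state instead of leaving it reachable.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a schema merger (not the real IOx type).
/// Maps column name -> type name.
#[derive(Debug, Clone, Default)]
pub struct Merger {
    columns: BTreeMap<String, String>,
}

/// Hypothetical error: the same column was seen with two different types.
#[derive(Debug, PartialEq)]
pub struct MergeError {
    pub column: String,
    pub existing: String,
    pub new: String,
}

impl Merger {
    pub fn new() -> Self {
        Self::default()
    }

    /// Consumes the merger. On success the updated merger is handed back;
    /// on error it is dropped, so callers can never observe a merger that
    /// has absorbed only some of the columns in `other`.
    pub fn merge(mut self, other: &[(&str, &str)]) -> Result<Self, MergeError> {
        for &(name, ty) in other {
            match self.columns.get(name) {
                Some(existing) if existing.as_str() != ty => {
                    return Err(MergeError {
                        column: name.to_string(),
                        existing: existing.clone(),
                        new: ty.to_string(),
                    });
                }
                // Same column with the same type: nothing to do.
                Some(_) => {}
                None => {
                    self.columns.insert(name.to_string(), ty.to_string());
                }
            }
        }
        Ok(self)
    }

    /// Consumes the merger and returns the merged columns, sorted by name.
    pub fn build(self) -> Vec<(String, String)> {
        self.columns.into_iter().collect()
    }
}
```

A caller chains the calls, for example `Merger::new().merge(a)?.merge(b)?.build()`. Anyone who wants to keep the pre-merge state across a possible failure clones the merger explicitly first, which is why the commit mentions providing a clone impl.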
Andrew Lamb
56c8c8d428
feat: Use separate executor for queries and compactions/moves ( #1870 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:47:50 +00:00
Jacob Marble
0779b0d9bd
feat: add gRPC listener for new write protocol ( #1842 )
...
* feat: add gRPC listener for new write protocol
* chore: clippy happy
* chore: lint
* chore: cargo fmt --all
* chore: cargo clippy
* chore: protobuf-lint
* chore: more formatting
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:15:12 +00:00
kodiakhq[bot]
e03a1a1def
Merge branch 'main' into ntran/dedup_less_concat
2021-07-01 15:59:22 +00:00
Nga Tran
d0afc7a176
refactor: clean up and add a missing else case
2021-07-01 11:00:30 -04:00
Nga Tran
5cf623201d
fix: deduplicate the last batch before sending it downstream
2021-07-01 10:45:23 -04:00
Andrew Lamb
7235c7b965
refactor: Remove vestigial execution counters ( #1865 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 14:08:06 +00:00
Nga Tran
ba919726b6
test: unit tests
2021-06-30 15:01:31 -04:00
Nga Tran
2a06b93b00
chore: Merge branch 'main' into ntran/dedup_less_concat
2021-06-30 11:37:15 -04:00
Nga Tran
1dbdabd66e
fix: 2 values are also considered to be the same if at least one of them is invalid
2021-06-30 10:52:21 -04:00
Raphael Taylor-Davies
62d3305923
feat: optimize the dictionaries in the output of deduplicate node ( #1827 ) ( #1832 )
...
* feat: optimize dedup dictionaries (#1827 )
* fix: handle sliced null bitmasks
* chore: review feedback
2021-06-30 09:30:16 +00:00
Nga Tran
e6a4e0d709
refactor: make the code clearer for schema even though they are the same
2021-06-29 17:46:30 -04:00
Nga Tran
a249b90952
refactor: refactor and add temp info for debugging
2021-06-29 16:35:50 -04:00
Nga Tran
4611e5d584
chore: merge main to branch
2021-06-29 15:39:23 -04:00
Nga Tran
388e7b7650
fix: reset last_batch
2021-06-29 15:15:09 -04:00
Nga Tran
8f309eb569
feat: improve deduplicate to avoid as many concat_batches as possible
2021-06-29 14:41:54 -04:00
Edd Robinson
12ae9b012a
refactor: clarify intent of
2021-06-28 17:39:48 +01:00
Andrew Lamb
2e5f10f6b1
feat: Sort the output of split_plans as well ( #1800 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-25 13:02:30 +00:00
Andrew Lamb
4e7cf39b23
chore: Reduce debug logging in query crate ( #1802 )
2021-06-24 21:01:11 +00:00
Andrew Lamb
79446d45be
feat: Implement split_plans ( #1794 )
...
* feat: implement split plan / planner
* fix: Apply suggestions from code review
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* fix: resolve merge conflicts
* fix: add values to panic
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
2021-06-24 18:38:00 +00:00
Raphael Taylor-Davies
297fc12db8
feat: compact chunks ( #1776 )
...
* feat: compact chunks
* chore: review feedback
* chore: clippy lints
* chore: document sort key algorithm
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-24 16:49:10 +00:00
Andrew Lamb
0a03605bbc
refactor: pull Channel --> Stream adapter into its own module ( #1793 )
...
* refactor: pull Channel --> Stream adapter into its own module
* docs: Update query/src/exec/stream.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2021-06-24 10:35:45 +00:00
Andrew Lamb
60eb89cad1
feat: Reorg Planner for merge plans ( #1780 )
...
* feat: Reorg Planner
* docs: add example for split
* fix: clippy
* docs: Specify <= rather than < for split
2021-06-23 10:50:44 +00:00
Andrew Lamb
4c5007f961
fix: Select the correct timestamp for min/max selectors ( #1771 )
...
* test: Reproducer showing that the min/max selectors are order dependent
* fix: pick correct timestamp for first/last selectors
* refactor: remove println
* docs: Fixup comments and add to link to arrow-datafusion/issues/600
* fix: Add debug if timestamp is null
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-22 17:53:54 +00:00
Andrew Lamb
763ade390c
refactor: rename deduplicate --> overlap ( #1779 )
2021-06-22 17:07:53 +00:00
Andrew Lamb
5362c7c924
feat: enable query deduplication ( #1762 )
2021-06-21 18:49:04 +00:00
Andrew Lamb
bed6ec8c31
feat: Handle merging chunks that have different schemas ( #1761 )
...
* feat: Handle merging chunks that have different schemas
* test: print out original (non deduplicated) data in tests
2021-06-21 15:52:13 +00:00
Andrew Lamb
6559a9e997
refactor: use Schema to compute InfluxDB primary keys ( #1757 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-18 21:15:31 +00:00
Andrew Lamb
de67bd3efe
refactor: Remove PartitionChunk::table_schema ( #1756 )
...
* refactor: Remove PartitionChunk::table_schema
* docs: update comments
2021-06-18 16:13:16 +00:00
Andrew Lamb
9beeca3e7c
refactor: Unify schema handling in query crate ( #1755 )
...
* refactor: Unify schema handling in query crate
* fix: doclink
2021-06-18 14:10:57 +00:00
Andrew Lamb
1c13d676b4
refactor: Rename query::PartitionChunk --> query::QueryChunk ( #1754 )
2021-06-18 13:24:09 +00:00