Commit Graph

425 Commits (c9ff8f0f9f206b54c8add4155a2eb9d65443958f)

Author SHA1 Message Date
Nga Tran 23895e6673 feat: Using sort_key to avoid resorts 2021-07-12 18:08:45 -04:00
kodiakhq[bot] f26f844ed2 Merge branch 'main' into ntran/use_sortkey 2021-07-12 18:12:47 +00:00
Carol (Nichols || Goulding) c681da1031 refactor: Define the TestChunk methods with macros 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 4e53a32928 refactor: Completely replace query::provider::overlap::TestChunk with query::test::TestChunk 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 1698edcc39 refactor: Implement query::provider::overlap::TestChunk in terms of query::test::TestChunk 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) dc0b97e121 refactor: Completely replace TestChunkMeta with TestChunk 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 96f9485792 refactor: Move a with_no_stats method to be entirely defined on TestChunk 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) b4c5a87088 refactor: Rename int field to i64 field to be more consistent 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 54f7ee8b8d refactor: Implement TestChunkMeta in terms of TestChunk
This is a temporary step to make sure TestChunk does everything
TestChunkMeta needs
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) ee545ce90e test: Make _with_stats methods able to optionally take max/min
Not used yet, but will be when this is unified with query/src/pruning.rs
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) b26aae1cb4 test: Add an arg to control whether to add a column summary at all
Always true for now, but there are some cases in query/src/pruning.rs
that don't add any column summaries that will use this with `false`.
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 6cd75bc688 test: Optionally take stats in add_schema_to_table
This gets rid of a lookup and construction of default stats that aren't
necessary
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) e05ca7f98b fix: Change a method name that says null to not say null
The comment and implementation seem to indicate this is creating
non-null data.
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 4406d8a219 test: Always initialize a TableSummary on TestChunk 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 22d4040c81 test: Always initialize a Schema for TestChunk 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 92cb5986f1 test: Initialize a schema on TestChunk to always exist 2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 78f1c4fc80 test: Chunks can only have one table; no need to specify repeatedly
This lets us make the name required and always present on TestChunks,
and make the ID optional.
2021-07-12 09:59:12 -04:00
Carol (Nichols || Goulding) 15aac65c2c fix: Arrange use statements so rustfmt can manage their order 2021-07-12 09:59:11 -04:00
Nga Tran 7b7a60993d feat: consider time as a special key 2021-07-09 18:54:22 -04:00
Nga Tran 8f4463664c feat: add super_key function 2021-07-09 15:37:04 -04:00
Marco Neumann bc958e2ff0 refactor: use Arcs to pass schemas around 2021-07-09 09:45:12 +02:00
Marco Neumann 09e611deb7 refactor: lift query schema generation up to caller
No longer scan chunks during query planning to determine the schema
(except for the lifetime jobs where we have a good reason to do so).
Instead pass the schema down from whoever is triggering the query.
For real SQL queries, we then just use the table-wide schemas
introduced in #1913.

Apart from avoiding schema merges we now also don't crash any longer
when no chunks are left in the table (aka columns are present but all
rows are gone).

Fixes #1768.
Fixes #1884.
2021-07-09 09:24:21 +02:00
kodiakhq[bot] c8126784a8 Merge branch 'main' into ntran/avoid_sort_in_scan 2021-07-08 20:22:18 +00:00
Nga Tran 680394b50b refactor: run fmt 2021-07-08 16:21:42 -04:00
Nga Tran c5733ab4a7 refactor: remove redundant code 2021-07-08 16:11:42 -04:00
Nga Tran 6738cb272f refactor: remove duplicate test 2021-07-08 15:59:25 -04:00
Nga Tran da6249a4df fix: address reviewers' comments and also fix a bug they discovered 2021-07-08 15:54:54 -04:00
Andrew Lamb 33bc85ad18 feat: Infrastructure for persistence (#1925)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-08 11:14:38 +00:00
Andrew Lamb 7602bde850 chore: Update datafusion deps (#1799)
* chore: Update datafusion deps + rework code

* refactor: remove workaround as it has been contributed upstream

* fix: Update query/src/exec/split.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-08 10:58:32 +00:00
Nga Tran d3c4f8c249 fix: store sort key correctly in the schema. Update tests to reflect it 2021-07-07 15:55:23 -04:00
Andrew Lamb e6d995cbd8 chore: Update to Rust 1.53.0 (#1922)
* chore: Update to Rust 1.53.0

* fix: Update to latest clippy standards

* fix: bad refactor

* fix: Update escaping

* test: update test output

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-07 18:02:03 +00:00
Nga Tran 76789e5902 feat: store sort key into the chunk schema of RUB 2021-07-06 17:00:35 -04:00
Marco Neumann b6185982f7 refactor: make `ProviderBuilder` a build-time-checked builder
It's safer and also avoids cloning / copying state around.
2021-07-06 18:20:05 +02:00
Marco Neumann 4172d7946c refactor: make `SchemaMerger` self-consuming
The error handling in `merge` was incomplete, aka it could leave the
merger in a half-modified state in case of an error. That's generally a
bad idea and can lead to ugly bugs. Also the "builder" pattern that is
used here usually consumes itself (and provides a clone impl), so it is
easier to reason about modifications. So this commit just changes it to a
self-consuming builder.

A nice side effect of the new pattern is also that it is build-time
checked and does not contain a runtime assert any longer.
2021-07-06 18:20:05 +02:00
Andrew Lamb 56c8c8d428 feat: Use separate executor for queries and compactions/moves (#1870)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:47:50 +00:00
Jacob Marble 0779b0d9bd feat: add gRPC listener for new write protocol (#1842)
* feat: add gRPC listener for new write protocol

* chore: clippy happy

* chore: lint

* chore: cargo fmt --all

* chore: cargo clippy

* chore: protobuf-lint

* chore: more formatting

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:15:12 +00:00
kodiakhq[bot] e03a1a1def Merge branch 'main' into ntran/dedup_less_concat 2021-07-01 15:59:22 +00:00
Nga Tran d0afc7a176 refactor: clean up and add a missing else case 2021-07-01 11:00:30 -04:00
Nga Tran 5cf623201d fix: deduplicate the last batch before sending it downstream 2021-07-01 10:45:23 -04:00
Andrew Lamb 7235c7b965 refactor: Remove vestigial execution counters (#1865)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 14:08:06 +00:00
Nga Tran ba919726b6 test: unit tests 2021-06-30 15:01:31 -04:00
Nga Tran 2a06b93b00 chore: Merge branch 'main' into ntran/dedup_less_concat 2021-06-30 11:37:15 -04:00
Nga Tran 1dbdabd66e fix: 2 values are also considered to be the same if at least one of them is invalid 2021-06-30 10:52:21 -04:00
Raphael Taylor-Davies 62d3305923 feat: optimize the dictionaries in the output of deduplicate node (#1827) (#1832)
* feat: optimize dedup dictionaries (#1827)

* fix: handle sliced null bitmasks

* chore: review feedback
2021-06-30 09:30:16 +00:00
Nga Tran e6a4e0d709 refactor: make the code clearer for schema even though they are the same 2021-06-29 17:46:30 -04:00
Nga Tran a249b90952 refactor: refactor and add temp info for debugging 2021-06-29 16:35:50 -04:00
Nga Tran 4611e5d584 chore: merge main to branch 2021-06-29 15:39:23 -04:00
Nga Tran 388e7b7650 fix: reset last_batch 2021-06-29 15:15:09 -04:00
Nga Tran 8f309eb569 feat: improve deduplicate to avoid as many concat_batches as possible 2021-06-29 14:41:54 -04:00
Edd Robinson 12ae9b012a refactor: clarify intent of 2021-06-28 17:39:48 +01:00
Andrew Lamb 2e5f10f6b1 feat: Sort the output of split_plans as well (#1800)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-25 13:02:30 +00:00
Andrew Lamb 4e7cf39b23 chore: Reduce debug logging in query crate (#1802) 2021-06-24 21:01:11 +00:00
Andrew Lamb 79446d45be feat: Implement split_plans (#1794)
* feat: implement split plan / planner

* fix: Apply suggestions from code review

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

* fix: resolve merge conflicts

* fix: add values to panic

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
2021-06-24 18:38:00 +00:00
Raphael Taylor-Davies 297fc12db8 feat: compact chunks (#1776)
* feat: compact chunks

* chore: review feedback

* chore: clippy lints

* chore: document sort key algorithm

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-24 16:49:10 +00:00
Andrew Lamb 0a03605bbc refactor: pull Channel --> Stream adapter into its own module (#1793)
* refactor: pull Channel --> Stream adapter into its own module

* docs: Update query/src/exec/stream.rs

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2021-06-24 10:35:45 +00:00
Andrew Lamb 60eb89cad1 feat: Reorg Planner for merge plans (#1780)
* feat: Reorg Planner

* docs: add example for split

* fix: clippy

* docs: Specify <= rather than < for split
2021-06-23 10:50:44 +00:00
Andrew Lamb 4c5007f961 fix: Select the correct timestamp for min/max selectors (#1771)
* test: Reproducer showing that the min/max selectors are order dependent

* fix: pick correct timestamp for first/last selectors

* refactor: remove println

* docs: Fixup comments and add to link to arrow-datafusion/issues/600

* fix: Add debug if timestamp is null

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-22 17:53:54 +00:00
Andrew Lamb 763ade390c refactor: rename deduplicate --> overlap (#1779) 2021-06-22 17:07:53 +00:00
Andrew Lamb 5362c7c924 feat: enable query deduplication (#1762) 2021-06-21 18:49:04 +00:00
Andrew Lamb bed6ec8c31 feat: Handle merging chunks that have different schemas (#1761)
* feat: Handle merging chunks that have different schemas

* test: print out original (non deduplicated) data in tests
2021-06-21 15:52:13 +00:00
Andrew Lamb 6559a9e997 refactor: use Schema to compute InfluxDB primary keys (#1757)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-18 21:15:31 +00:00
Andrew Lamb de67bd3efe refactor: Remove PartitionChunk::table_schema (#1756)
* refactor: Remove PartitionChunk::table_schema

* docs: update comments
2021-06-18 16:13:16 +00:00
Andrew Lamb 9beeca3e7c refactor: Unify schema handling in query crate (#1755)
* refactor: Unify schema handling in query crate

* fix: doclink
2021-06-18 14:10:57 +00:00
Andrew Lamb 1c13d676b4 refactor: Rename query::PartitionChunk --> query::QueryChunk (#1754) 2021-06-18 13:24:09 +00:00
Andrew Lamb c5eea9af6a feat: Implement DeduplicateExec (#1733)
* feat: Implement DeduplicateExec

* fix: Doc comments

* fix: fix comment

* fix: Update with arrow ticket references and use datafusion coalesce batches impl

* refactor: rename inner.rs to algo.rs

* docs: Add additional documentation on rationale for last field value

* docs: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* docs: Update query/src/provider/deduplicate/algo.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* docs: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: do not use pub(crate)

* docs: fix test comments

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-06-17 14:17:52 +00:00
Andrew Lamb b42218a197 chore: Add proper format for SchemaPivotNode (#1744) 2021-06-17 11:32:48 +00:00
Raphael Taylor-Davies 38d17a3093 chore: remove unused query dependency (#1731)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-15 22:06:13 +00:00
Edd Robinson e2315f0016 refactor: revert read_filter debugging 2021-06-14 17:54:21 +01:00
Edd Robinson 6657e6f596 refactor: update query/src/exec/seriesset.rs 2021-06-14 16:09:02 +01:00
Edd Robinson 58f4073a7d Merge branch 'main' into er/fix/dictionary_dupe_keys 2021-06-14 15:59:58 +01:00
Edd Robinson ec52bca309 fix: ensure values are different 2021-06-14 15:28:35 +01:00
kodiakhq[bot] cf6b658ee3 Merge branch 'main' into er/duplicate_keys 2021-06-14 11:10:45 +00:00
Andrew Lamb 0d8d32fd8f chore: Update deps to get latest arrow (#1708)
* chore: Update deps to get latest arrow

* fix: Update to rust 1.52

* fix: clippy
2021-06-14 11:08:09 +00:00
Edd Robinson 1612ebcbdb refactor: more debug logging 2021-06-14 12:07:51 +01:00
Edd Robinson 927d6f890f Merge branch 'main' into er/duplicate_keys 2021-06-14 10:29:46 +01:00
Edd Robinson 96fb595cc0 refactor: read_filter debugging 2021-06-14 10:22:05 +01:00
Nga Tran 11729b9aa7 test: select non-key from 2 chunks with different key/tag sets (#1703)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-11 18:52:36 +00:00
Nga Tran 736cf1ff6f Merge branch 'main' into ntran/dedupe_final_union 2021-06-11 09:45:54 -04:00
Nga Tran 7dd0416960 refactor: address review comments 2021-06-11 09:43:39 -04:00
Nga Tran e34d157f28 fix: comments 2021-06-11 07:30:49 -04:00
Nga Tran ea9edef716 fix: testing option 2021-06-11 07:18:33 -04:00
Nga Tran fb639ee54f feat: add UnionExec on top of the scan activities 2021-06-11 07:06:08 -04:00
Andrew Lamb 13dd4b23fd fix: make pruning debug log less confusing (#1684) 2021-06-10 18:35:04 +00:00
kodiakhq[bot] 16b268402e Merge branch 'main' into ntran/dedup_merge_exec 2021-06-10 17:13:49 +00:00
Nga Tran 46d4ab1f2a refactor: address review comments 2021-06-10 13:13:02 -04:00
Marco Neumann 7b1106ff64 chore: enforce `clippy::future_not_send` for `query` 2021-06-10 09:48:35 +02:00
Nga Tran 4cf05df35b feat: hook SortPreservingMergeExec into deduplication framework 2021-06-09 23:29:44 -04:00
Nga Tran 4478d900ee refactor: capture test output 2021-06-09 15:09:13 -04:00
Nga Tran 8cc99e3420 Merge branch 'ntran/dedup_within_chunk' of https://github.com/influxdata/influxdb_iox into ntran/dedup_within_chunk 2021-06-09 14:40:29 -04:00
Nga Tran b3c94b9d65 refactor: change order of fields to pass circle CI tests 2021-06-09 14:40:10 -04:00
kodiakhq[bot] eed73a30c5 Merge branch 'main' into ntran/dedup_within_chunk 2021-06-09 18:19:17 +00:00
Nga Tran c1c58018fc refactor: address review comments 2021-06-09 14:17:47 -04:00
Andrew Lamb 89fcc457f4 fix: Fix bug in chunk overlap calculation due to nulls (#1669)
* fix: Fix bug in chunk overlap calculation due to nulls

* docs: add note about algorithmic complexity

* fix: avoid recursion in normal case
2021-06-09 17:46:39 +00:00
Raphael Taylor-Davies 07c4277ca7 refactor: schema merge to give more control over field merging (#1653)
* refactor: schema merge to give more control over field merging

* chore: review feedback
2021-06-09 06:30:45 +00:00
Nga Tran 3d50ff7a60 refactor: remove comments 2021-06-08 21:48:57 -04:00
Nga Tran ab7d3384b7 refactor: remove unused comments 2021-06-08 21:43:02 -04:00
Nga Tran 3e10351538 test: add tests for the sort plan 2021-06-08 21:40:46 -04:00
Andrew Lamb cba7f270b4 docs: Improve comments + whitespace (#1663) 2021-06-08 21:13:35 +00:00
Nga Tran 68e3a2121f feat: add SortExec 2021-06-08 15:04:31 -04:00
Andrew Lamb 666204d4a8 fix: remove whitespace changes 2021-06-08 14:46:55 -04:00
Andrew Lamb b23c4e5210 fix: clippy 2021-06-08 14:44:48 -04:00
Andrew Lamb fd8a87484e feat: Hook up chunk grouping into provider 2021-06-08 14:42:37 -04:00
Nga Tran edbf1b7d5e Merge branch 'main' into ntran/dedup_within_chunk 2021-06-08 13:18:40 -04:00
Nga Tran 40cb4f741f feat: initial implementation 2021-06-08 13:17:36 -04:00
Andrew Lamb 62e8675737 refactor: move primary_key calculation to TableSummary (#1659) 2021-06-08 17:06:37 +00:00
Andrew Lamb 34ba268cf1 feat: Group chunks by potential overlap (#1654)
* feat: Group chunks by potential overlap

* docs: clarify in what way the calculation is conservative

* fix: Add test for mixed nulls
2021-06-08 16:55:29 +00:00
Edd Robinson b88f277477 feat: enable not eq operator 2021-06-08 15:57:07 +01:00
Andrew Lamb e9834a907c feat: Prune on boolean column predicates too (#1629)
* chore: update deps to get latest DataFusion

* fix: enable boolean pruning tests

* fix: update explain plan tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-07 16:51:30 +00:00
Nga Tran ff641e5638 refactor: address Andrew's comments 2021-06-06 22:36:44 -04:00
Nga Tran 2f82a9d670 feat: full foundation for deduplicate with todo functions to finish 2021-06-06 22:09:01 -04:00
Andrew Lamb ff3215e6a9 feat: Implement Chunk Pruning (#1567) 2021-06-04 13:05:22 +00:00
Andrew Lamb c986ce2c19 feat: Add pruning module to query crate (#1611)
* feat: Add pruning module

* fix: clippy

* fix: Apply suggestions from code review

* fix: remove erroneous claims of DF bugs

* fix: update comments with DF bug reference
2021-06-03 11:07:26 +00:00
Nga Tran e7a97f3ac1 test: merge main and add more tests for deduplicate work 2021-06-02 12:00:40 -04:00
Nga Tran 60ad929721 refactor: add macro to compare output of explains 2021-06-01 16:39:14 -04:00
Nga Tran aa867601e5 chore: merge main with DF plan display fix 2021-06-01 16:17:41 -04:00
Andrew Lamb d8fbb7b410 refactor: Remove last vestiges of multi-table chunks from PartitionChunk API (#1588)
* refactor: Remove last vestiges of multi-table chunks from PartitionChunk API

* fix: remove test that can no longer fail

* fix: update tests + code review comments

* fix: clippy

* fix: clippy

* fix: restore test_measurement_fields_error test
2021-06-01 16:12:33 +00:00
Andrew Lamb d3711a5591 refactor: Use ParquetExec from DataFusion to read parquet files (#1580)
* refactor: use ParquetExec to read parquet files

* fix: test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-01 14:44:07 +00:00
Andrew Lamb 162a808a8d refactor: Remove `table_name` from PartitionChunk API (#1584)
* refactor: Remove `table_name` from PartitionChunk API

* fix: clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-31 12:05:09 +00:00
Andrew Lamb d50c7c8919 chore: remove unused dependency (#1581) 2021-05-31 09:58:10 +00:00
Nga Tran 62147ff0d4 feat: add more explain tests 2021-05-27 12:19:41 -04:00
Raphael Taylor-Davies 5d342d7779 feat: associate tracker with lifecycle action (#1099) (#1556)
* feat: associate tracker with lifecycle action (#1099)

* chore: docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-27 10:47:35 +00:00
Raphael Taylor-Davies 4fcc04e6c9 chore: enable arrow prettyprint feature (#1566) 2021-05-27 10:28:14 +00:00
Raphael Taylor-Davies c2fd85209c feat: wait for task shutdown on DedicatedExecutor (#1537) 2021-05-25 11:33:55 +00:00
Andrew Lamb 14ba25f86d chore: Update datafusion and use released version of arrow crates (#1546)
* chore: Update datafusion and use released version of arrow crate

* fix: Update for change in API
2021-05-24 15:37:22 +00:00
Nga Tran 0563005aac chore: remove leftover comments 2021-05-21 17:01:49 -04:00
Nga Tran f113abacb5 feat: more unit & e2e tests plus cleanup and addressing review comments of Andrew and Edd 2021-05-21 16:48:43 -04:00
Nga Tran e44a3a87db feat: now predicate is actually pushed down to RUB but there are bugs and not working yet 2021-05-20 16:56:15 -04:00
Nga Tran 51de37e752 chore: run fmt 2021-05-19 15:28:44 -04:00
Nga Tran 11561111d5 chore: merge main to branch 2021-05-19 15:11:15 -04:00
Nga Tran 1f13842550 chore: modify comments 2021-05-19 14:49:48 -04:00
Nga Tran 087d61f229 feat: Part 1 of predicate push down - Send predicates to MUB, RUB, and Parquet File. Note that MUB has not handled predicates yet 2021-05-19 13:59:51 -04:00
Andrew Lamb 7e223780f3 feat: Implement Display for query::predicate to improve debug printing of plans (#1519)
* feat: Implement Display for query::predicate to improve debug printing of plans

* fix: clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-19 12:38:34 +00:00
Andrew Lamb 0680a5167f chore: Improve DataFusion plan logging (#1508)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-18 11:08:06 +00:00
Andrew Lamb 07db4932ee refactor: rename data_types/src/chunk.rs -> data_types/src/chunk_metadata.rs (#1500) 2021-05-15 10:18:01 +00:00
Edd Robinson 8ccc359cab refactor: address PR feedback 2021-05-07 13:48:44 +01:00
Edd Robinson 4c4bd2f164 refactor: update query/src/func/regex.rs 2021-05-07 13:44:51 +01:00
Edd Robinson 4cc7a99854 refactor: include not match in support check 2021-05-07 13:44:51 +01:00
Edd Robinson beee3115f4 feat: expose regex =~ and to gRPC API 2021-05-07 13:44:51 +01:00
Edd Robinson eae3fec571 feat: wire up regex UDF as predicate filter expr 2021-05-07 13:44:51 +01:00
Edd Robinson 3fc2c9fc04 feat: add DataFusion regex match operator
This commit adds a new custom UDF to IOx that provides a regex operator to DataFusion plans.
Effectively it allows predicates to contain regex operators that are applied as filters, only allowing rows that satisfy the regex to be returned.

I did not use the Arrow regex kernel for this work because that does not return a boolean array indicating which rows matched a regex, but instead returns a new string array of results. This doesn't work well with DF's approach to filtering.
2021-05-07 13:44:51 +01:00
Carol (Nichols || Goulding) febc1538ff chore: Update Rust version (#1445)
* chore: Update Rust version

* refactor: Make struct constructor field orderings consistent

Sometimes I changed the struct definition, sometimes changed the struct
construction instance, depending on consistency with code around each
(other similar structs, function argument orders, etc)

More info: https://rust-lang.github.io/rust-clippy/master/index.html#inconsistent_struct_constructor

* refactor: Use flatten where appropriate

One instance is a false positive with a clippy bug.

More info:

- https://rust-lang.github.io/rust-clippy/master/index.html#filter_map_identity
- https://rust-lang.github.io/rust-clippy/master/index.html#manual_flatten

* refactor: Use Option map instead of match

More info: https://rust-lang.github.io/rust-clippy/master/index.html#manual_map

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 22:07:10 +00:00
Raphael Taylor-Davies 44de42906f refactor: use Arc<str> instead of Arc<String> (#1442) 2021-05-06 17:05:08 +00:00
Raphael Taylor-Davies 411cf134e9 refactor: explode arrow_deps (#1425)
* refactor: explode arrow_deps

* chore: workaround doctest bug
2021-05-05 16:59:12 +00:00
Edd Robinson 2f789485e6 refactor: fix spelling 2021-05-05 11:06:04 +01:00
Andrew Lamb 3b7c5ac350 fix(storage rpc): do not send back tags with empty values (#1403) 2021-05-04 10:35:24 +00:00
Andrew Lamb 40b9b09cdc refactor: rename assert_table_eq to assert_batches_eq (#1368) 2021-04-30 10:51:08 +00:00
Andrew Lamb eb8d91cf1c refactor: remove additional uses of RecordBatch::try_new (#1378)
* refactor: remove additional uses of RecordBatch::try_new

* fix: fix accidental change

* fix: clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-30 10:24:47 +00:00
Edd Robinson 13fbf2e68d refactor: plumb registry to gRPC server 2021-04-29 14:00:05 +01:00
Edd Robinson 4acbdcf1c9 refactor: address PR feedback 2021-04-28 16:11:57 +00:00
Edd Robinson a9ef604ef6 perf: avoid using channels for query execution
Pre-sized channels get full when the results to send over them are larger than the capacities. This causes significant runtime overhead and slows down query performance.

This commit removes the intermediate channels. The potential downside to this approach is there may be more buffering which could increase memory usage during query and also block a thread for longer periods of time.
2021-04-28 16:11:57 +00:00
Raphael Taylor-Davies 7ca1da3fcd feat: pushdown table and partition key predicates to catalog (#736) (#1327)
* feat: catalog predicate pushdown (#736)

* chore: fix lints

* chore: review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-27 15:31:47 +00:00
Marco Neumann 91bccdfca3 ci: pass `--document-private-items` to `cargo doc` 2021-04-27 15:42:07 +02:00
Marco Neumann eddc9319ff docs: deny broken intradoc links 2021-04-27 13:22:28 +02:00
Raphael Taylor-Davies 20117de078 feat: string dictionary encoding (#1220) (#1262)
* feat: string dictionary encoding (#1220)

* chore: review comments

* chore: fix lint

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-27 09:36:58 +00:00
Edd Robinson a322d05838 refactor: rust fmt 2021-04-20 17:30:50 +00:00
Edd Robinson 554b3b4662 refactor: satisfy new clippy lints 2021-04-20 17:30:50 +00:00
Carol (Nichols || Goulding) 51041ba2d9 fix: Prefer implementing From over Into 2021-04-19 08:48:11 -04:00
Carol (Nichols || Goulding) 757933afc4 fix: use Self when possible 2021-04-19 08:48:11 -04:00
Carol (Nichols || Goulding) f136931225 fix: Inconsistent ordering lints 2021-04-19 08:48:11 -04:00
Carol (Nichols || Goulding) 3e87ce5232 fix: Make this trait and methods more idiomatically named
"into" usually takes ownership and does a conversion; "as" takes
references and provides a different view.
2021-04-19 08:45:34 -04:00
Andrew Lamb 529c99c93f fix: don't clone arrays to make TimestampNanosecondArrays (#1241)
* fix: avoid clone

* fix: remove another clone
2021-04-16 18:40:22 +00:00
Andrew Lamb e226b5a820 feat: Use TimestampNanosecondArray for timestamps in IOx (#1230)
* refactor: Create Arrow arrays using iterators

* feat: use Timestamp64(TimeUnit::Nanosecond) for timestamps

* feat: add support for timestamp array

* fix: update more tests

* fix: remove unnecessary code

Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-16 15:55:33 +00:00
Andrew Lamb f092294da3 fix: Use MAX (window end) for timestamps in read group (#1228)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-16 10:51:38 +00:00
Andrew Lamb 5aeeccb97c feat: Run query plans on the database wide executor as well (#1210)
* feat: route all query planning through executor

* fix: Rename JoinError -> TaskJoinError and make message clearer

* fix: remove dangling comment

* fix: remove confusing comments
2021-04-15 11:57:20 +00:00
Andrew Lamb 59ca090aef feat: Use single db-wide executor for running queries (#1198)
* refactor: plumb executor into all Db instances

* refactor: Route all query executions through worker pool

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-14 16:46:02 +00:00
Andrew Lamb 8f1bf8a960 fix: Remove mutex acquisition in impl `std::fmt::Debug` for DedicatedExecutor (#1205) 2021-04-14 12:09:40 +00:00
Andrew Lamb f5f768d750 feat: Add a dedicated threadpool for running queries (#1191)
* feat: use a dedicated tokio threadpool for running queries

* feat: plumb number of executor threads through to command line

thread through command line

* fix: Logical merge conflict

* fix: another logical conflict

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-14 10:48:09 +00:00
Andrew Lamb 150ed4e1d9 refactor: Remove async from `InfluxRPCPlanner` (#1200)
* refactor: Remove async from InfluxRPCPlanner

* fix: make it compile

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-13 22:17:19 +00:00
Paul Dix 7e28f8ef66 feat: Implement Entry writing to Db
This removes the old ReplicatedWrite structure and implements the writing of an Entry to the Db. I also call out in `server/lib.rs` and in the `Db` where sharding and replication might happen.

I've also added helpers in various places to write line protocol to chunks, tables, and databases. That enabled removing a good amount of code from the test helpers crate.
2021-04-13 12:52:14 +00:00
Raphael Taylor-Davies 1997324344 feat: mutable buffer snapshotting (#1179)
* feat: mutable buffer snapshotting

* chore: review feedback
2021-04-13 12:14:54 +00:00
Raphael Taylor-Davies 078c0f3fda refactor: lift chunk and table summaries out of DBChunk (#1162)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-09 12:00:47 +00:00
Nga Tran be6e1e48e4 feat: add writer_id and object_store in Db 2021-04-07 18:36:07 -04:00
Carol (Nichols || Goulding) 82588d5c72 fix: Don't return Result from test functions 2021-04-07 12:40:00 -04:00
Raphael Taylor-Davies 5cd1d6691d refactor: use DatabaseName in DatabaseRules (#1127) 2021-04-06 13:26:30 +00:00
Jacob Marble 80d55d0829 chore: rename tracing_deps to observability_deps
OpenTelemetry makes this necessary.
2021-04-02 13:14:30 -07:00
Carol (Nichols || Goulding) 0b880d3534 chore: Group all tracing-related crates under one crate for easier upgrade management 2021-04-02 09:54:39 -04:00
Andrew Lamb 569f90d937 feat: Add ability to get PartitionSummary statistics from a Db (#1090)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-03-31 14:18:53 +00:00
Andrew Lamb f0b411cd43 feat: enable information_schema 2021-03-30 09:01:43 -04:00
Andrew Lamb 6a48001d13 refactor: Manage storage directly in the Catalog (#1057)
* refactor: Manage mutable buffer chunks directly

* fix: do not use mutable_buffer for listing table names
2021-03-29 17:55:07 +00:00
Andrew Lamb eb0122655d refactor: Remove async from PartitionChunk (#1062)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-03-29 13:00:36 +00:00
Andrew Lamb 02ae743e8e refactor: Remove async from Database (#1063) 2021-03-29 12:48:12 +00:00
Raphael Taylor-Davies fb130ea99d feat: use CatalogProvider and SchemaProvider (#1058)
* feat: use CatalogProvider and SchemaProvider

* refactor: review comments
2021-03-29 11:08:46 +00:00
Andrew Lamb 0ca9ad7285 refactor: Remove async from `PartitionChunk::table_schema` (#1060) 2021-03-27 18:08:12 +00:00
Andrew Lamb 663d4fb6f7 docs: Use Scan rather than InMemoryScan for clarity (#1049)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-03-26 14:22:49 +00:00
Andrew Lamb 895e808754 chore: Upgrade arrow deps (#1046)
* chore: Upgrade dependencies

* chore: upgrade query for new interfaces

* chore: update read_buffer
2021-03-25 13:35:08 +00:00
Andrew Lamb 6e1795fda0 refactor: Move some types (not yet exposed to clients) into internal_types (#1015)
* refactor: Move some types (not yet exposed to clients) into internal_types

* docs: Add README.md explaining the rationale

* refactor: remove some stragglers

* fix: fix benches

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: add clippy lints

* fix: fmt

* docs: Apply suggestions from code review

fix typos

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-03-19 16:27:57 +00:00
Andrew Lamb 72eff5eed5 chore: update deps (including arrow) 2021-03-16 18:15:44 -04:00
Raphael Taylor-Davies 65f7a1ac5b fix: use consistent crate versions (#989)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-03-15 15:42:19 +00:00
Andrew Lamb 6ac7e2c1a7 feat: Add management API and CLI to list chunks (#968)
* feat: Add management API and CLI to list chunks

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: add comment to protobuf

* fix: fix comment

* fix: fmt, fixup merge errors

* fix: fascinating type dance with prost generated types

* fix: clippy

* fix: move command to influxdb_iox database chunk list

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-03-12 13:56:14 +00:00
Raphael Taylor-Davies 0ff527285c refactor: remove unnecessary async from DatabaseStore trait (#965)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-03-11 11:33:53 +00:00
Andrew Lamb 746373a687 refactor: Remove mutable_buffer crate dependency on query crate (#927) 2021-03-05 11:34:27 +00:00
Andrew Lamb 8b1f100df3 feat: make read_group and read_window_aggregate work across chunks (#905)
* feat: make read_group and read_window_aggregate work across chunks

* refactor: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: Update query/src/frontend/influxrpc.rs

Improve logic and use strings directly

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: fmt

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-03-04 17:06:31 +00:00
Nga Tran 957e05ef25 chore: use newly added Arrow's Expr::is_not_null function 2021-03-03 11:46:49 -05:00
Andrew Lamb 94bd200e60 refactor: Add Predicate::is_empty() and EMPTY_PREDICATE to avoid unnecessary construction (#891)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-03-01 21:03:05 +00:00
Andrew Lamb 7d8d00781c feat: Make read_filter work for mutable buffer and read buffer (#882)
* feat: port read_filter to InfluxRPCPlanner

* fix: remove commented out vestigial test

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: fmt

* fix: Update arrow_deps/src/util.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-03-01 16:50:29 +00:00
Nga Tran 6ad8e1aa33 feat: use newly implemented tags_iter to get Tag columns 2021-02-26 15:54:20 -05:00
Nga Tran 18de3bdcab chore: merge main into branch
Merge branch 'main' into ntran/optimize_column_selection
2021-02-26 15:29:43 -05:00
Nga Tran f37e5846aa feat: fmt auto fix 2021-02-26 14:56:10 -05:00
NGA TRAN eb81975151 feat: Optimize Column Selection 2021-02-26 14:28:46 -05:00
Andrew Lamb 12deacd8a0 refactor: move SeriesSetPlans into its own module (#878)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-02-25 23:12:39 +00:00