Nga Tran
388e7b7650
fix: reset last_batch
2021-06-29 15:15:09 -04:00
Nga Tran
8f309eb569
feat: improve deduplicate to avoid as many concat_batches as possible
2021-06-29 14:41:54 -04:00
Edd Robinson
12ae9b012a
refactor: clarify intent of
2021-06-28 17:39:48 +01:00
Andrew Lamb
2e5f10f6b1
feat: Sort the output of split_plans as well ( #1800 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-25 13:02:30 +00:00
Andrew Lamb
4e7cf39b23
chore: Reduce debug logging in query crate ( #1802 )
2021-06-24 21:01:11 +00:00
Andrew Lamb
79446d45be
feat: Implement split_plans ( #1794 )
...
* feat: implement split plan / planner
* fix: Apply suggestions from code review
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* fix: resolve merge conflicts
* fix: add values to panic
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
2021-06-24 18:38:00 +00:00
Raphael Taylor-Davies
297fc12db8
feat: compact chunks ( #1776 )
...
* feat: compact chunks
* chore: review feedback
* chore: clippy lints
* chore: document sort key algorithm
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-24 16:49:10 +00:00
Andrew Lamb
0a03605bbc
refactor: pull Channel --> Stream adapter into its own module ( #1793 )
...
* refactor: pull Channel --> Stream adapter into its own module
* docs: Update query/src/exec/stream.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2021-06-24 10:35:45 +00:00
Andrew Lamb
60eb89cad1
feat: Reorg Planner for merge plans ( #1780 )
...
* feat: Reorg Planner
* docs: add example for split
* fix: clippy
* docs: Specify <= rather than < for split
2021-06-23 10:50:44 +00:00
Andrew Lamb
4c5007f961
fix: Select the correct timestamp for min/max selectors ( #1771 )
...
* test: Reproducer showing that the min/max selectors are order dependent
* fix: pick correct timestamp for first/last selectors
* refactor: remove println
* docs: Fixup comments and add to link to arrow-datafusion/issues/600
* fix: Add debug if timestamp is null
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-22 17:53:54 +00:00
Andrew Lamb
763ade390c
refactor: rename deduplicate --> overlap ( #1779 )
2021-06-22 17:07:53 +00:00
Andrew Lamb
5362c7c924
feat: enable query deduplication ( #1762 )
2021-06-21 18:49:04 +00:00
Andrew Lamb
bed6ec8c31
feat: Handle merging chunks that have different schemas ( #1761 )
...
* feat: Handle merging chunks that have different schemas
* test: print out original (non deduplicated) data in tests
2021-06-21 15:52:13 +00:00
Andrew Lamb
6559a9e997
refactor: use Schema to compute InfluxDB primary keys ( #1757 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-18 21:15:31 +00:00
Andrew Lamb
de67bd3efe
refactor: Remove PartitionChunk::table_schema ( #1756 )
...
* refactor: Remove PartitionChunk::table_schema
* docs: update comments
2021-06-18 16:13:16 +00:00
Andrew Lamb
9beeca3e7c
refactor: Unify schema handling in query crate ( #1755 )
...
* refactor: Unify schema handling in query crate
* fix: doclink
2021-06-18 14:10:57 +00:00
Andrew Lamb
1c13d676b4
refactor: Rename query::PartitionChunk --> query::QueryChunk ( #1754 )
2021-06-18 13:24:09 +00:00
Andrew Lamb
c5eea9af6a
feat: Implement DeduplicateExec ( #1733 )
...
* feat: Implement DeduplicateExec
* fix: Doc comments
* fix: fix comment
* fix: Update with arrow ticket references and use datafusion coalesce batches impl
* refactor: rename inner.rs to algo.rs
* docs: Add additional documentation on rationale for last field value
* docs: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* docs: Update query/src/provider/deduplicate/algo.rs
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* docs: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: do not use pub(crate)
* docs: fix test comments
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-06-17 14:17:52 +00:00
Andrew Lamb
b42218a197
chore: Add proper format for SchemaPivotNode ( #1744 )
2021-06-17 11:32:48 +00:00
Raphael Taylor-Davies
38d17a3093
chore: remove unused query dependency ( #1731 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-15 22:06:13 +00:00
Edd Robinson
e2315f0016
refactor: revert read_filter debugging
2021-06-14 17:54:21 +01:00
Edd Robinson
6657e6f596
refactor: update query/src/exec/seriesset.rs
2021-06-14 16:09:02 +01:00
Edd Robinson
58f4073a7d
Merge branch 'main' into er/fix/dictionary_dupe_keys
2021-06-14 15:59:58 +01:00
Edd Robinson
ec52bca309
fix: ensure values are different
2021-06-14 15:28:35 +01:00
kodiakhq[bot]
cf6b658ee3
Merge branch 'main' into er/duplicate_keys
2021-06-14 11:10:45 +00:00
Andrew Lamb
0d8d32fd8f
chore: Update deps to get latest arrow ( #1708 )
...
* chore: Update deps to get latest arrow
* fix: Update to rust 1.52
* fix: clippy
2021-06-14 11:08:09 +00:00
Edd Robinson
1612ebcbdb
refactor: more debug logging
2021-06-14 12:07:51 +01:00
Edd Robinson
927d6f890f
Merge branch 'main' into er/duplicate_keys
2021-06-14 10:29:46 +01:00
Edd Robinson
96fb595cc0
refactor: read_filter debugging
2021-06-14 10:22:05 +01:00
Nga Tran
11729b9aa7
test: select non-key from 2 chunks with different key/tag sets ( #1703 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-11 18:52:36 +00:00
Nga Tran
736cf1ff6f
Merge branch 'main' into ntran/dedupe_final_union
2021-06-11 09:45:54 -04:00
Nga Tran
7dd0416960
refactor: address review comments
2021-06-11 09:43:39 -04:00
Nga Tran
e34d157f28
fix: comments
2021-06-11 07:30:49 -04:00
Nga Tran
ea9edef716
fix: testing option
2021-06-11 07:18:33 -04:00
Nga Tran
fb639ee54f
feat: add UnionExec on top of the scan activities
2021-06-11 07:06:08 -04:00
Andrew Lamb
13dd4b23fd
fix: make pruning debug log less confusing ( #1684 )
2021-06-10 18:35:04 +00:00
kodiakhq[bot]
16b268402e
Merge branch 'main' into ntran/dedup_merge_exec
2021-06-10 17:13:49 +00:00
Nga Tran
46d4ab1f2a
refactor: address review comments
2021-06-10 13:13:02 -04:00
Marco Neumann
7b1106ff64
chore: enforce `clippy::future_not_send` for `query`
2021-06-10 09:48:35 +02:00
Nga Tran
4cf05df35b
feat: hook SortPreservingMergeExec into deduplication framework
2021-06-09 23:29:44 -04:00
Nga Tran
4478d900ee
refactor: capture test output
2021-06-09 15:09:13 -04:00
Nga Tran
8cc99e3420
Merge branch 'ntran/dedup_within_chunk' of https://github.com/influxdata/influxdb_iox into ntran/dedup_within_chunk
2021-06-09 14:40:29 -04:00
Nga Tran
b3c94b9d65
refactor: change order of fields to pass circle CI tests
2021-06-09 14:40:10 -04:00
kodiakhq[bot]
eed73a30c5
Merge branch 'main' into ntran/dedup_within_chunk
2021-06-09 18:19:17 +00:00
Nga Tran
c1c58018fc
refactor: address review comments
2021-06-09 14:17:47 -04:00
Andrew Lamb
89fcc457f4
fix: Fix bug in chunk overlap calculation due to nulls ( #1669 )
...
* fix: Fix bug in chunk overlap calculation due to nulls
* docs: add note about algorithmic complexity
* fix: avoid recursion in normal case
2021-06-09 17:46:39 +00:00
Raphael Taylor-Davies
07c4277ca7
refactor: schema merge to give more control over field merging ( #1653 )
...
* refactor: schema merge to give more control over field merging
* chore: review feedback
2021-06-09 06:30:45 +00:00
Nga Tran
3d50ff7a60
refactor: remove comments
2021-06-08 21:48:57 -04:00
Nga Tran
ab7d3384b7
refactor: remove unused comments
2021-06-08 21:43:02 -04:00
Nga Tran
3e10351538
test: add tests for the sort plan
2021-06-08 21:40:46 -04:00
Andrew Lamb
cba7f270b4
docs: Improve comments + whitespace ( #1663 )
2021-06-08 21:13:35 +00:00
Nga Tran
68e3a2121f
feat: add SortExec
2021-06-08 15:04:31 -04:00
Andrew Lamb
666204d4a8
fix: remove whitespace changes
2021-06-08 14:46:55 -04:00
Andrew Lamb
b23c4e5210
fix: clippy
2021-06-08 14:44:48 -04:00
Andrew Lamb
fd8a87484e
feat: Hook up chunk grouping into provider
2021-06-08 14:42:37 -04:00
Nga Tran
edbf1b7d5e
Merge branch 'main' into ntran/dedup_within_chunk
2021-06-08 13:18:40 -04:00
Nga Tran
40cb4f741f
feat: initial implementation
2021-06-08 13:17:36 -04:00
Andrew Lamb
62e8675737
refactor: move primary_key calculation to TableSummary ( #1659 )
2021-06-08 17:06:37 +00:00
Andrew Lamb
34ba268cf1
feat: Group chunks by potential overlap ( #1654 )
...
* feat: Group chunks by potential overlap
* docs: clarify in what way the calculation is conservative
* fix: Add test for mixed nulls
2021-06-08 16:55:29 +00:00
Edd Robinson
b88f277477
feat: enable not eq operator
2021-06-08 15:57:07 +01:00
Andrew Lamb
e9834a907c
feat: Prune on boolean column predicates too ( #1629 )
...
* chore: update deps to get latest DataFusion
* fix: enable boolean pruning tests
* fix: update explain plan tests
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-07 16:51:30 +00:00
Nga Tran
ff641e5638
refactor: address Andrew's comments
2021-06-06 22:36:44 -04:00
Nga Tran
2f82a9d670
feat: full foundation for deduplicate with todo functions to finish
2021-06-06 22:09:01 -04:00
Andrew Lamb
ff3215e6a9
feat: Implement Chunk Pruning ( #1567 )
2021-06-04 13:05:22 +00:00
Andrew Lamb
c986ce2c19
feat: Add pruning module to query crate ( #1611 )
...
* feat: Add pruning module
* fix: clippy
* fix: Apply suggestions from code review
* fix: remove erroneous claims of DF bugs
* fix: update comments with DF bug reference
2021-06-03 11:07:26 +00:00
Nga Tran
e7a97f3ac1
test: merge main and add more tests for deduplicate work
2021-06-02 12:00:40 -04:00
Nga Tran
60ad929721
refactor: add macro to compare output of explains
2021-06-01 16:39:14 -04:00
Nga Tran
aa867601e5
chore: merge main with DF plan display fix
2021-06-01 16:17:41 -04:00
Andrew Lamb
d8fbb7b410
refactor: Remove last vestiges of multi-table chunks from PartitionChunk API ( #1588 )
...
* refactor: Remove last vestiges of multi-table chunks from PartitionChunk API
* fix: remove test that can no longer fail
* fix: update tests + code review comments
* fix: clippy
* fix: clippy
* fix: restore test_measurement_fields_error test
2021-06-01 16:12:33 +00:00
Andrew Lamb
d3711a5591
refactor: Use ParquetExec from DataFusion to read parquet files ( #1580 )
...
* refactor: use ParquetExec to read parquet files
* fix: test
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-01 14:44:07 +00:00
Andrew Lamb
162a808a8d
refactor: Remove `table_name` from PartitionChunk API ( #1584 )
...
* refactor: Remove `table_name` from PartitionChunk API
* fix: clippy
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-31 12:05:09 +00:00
Andrew Lamb
d50c7c8919
chore: remove unused dependency ( #1581 )
2021-05-31 09:58:10 +00:00
Nga Tran
62147ff0d4
feat: add more explain tests
2021-05-27 12:19:41 -04:00
Raphael Taylor-Davies
5d342d7779
feat: associate tracker with lifecycle action ( #1099 ) ( #1556 )
...
* feat: associate tracker with lifecycle action (#1099 )
* chore: docs
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-27 10:47:35 +00:00
Raphael Taylor-Davies
4fcc04e6c9
chore: enable arrow prettyprint feature ( #1566 )
2021-05-27 10:28:14 +00:00
Raphael Taylor-Davies
c2fd85209c
feat: wait for task shutdown on DedicatedExecutor ( #1537 )
2021-05-25 11:33:55 +00:00
Andrew Lamb
14ba25f86d
chore: Update datafusion and use released version of arrow crates ( #1546 )
...
* chore: Update datafusion and use released version of arrow crate
* fix: Update for change in API
2021-05-24 15:37:22 +00:00
Nga Tran
0563005aac
chore: remove leftover comments
2021-05-21 17:01:49 -04:00
Nga Tran
f113abacb5
feat: more unit & e2e tests plus cleanup and addressing review comments of Andrew and Edd
2021-05-21 16:48:43 -04:00
Nga Tran
e44a3a87db
feat: now predicate is actually pushed down to RUB, but there are bugs and it is not working yet
2021-05-20 16:56:15 -04:00
Nga Tran
51de37e752
chore: run fmt
2021-05-19 15:28:44 -04:00
Nga Tran
11561111d5
chore: merge main to branch
2021-05-19 15:11:15 -04:00
Nga Tran
1f13842550
chore: modify comments
2021-05-19 14:49:48 -04:00
Nga Tran
087d61f229
feat: Part 1 of predicate push down - Send predicates to MUB, RUB, and Parquet File. Note that MUB has not handled predicates yet
2021-05-19 13:59:51 -04:00
Andrew Lamb
7e223780f3
feat: Implement Display for query::predicate to improve debug printing of plans ( #1519 )
...
* feat: Implement Display for query::predicate to improve debug printing of plans
* fix: clippy
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-19 12:38:34 +00:00
Andrew Lamb
0680a5167f
chore: Improve DataFusion plan logging ( #1508 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-18 11:08:06 +00:00
Andrew Lamb
07db4932ee
refactor: rename data_types/src/chunk.rs -> data_types/src/chunk_metadata.rs ( #1500 )
2021-05-15 10:18:01 +00:00
Edd Robinson
8ccc359cab
refactor: address PR feedback
2021-05-07 13:48:44 +01:00
Edd Robinson
4c4bd2f164
refactor: update query/src/func/regex.rs
2021-05-07 13:44:51 +01:00
Edd Robinson
4cc7a99854
refactor: include not match in support check
2021-05-07 13:44:51 +01:00
Edd Robinson
beee3115f4
feat: expose regex =~ and to gRPC API
2021-05-07 13:44:51 +01:00
Edd Robinson
eae3fec571
feat: wire up regex UDF as predicate filter expr
2021-05-07 13:44:51 +01:00
Edd Robinson
3fc2c9fc04
feat: add DataFusion regex match operator
...
This commit adds a new custom UDF to IOx that provides a regex operator to DataFusion plans.
Effectively it allows predicates to contain regex operators that are applied as filters, only allowing rows that satisfy the regex to be returned.
I did not use the Arrow regex kernel for this work because that does not return a boolean array indicating which rows matched a regex, but instead returns a new string array of results. This doesn't work well with DF's approach to filtering.
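The distinction drawn above, a per-row boolean result that a filter can consume rather than a new string array, can be sketched in plain Rust. This is a minimal illustration only, not the IOx or DataFusion code: `regex_match_mask` and `filter_rows` are hypothetical names, columns are modelled as slices of `Option<&str>`, and the `is_match` closure is a stand-in for a compiled regex's match test.

```rust
/// Build a boolean mask with one entry per input row; a null input row
/// yields a null (`None`) mask entry.
/// `is_match` is a hypothetical stand-in for a compiled regex's match test.
fn regex_match_mask<F>(column: &[Option<&str>], is_match: F) -> Vec<Option<bool>>
where
    F: Fn(&str) -> bool,
{
    column
        .iter()
        .map(|value| value.map(|s| is_match(s)))
        .collect()
}

/// Keep only the rows whose mask entry is `Some(true)`, which is how a
/// filter step would consume the mask.
fn filter_rows<'a>(column: &[Option<&'a str>], mask: &[Option<bool>]) -> Vec<&'a str> {
    column
        .iter()
        .zip(mask)
        .filter_map(|(value, keep)| match (value, keep) {
            (Some(s), Some(true)) => Some(*s),
            _ => None,
        })
        .collect()
}

fn main() {
    let hosts = [Some("server-a"), None, Some("client-b"), Some("server-c")];
    // A prefix test stands in for a regex such as `^server-`.
    let mask = regex_match_mask(&hosts, |s| s.starts_with("server-"));
    let kept = filter_rows(&hosts, &mask);
    println!("{:?}", mask);
    println!("{:?}", kept);
}
```

Because the mask has the same length as the input column, it can be applied to every other column of the same batch, which is what makes a boolean result convenient for filtering.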
2021-05-07 13:44:51 +01:00
Carol (Nichols || Goulding)
febc1538ff
chore: Update Rust version ( #1445 )
...
* chore: Update Rust version
* refactor: Make struct constructor field orderings consistent
Sometimes I changed the struct definition, sometimes changed the struct
construction instance, depending on consistency with code around each
(other similar structs, function argument orders, etc)
More info: https://rust-lang.github.io/rust-clippy/master/index.html#inconsistent_struct_constructor
* refactor: Use flatten where appropriate
One instance is a false positive with a clippy bug.
More info:
- https://rust-lang.github.io/rust-clippy/master/index.html#filter_map_identity
- https://rust-lang.github.io/rust-clippy/master/index.html#manual_flatten
* refactor: Use Option map instead of match
More info: https://rust-lang.github.io/rust-clippy/master/index.html#manual_map
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 22:07:10 +00:00
Raphael Taylor-Davies
44de42906f
refactor: use Arc<str> instead of Arc<String> ( #1442 )
2021-05-06 17:05:08 +00:00
Raphael Taylor-Davies
411cf134e9
refactor: explode arrow_deps ( #1425 )
...
* refactor: explode arrow_deps
* chore: workaround doctest bug
2021-05-05 16:59:12 +00:00
Edd Robinson
2f789485e6
refactor: fix spelling
2021-05-05 11:06:04 +01:00
Andrew Lamb
3b7c5ac350
fix(storage rpc): do not send back tags with empty values ( #1403 )
2021-05-04 10:35:24 +00:00
Andrew Lamb
40b9b09cdc
refactor: rename assert_table_eq to assert_batches_eq ( #1368 )
2021-04-30 10:51:08 +00:00
Andrew Lamb
eb8d91cf1c
refactor: remove additional uses of RecordBatch::try_new ( #1378 )
...
* refactor: remove additional uses of RecordBatch::try_new
* fix: fix accidental change
* fix: clippy
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-30 10:24:47 +00:00