Commit Graph

74 Commits (0a0afa9485e49e92a68c59060d5393587fc12843)

Author SHA1 Message Date
Marco Neumann 3c968ac092 feat: correctly account MUB sizes
Fixes #1565.
2021-09-03 09:15:49 +02:00
Marco Neumann 79ad48ac3a chore: rename "labels" to "attributes" 2021-08-31 11:31:15 +02:00
Raphael Taylor-Davies e3e801d29a
feat: propagate span context into storage RPC queries (#2407)
* feat: propagate span context into storage RPC queries

* refactor: create ExecutionContextProvider trait

* chore: cleanup imports

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-26 17:11:49 +00:00
Andrew Lamb f975baba6b
chore: Update datafusion + other deps again (get baseline metrics) (#2422)
* chore: Update datafusion reference

* chore: cargo update

* fix: update explain tests to show Union

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-26 13:13:00 +00:00
kodiakhq[bot] b1ecf1bfed
Merge branch 'main' into crepererum/job_start_time_in_system_table 2021-08-26 08:04:10 +00:00
Andrew Lamb ddf6c6362e
chore: update DataFusion again (#2411)
* chore: update datafusion ref

* chore: run cargo update

* refactor: Rename concurrency to target_partitions, avoid deprecation warning
2021-08-26 08:03:13 +00:00
Marco Neumann 558aa54aa3 feat: add start time to `operations` system table 2021-08-26 10:00:29 +02:00
Edd Robinson 11e88877f4 fix: correct size estimation of RLE encoding 2021-08-25 12:03:04 +01:00
Marco Neumann 2ad9843e5f feat: make `RLE` a bit smaller by capacity-based allocation
For some demo data this reduced the overall chunk size from

195049367 bytes
to
191088095 bytes
2021-08-25 10:22:43 +02:00
Raphael Taylor-Davies f7792aafe6
feat: query tracing (#2273) (#2391)
* feat: query tracing (#2273)

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-24 17:35:59 +00:00
Raphael Taylor-Davies a6c9cc2bf2
refactor: rework exec module (#2384)
* refactor: rework exec module

* chore: update docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-24 08:39:54 +00:00
Raphael Taylor-Davies 0946ffe916
refactor: reuse IOxExecutionContext (#2373)
* refactor: reuse IOxExecutionContext

* fix: orphaned comment

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-23 15:47:15 +00:00
Edd Robinson b9f09fce49 feat: improve bitset size estimation 2021-08-17 22:54:22 +01:00
Edd Robinson 1daa30cc7d fix: include enum in sizing 2021-08-17 22:54:22 +01:00
Edd Robinson 311d36d776 refactor: include capacity in Read Buffer chunk size 2021-08-13 11:57:46 +01:00
Edd Robinson fa8da19c45 refactor: expose enc size API into column 2021-08-13 11:57:46 +01:00
Edd Robinson c68bbb6309 test: update test 2021-08-12 15:05:47 +01:00
Andrew Lamb bb8021d9fd
fix: "Can not convert index to usize in dictionary of type creating group by value Int32" (#2151)
* test: add reproducer for index error

* chore: update datafusion
2021-08-02 12:20:41 +00:00
Carol (Nichols || Goulding) 9d15798288 fix: Address or allow Clippy warnings new with Rust 1.54 2021-07-30 09:59:59 -04:00
Andrew Lamb e6cbd4d217
feat: Use statistics for count(*) queries (#2038)
* feat: Use statistics for count(*) queries

* docs: fix mangled comment

* refactor: rewrite to use fold

* refactor: use sort_by_cached_key

* fix: set null count properly

* fix: fmt + clippy
2021-07-28 19:39:41 +00:00
Andrew Lamb 3ea84c6be4
feat: expose null_counts in system.chunk_columns (#2105)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-27 11:05:23 +00:00
Andrew Lamb 5fb3e00f2a
fix: Properly record total_count and null_count in statistics (#2103)
* fix: Properly record total_count and null_count in statistics

* fix: fix statistics calculation in mutable_buffer

* refactor: expose null counts in read_buffer

* refactor: expose null_count in parquet_file

* fix: update server crate tests

* fix: update query_tests tests

* docs: tweak comments

* refactor: Use storage_stats rather than adding `null_count`

* refactor: rename test data field for clarity

* fix: fixup merge conflicts

* refactor: rename initial_non_null_count to initial_total_count

* refactor: caculate null_count as row_count - to_add
2021-07-26 18:13:36 +00:00
Marco Neumann 6ef3680554 feat: collect replay plan during catalog loading 2021-07-23 09:23:06 +02:00
Andrew Lamb 38261cc7ac
test: add tests using `to_timestamp()` as predicates in SQL (#2099)
* test: add tests using `to_timestamp()` as predicates in SQL

* fix: cleanup redundancy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-22 21:06:52 +00:00
Andrew Lamb 01c79f1a1a
fix: Print all timestamps using RFC3339 format (#2098)
* fix: Use IOx pretty printer rather than arrow pretty printer

* chore: update tests in the query crate

* chore: update influxdb_iox tests

* chore: Update end to end tests

* chore: update query_tests

* chore: update mutable_buffer tests

* refactor: update parquet_file tests

* refactor: update db tests

* chore: update kafka integration test output

* fix: merge conflict
2021-07-22 19:04:52 +00:00
Raphael Taylor-Davies 20d06e3225
feat: include more information in system.operations table (#2097)
* feat: include more information in system.operations table

* chore: review feedback

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-22 17:16:09 +00:00
Andrew Lamb 387667330a
chore: Update datafusion deps (#2073)
* chore: Update datafusion deps

* fix: update tests
2021-07-21 08:27:03 +00:00
Raphael Taylor-Davies 091837420f
feat: add PersistenceWindows sytem table (#2030) (#2062)
* feat: add PersistenceWindows sytem table (#2030)

* chore: update log

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-20 13:10:57 +00:00
Andrew Lamb 1c16988a51
chore: Update datafusion references (#2056) 2021-07-19 18:09:06 +00:00
Andrew Lamb 4da8a16c18
chore: update to arrow 5.0 and master datafusion (#2049)
* chore: update to arrow 5.0 and master datafusion

* fix: Update test for change in object size
2021-07-19 12:49:51 +00:00
Marco Neumann 2263189e09 test: make TestDb lifecycle better for testing
This is a leftover from #1972.
2021-07-19 09:50:44 +02:00
Marco Neumann 1ef2bc1887 refactor: `Db::{write_chunk_to_object_store => Db::persist_partition}`
The previous method allowed to persist any chunk -- even ones that
should not be persisted yet and w/o any order of peristence. That will
break our persistence windows. So instead offer a sane higher-level
interface that can trigger persistence of a partition within the
boundaries of the lifecycle rules. This needs some adjustments for our
test suite.
2021-07-16 12:07:58 +02:00
Andrew Lamb 3fd6430fb6
fix: rename `estimated_bytes` to `memory_bytes` and expose `object_store_bytes` in ChunkSummary and system.chunks (#2017)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-15 16:00:24 +00:00
Andrew Lamb 3bb32594ba
refactor: rename end-to-end.rs to end_to_end.rs (#2015)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-15 13:50:32 +00:00
Marco Neumann d89fca00be feat: persist "drop chunk" 2021-07-15 12:07:56 +02:00
Nga Tran 0b1f2b1fd0 chore: merge main to branch 2021-07-14 16:17:14 -04:00
Nga Tran 552e3fb691 fix: Padd stats compute deterministic order of sort key and update tests that got changed by the use of sort key 2021-07-14 14:06:41 -04:00
Andrew Lamb 4800b36949 chore: Update IOx to a pre-release version of arrow and datafusion to test out performance improvement 2021-07-13 15:44:57 -04:00
Andrew Lamb 0164cabbf3
refactor: do not use DataFrame DataFusion API / stop optimizing twice (#1982)
* refactor: do not use DataFrame DataFusion API

* fix: update output to reflect not running optimizer twice

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-13 16:29:43 +00:00
kodiakhq[bot] f26f844ed2
Merge branch 'main' into ntran/use_sortkey 2021-07-12 18:12:47 +00:00
Nga Tran 7b7a60993d feat: consider time as a special key 2021-07-09 18:54:22 -04:00
Marco Neumann 676034b4ae docs: explain why the path placeholder is there 2021-07-09 09:45:13 +02:00
Marco Neumann 09e611deb7 refactor: lift query schema generation up to caller
Do no longer scan chunks during query planning to determine the schema
(except for the lifetime jobs where we have a good reason to do so).
Instead pass the schema down to from whoever is triggering the query.
For real SQL queries, we then just use the the table-wide schemas
introduced in #1913.

Apart from avoiding schema merges we now also don't crash any longer
when no chunks are left in the table (aka columns are present but all
rows are gone).

Fixes #1768.
Fixes #1884.
2021-07-09 09:24:21 +02:00
Marco Neumann 6ac1420335 test: fix out dir for query tests 2021-07-09 09:16:28 +02:00
kodiakhq[bot] c8126784a8
Merge branch 'main' into ntran/avoid_sort_in_scan 2021-07-08 20:22:18 +00:00
Nga Tran da6249a4df fix: address reviewers' comments and also fixe a bug they discovered 2021-07-08 15:54:54 -04:00
Andrew Lamb dd3eff7748
refactor: Always use `row_count` for count of rows in system.* tables (#1937) 2021-07-08 19:28:11 +00:00
Andrew Lamb f670224ea1
chore: Reduce output spew during query tests (#1926)
* chore: Reduce output spew during query tests

* docs: Update query_tests/src/runner.rs

Co-authored-by: Edd Robinson <me@edd.io>

Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-08 11:06:24 +00:00
Andrew Lamb 7602bde850
chore: Update datafusion deps (#1799)
* chore: Update datafusion deps + rework code

* refactor: remove workaround as it has been contributed upstream

* fix: Update query/src/exec/split.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-08 10:58:32 +00:00
Nga Tran 5c722af0fa fix: remove comments 2021-07-07 16:50:53 -04:00