dependabot[bot]
0cbd9f6a82
chore(deps): Bump tokio-util from 0.7.5 to 0.7.7 ( #6964 )
...
---
updated-dependencies:
- dependency-name: tokio-util
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-02-13 10:10:53 +00:00
Andrew Lamb
779fb93ce7
refactor: move test builders out of compactor2 code ( #6953 )
...
* refactor: move test builders out of compactor2 code
* fix: docs
2023-02-10 18:28:09 +00:00
dependabot[bot]
c0c9b51b9e
chore(deps): Bump tokio-util from 0.7.4 to 0.7.5 ( #6941 )
...
Bumps [tokio-util](https://github.com/tokio-rs/tokio ) from 0.7.4 to 0.7.5.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.4...tokio-util-0.7.5 )
---
updated-dependencies:
- dependency-name: tokio-util
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-02-10 09:42:00 +00:00
dependabot[bot]
0ecde75af5
chore(deps): Bump object_store from 0.5.3 to 0.5.4 ( #6900 )
...
Bumps [object_store](https://github.com/apache/arrow-rs ) from 0.5.3 to 0.5.4.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md )
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.3...object_store_0.5.4 )
---
updated-dependencies:
- dependency-name: object_store
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-08 09:40:11 +00:00
Raphael Taylor-Davies
d3601a59f8
chore: update DataFusion, upgrade `arrow` `arrow-flight` and `parquet` to `32.0.0` ( #6756 )
...
* chore: update DataFusion
* fix: test
* chore: format
* chore: clippy
* chore: update arrow
* chore: arrow upgrade fallout
* chore: Run cargo hakari tasks
* chore: remove failing warm compaction test
* fix: flight error propagation
* chore: update parquet size
* fix: Update error message
* chore: Update parquet metadata test
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-06 11:35:39 +00:00
Carol (Nichols || Goulding)
30fea67701
fix: Move variables within format strings. Thanks clippy!
...
Changes made automatically using `cargo clippy --fix`.
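For illustration, the kind of rewrite described here looks roughly like the sketch below. It assumes the change came from clippy's `uninlined_format_args` lint (the commit does not name the lint), and the function and variable names are hypothetical:

```rust
/// Before: positional arguments are passed after the format string.
fn describe_before(partition_id: i64, file_count: usize) -> String {
    format!("partition {} has {} files", partition_id, file_count)
}

/// After: the variables are captured inline within the format string.
fn describe_after(partition_id: i64, file_count: usize) -> String {
    format!("partition {partition_id} has {file_count} files")
}
```

Both forms produce the same string; only the call site changes.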
2023-02-03 13:06:17 -05:00
dependabot[bot]
d0e6b16450
chore(deps): Bump bytes from 1.3.0 to 1.4.0
...
Bumps [bytes](https://github.com/tokio-rs/bytes ) from 1.3.0 to 1.4.0.
- [Release notes](https://github.com/tokio-rs/bytes/releases )
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md )
- [Commits](https://github.com/tokio-rs/bytes/compare/v1.3.0...v1.4.0 )
---
updated-dependencies:
- dependency-name: bytes
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-02-01 00:30:56 +00:00
dependabot[bot]
6f032b1d57
chore(deps): Bump async-trait from 0.1.63 to 0.1.64 ( #6769 )
...
Bumps [async-trait](https://github.com/dtolnay/async-trait ) from 0.1.63 to 0.1.64.
- [Release notes](https://github.com/dtolnay/async-trait/releases )
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.63...0.1.64 )
---
updated-dependencies:
- dependency-name: async-trait
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-31 10:18:27 +00:00
dependabot[bot]
ed7d02a225
chore(deps): Bump tokio from 1.24.2 to 1.25.0
...
Bumps [tokio](https://github.com/tokio-rs/tokio ) from 1.24.2 to 1.25.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/commits/tokio-1.25.0 )
---
updated-dependencies:
- dependency-name: tokio
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-01-30 01:57:27 +00:00
Andrew Lamb
ead6812210
fix: reduce logging verbosity ( #6704 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-27 13:53:42 +00:00
Nga Tran
b8a80869d4
feat: introduce a new way of max_sequence_number for ingester, compactor and querier ( #6692 )
...
* feat: introduce a new way of max_sequence_number for ingester, compactor and querier
* chore: cleanup
* feat: new column max_l0_created_at to order files for deduplication
* chore: cleanup
* chore: debug info for changing cpu.parquet
* fix: update test parquet file
Co-authored-by: Marco Neumann <marco@crepererum.net>
2023-01-26 10:52:47 +00:00
dependabot[bot]
0114e7ee50
chore(deps): Bump async-trait from 0.1.61 to 0.1.63 ( #6660 )
...
Bumps [async-trait](https://github.com/dtolnay/async-trait ) from 0.1.61 to 0.1.63.
- [Release notes](https://github.com/dtolnay/async-trait/releases )
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.61...0.1.63 )
---
updated-dependencies:
- dependency-name: async-trait
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-23 08:41:27 +00:00
Nga Tran
e596f5f074
chore: metrics for compaction candidate counts ( #6593 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-13 19:00:43 +00:00
Nga Tran
550cea8bc5
perf: optimize not to update partitions with newly created level 2 files ( #6590 )
...
* perf: optimize not to update partitions with newly created level 2 files
* chore: cleanup
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-13 14:46:58 +00:00
Nga Tran
fa0893819c
fix: have warm compaction work with compactor2 ( #6571 )
...
* refactor: same function to select partition candidates
* fix: have warm compaction work with compactor2
* fix: format
* chore: cleanup
2023-01-12 02:32:39 +00:00
Nga Tran
1f508b76fc
refactor: same function to select partition candidates ( #6569 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-12 01:14:49 +00:00
Nga Tran
d3b2203560
fix: bug in counting processed partitions ( #6572 )
2023-01-11 22:53:52 +00:00
Nga Tran
2de0e45b0a
fix: using created_at to order chunks for deduplication ( #6556 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-10 18:18:33 +00:00
NGA-TRAN
2ae018e2f9
chore: merge main to branch
2023-01-10 11:55:07 -05:00
NGA-TRAN
1a93f70a8b
fix: use created_at to order L0 during compaction
2023-01-10 11:48:05 -05:00
Nga Tran
62c0f3dbdd
feat: have cold compaction work with Compactor2 ( #6542 )
...
* feat: cold
* chore: debug info
* feat: only compact qualified cold partition candidates
* fix: catalog test
* chore: cleanup
* chore: add new config flag for cold partition candidates
* chore: implement display for CompactionType and add tests for max num partitions
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-10 16:42:57 +00:00
dependabot[bot]
b49cc2e35e
chore(deps): Bump tokio from 1.24.0 to 1.24.1 ( #6545 )
...
Bumps [tokio](https://github.com/tokio-rs/tokio ) from 1.24.0 to 1.24.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.24.0...tokio-1.24.1 )
---
updated-dependencies:
- dependency-name: tokio
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-10 09:48:44 +00:00
dependabot[bot]
e31c84a794
chore(deps): Bump async-trait from 0.1.60 to 0.1.61 ( #6533 )
...
Bumps [async-trait](https://github.com/dtolnay/async-trait ) from 0.1.60 to 0.1.61.
- [Release notes](https://github.com/dtolnay/async-trait/releases )
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.60...0.1.61 )
---
updated-dependencies:
- dependency-name: async-trait
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-09 07:44:35 +00:00
Nga Tran
4031ea1c10
feat: integrate new way to get partition candidates for hot compaction ( #6525 )
...
* feat: integrate new way to get partition candidates for hot compaction
* chore: rename
2023-01-06 18:51:52 +00:00
Raphael Taylor-Davies
e1036a0c63
refactor: cleanup schema boxing ( #6511 )
...
* refactor: cleanup Schema boxing
* chore: clippy
2023-01-06 10:57:39 +00:00
Nga Tran
cd1a604df0
fix: raise the estimated memory a query needs, based on recent observations ( #6476 )
2022-12-30 21:15:39 +00:00
Nga Tran
0c944346e5
chore: debug info of estimate file size ( #6475 )
2022-12-30 20:13:21 +00:00
Nga Tran
d27e137c39
chore: add debug info for the investigation ( #6472 )
2022-12-29 23:49:29 +00:00
Carol (Nichols || Goulding)
39acfc4f0d
fix: Remove needless casts. Thanks clippy!
2022-12-21 14:32:34 -05:00
Dom Dwyer
adc6fcfb04
feat(catalog): linearise sort key updates
...
Updating the sort key is not commutative and MUST be serialised. The
correctness of the current catalog interface relies on the caller
serialising updates globally, something it cannot reasonably assert in a
distributed system.
This change of the catalog interface pushes this responsibility to the
catalog itself where it can be effectively enforced, and allows a caller
to detect parallel updates to the sort key.
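One common way to let the catalog enforce this is a compare-and-swap update: the caller states which sort key it last observed, and the update is rejected if the stored value has changed since. The sketch below is an in-memory illustration under that assumption. `PartitionRecord`, `cas_sort_key`, and `SortKeyMismatch` are hypothetical names, not the catalog's actual interface:

```rust
use std::sync::Mutex;

/// Returned when the stored sort key is not the one the caller expected.
#[derive(Debug, PartialEq)]
struct SortKeyMismatch {
    /// The sort key that was actually stored when the update was attempted.
    observed: Option<Vec<String>>,
}

/// Hypothetical in-memory stand-in for one partition's catalog record.
struct PartitionRecord {
    sort_key: Mutex<Option<Vec<String>>>,
}

impl PartitionRecord {
    fn new() -> Self {
        Self { sort_key: Mutex::new(None) }
    }

    /// Replace the sort key only if the stored value still equals `expected`.
    /// A concurrent writer that got there first causes a mismatch error,
    /// which the caller can use to re-read and retry.
    fn cas_sort_key(
        &self,
        expected: Option<&[String]>,
        new: &[String],
    ) -> Result<(), SortKeyMismatch> {
        let mut guard = self.sort_key.lock().unwrap();
        if guard.as_deref() != expected {
            return Err(SortKeyMismatch { observed: guard.clone() });
        }
        *guard = Some(new.to_vec());
        Ok(())
    }
}
```

A caller holding a stale view gets `Err(SortKeyMismatch { .. })` with the value currently stored, instead of silently overwriting a parallel update.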
2022-12-20 12:31:00 +01:00
kodiakhq[bot]
c0f2ba09ee
Merge branch 'main' into cn/compactor2
2022-12-19 14:22:56 +00:00
dependabot[bot]
c72734473c
chore(deps): Bump async-trait from 0.1.59 to 0.1.60 ( #6433 )
...
Bumps [async-trait](https://github.com/dtolnay/async-trait ) from 0.1.59 to 0.1.60.
- [Release notes](https://github.com/dtolnay/async-trait/releases )
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.59...0.1.60 )
---
updated-dependencies:
- dependency-name: async-trait
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-19 10:09:23 +00:00
Carol (Nichols || Goulding)
dfd979477c
fix: Update warm compaction code to optionally take shard ID
2022-12-16 17:41:57 -05:00
Carol (Nichols || Goulding)
d7e75d43ea
fix: Make shard ID optional for compactor queries in RPC write mode
2022-12-16 17:28:53 -05:00
Luke Bond
f419e2c378
feat: warm compaction ( #6192 )
...
* feat: warm compaction
chore: add missing warm compaction config
chore: tests for warm compaction
chore: modify count usage in warm compaction sql
chore: catalog test for warm compaction; sql fixes
feat: settable target level for compact w/ budget
chore: tests for warm compaction
chore: clarifying comments in warm compaction test
chore: fixed erroneous comment in catalog test
chore: improve warm compactor test by checking file exists
chore: tests for warm compaction
chore: warm compactor test tidy-ups
* chore: improve test for warm compaction
* chore: fix erroneous comment in warm compaction code
2022-12-16 15:59:45 +00:00
dependabot[bot]
1d38d400f0
chore(deps): Bump object_store from 0.5.1 to 0.5.2 ( #6339 )
...
* chore(deps): Bump object_store from 0.5.1 to 0.5.2
Bumps [object_store](https://github.com/apache/arrow-rs ) from 0.5.1 to 0.5.2.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md )
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.1...object_store_0.5.2 )
---
updated-dependencies:
- dependency-name: object_store
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: Run cargo hakari tasks
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-06 07:53:54 +00:00
Nga Tran
77cbc880f6
feat: Add cap limit on number of partitions to be compacted in parallel ( #6305 )
...
* feat: Add cap limit on number of partitions to be compacted in parallel
* chore: cleanup
* chore: clearer comments
2022-12-01 21:23:44 +00:00
Andrew Lamb
255a168d07
refactor: Refactor ParquetFileCombining into a builder and plan, and add sort exec test ( #6196 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-22 14:47:47 +00:00
dependabot[bot]
04c00bbb62
chore(deps): Bump bytes from 1.2.1 to 1.3.0 ( #6199 )
...
Bumps [bytes](https://github.com/tokio-rs/bytes ) from 1.2.1 to 1.3.0.
- [Release notes](https://github.com/tokio-rs/bytes/releases )
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md )
- [Commits](https://github.com/tokio-rs/bytes/commits )
---
updated-dependencies:
- dependency-name: bytes
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-22 08:23:24 +00:00
dependabot[bot]
a9db7581cd
chore(deps): Bump tokio from 1.21.2 to 1.22.0 ( #6183 )
...
Bumps [tokio](https://github.com/tokio-rs/tokio ) from 1.21.2 to 1.22.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.21.2...tokio-1.22.0 )
---
updated-dependencies:
- dependency-name: tokio
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-21 10:21:24 +00:00
Luke Bond
7c813c170a
feat: reintroduce compactor first file in partition exception ( #6176 )
...
* feat: compactor ignores max file count for first file
chore: typo in comment in compactor
* feat: restore special first file in partition compaction logic; add limit
* fix: calculation in compaction max file count
chore: clippy
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-18 15:58:59 +00:00
Nga Tran
49a9565240
feat: gRPC that creates namespace ( #6103 )
...
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidentally removed in rebase
* chore: remove mem catalogs default 1hr retention; make it settable in sets & router
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-18 13:02:12 +00:00
Nga Tran
80e91a644b
chore: Revert "feat: compactor ignores max file count for first file ( #6144 )" ( #6158 )
...
This reverts commit bf1681f4fe.
2022-11-16 19:58:46 +00:00
Luke Bond
bf1681f4fe
feat: compactor ignores max file count for first file ( #6144 )
...
* feat: compactor ignores max file count for first file
* chore: typo in comment in compactor
2022-11-16 11:21:28 +00:00
Andrew Lamb
448911794c
test: test coverage for sorting and merging in compactor ( #6136 )
...
* test: test coverage for sorting and merging in compactor
* fix: Apply suggestions from code review (comments)
Co-authored-by: Marco Neumann <marco@crepererum.net>
* feat: use itertools to cover all permutations
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-15 20:39:45 +00:00
Nga Tran
9c4266c503
refactor: first step to remove unused retention_duration ( #6113 )
...
* refactor: first step to remove unused retention_duration
* refactor: remove retention_duration from update catalog
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-11 15:21:06 +00:00
Luke Bond
f9316decee
chore: expose compactor's hot compaction hours thresholds as cfg ( #6060 )
...
* chore: expose compactor's hot compaction hours thresholds as cfg
* fix: add missing compactor arg envar; fix some comments
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 15:29:17 +00:00
Marco Neumann
f511db380c
refactor: remove table name from chunks ( #6063 )
...
It should be always clear from the context to which table a chunk
belongs.
I think having a table name bound to a chunk goes back to a time where
chunks had multiple tables.
Helps with #6049 .
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 10:42:57 +00:00
Nga Tran
654ed98d1f
feat: config param to set when partition is cold ( #6044 )
...
* feat: config param to set when partition is cold
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* fix: make default 8 hours and avoid using 8 * 60 because it is a string, not an expression, which makes a test fail
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
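The `8 * 60` fix above can be illustrated with a small sketch. It assumes the default is supplied as a string literal that is later parsed into an integer, as is common for CLI argument defaults, and that the unit is minutes (inferred from `8 * 60`). `parse_minutes` is a hypothetical helper, not code from the repository:

```rust
/// Parse a textual default value into a number of minutes.
/// The string is parsed as-is; arithmetic inside it is never evaluated.
fn parse_minutes(raw: &str) -> Result<u64, std::num::ParseIntError> {
    raw.trim().parse::<u64>()
}
```

Under these assumptions, `parse_minutes("8 * 60")` is an error while `parse_minutes("480")` succeeds, which is why the default has to be written out as a literal number.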
2022-11-03 15:03:56 +00:00
Andrew Lamb
4fb2843d05
refactor: Rename `schema::selection::Selection` to `schema::projection::Projection` ( #6037 )
...
* chore: Rename `schema::selection::Selection` to `schema::projection::Projection`
* fix: docs
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 18:15:04 +00:00
Marco Neumann
45b3984aa3
refactor: simplify `QueryChunk` data access ( #6015 )
...
* refactor: simplify `QueryChunk` data access
We have only two types for chunks (now that the RUB is gone):
1. In-memory RecordBatches
2. Parquet files
Loads of logic is duplicated in the different `read_filter`
implementations. Also `read_filter` hides a solid amount of logic from
DataFusion, which will prevent certain (future) optimizations. To enable #5897
and to simplify the interface, let the chunks return the data (batches
or metadata for parquet files) directly and let `iox_query` perform the
actual heavy-lifting.
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
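The idea in the first item above, chunks returning their data directly and the query layer deciding how to read it, can be sketched as a two-variant enum. All types and names here (`Batch`, `ParquetFileMeta`, `ChunkData`, `ScanPlan`, `plan_scan`) are simplified stand-ins chosen for illustration, not the actual `iox_query` definitions:

```rust
/// Hypothetical stand-in for an in-memory Arrow record batch.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    rows: usize,
}

/// Hypothetical stand-in for the metadata needed to scan a Parquet file.
#[derive(Debug, Clone, PartialEq)]
struct ParquetFileMeta {
    object_store_path: String,
}

/// The two kinds of data a chunk can hand back.
#[derive(Debug, Clone, PartialEq)]
enum ChunkData {
    RecordBatches(Vec<Batch>),
    Parquet(ParquetFileMeta),
}

/// What the query layer decides to do with a chunk's data.
#[derive(Debug, PartialEq)]
enum ScanPlan {
    InMemory { total_rows: usize },
    ParquetScan { path: String },
}

/// The query layer, not the chunk, picks the read strategy.
fn plan_scan(data: &ChunkData) -> ScanPlan {
    match data {
        ChunkData::RecordBatches(batches) => ScanPlan::InMemory {
            total_rows: batches.iter().map(|b| b.rows).sum(),
        },
        ChunkData::Parquet(meta) => ScanPlan::ParquetScan {
            path: meta.object_store_path.clone(),
        },
    }
}
```

Keeping the read logic in one place is what removes the duplicated `read_filter` implementations the commit message mentions.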
2022-11-02 08:18:33 +00:00
Marco Neumann
072439e428
refactor: mandatory `QueryChunkMeta::summary` ( #5997 )
...
With #5963 merged, all chunks now provide a summary (even though it may
not contain data for all columns). So let's make it mandatory, which
also removes a few 🙈 -style `.expect(...)` calls.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 16:38:02 +00:00
Carol (Nichols || Goulding)
dad1ad1318
feat: Add the catalog service to ingester, querier, and compactor
...
So that `remote get` that uses the catalog service can work no matter
what kind of server you contact.
2022-10-28 10:49:26 -04:00
Carol (Nichols || Goulding)
53445af25d
chore: Alphabetize some dependencies
...
I can't handle not knowing where to look for a dependency or knowing
where to add a new dependency.
2022-10-28 10:34:25 -04:00
Marco Neumann
8447d46093
refactor: remove `QueryChunkMeta::timestamp_min_max` ( #5963 )
...
Use the table summary instead. This allows us to have a single mechanism
that both IOx and DataFusion understand. This basically lifts the "basic
table summary" mechanism that the querier uses to `iox_query` and let
the compactor and ingester use the same mechanism.
While not strictly necessary, simplifying the `QueryChunk[Meta]`
interface helps with #5897 .
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-28 10:29:16 +00:00
Carol (Nichols || Goulding)
3145e2c05b
feat: Use workspace dep inheritance for the arrow crate
2022-10-26 10:34:29 -04:00
Carol (Nichols || Goulding)
44936f661a
feat: Use workspace dep inheritance for datafusion instead of shim crate
2022-10-26 10:33:56 -04:00
Carol (Nichols || Goulding)
2e83e04eab
feat: Use workspace package metadata to reduce differences and repetition
2022-10-24 13:04:09 -04:00
Nga Tran
84e5c2a0ee
fix: cardinality of each batch should use row count of the batch ( #5946 )
...
* fix: cardinality of each batch should use row count of the batch
* chore: cleanup
* fix: auto-merge conflict
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-24 15:36:33 +00:00
Marco Neumann
e0062f2d40
refactor: do NOT use fake DF context for parquet reading ( #5942 )
...
Use the proper top-level DataFusion context and register the object
store there.
Note that we still hide the `ParquetExec` behind an opaque record batch
stream. Fixing that is next on my list.
Helps with #5897 .
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-24 08:20:26 +00:00
Jake Goulding
fa7fe2e9cf
feat: Add a gRPC endpoint to delete a skipped compaction
...
Also add a CLI usage of it for convenience
2022-10-21 15:12:20 -04:00
Carol (Nichols || Goulding)
0132a33946
fix: Rename SkippedCompactionService to CompactionService
...
To make a good place for other compactor-related gRPC actions in the
future.
2022-10-21 13:40:37 -04:00
Carol (Nichols || Goulding)
699332fd6b
fix: Actually implement the error conversion, oops
2022-10-21 13:40:31 -04:00
Carol (Nichols || Goulding)
ba25300b01
feat: Create compactor service to list skipped compactions
2022-10-21 13:40:31 -04:00
Marco Neumann
eb5a661ab3
refactor: prep work for #5897 ( #5907 )
...
* refactor: add ID to `ParquetStorage`
* refactor: remove duplicate code
* refactor: use dedicated `StorageId`
2022-10-19 11:54:42 +00:00
dependabot[bot]
b5574c07b7
chore(deps): Bump async-trait from 0.1.57 to 0.1.58 ( #5904 )
...
Bumps [async-trait](https://github.com/dtolnay/async-trait ) from 0.1.57 to 0.1.58.
- [Release notes](https://github.com/dtolnay/async-trait/releases )
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.57...0.1.58 )
---
updated-dependencies:
- dependency-name: async-trait
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-19 09:40:26 +00:00
Andrew Lamb
d706f8221d
chore: Update datafusion and arrow / parquet / arrow-flight 25.0.0 ( #5900 )
...
* chore: Update datafusion and `arrow` / `parquet` / `arrow-flight` 25.0.0
* chore: Update for structure changes
* chore: Update for new projection pushdown
* chore: Run cargo hakari tasks
* fix: fmt
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-18 20:58:47 +00:00
kodiakhq[bot]
e6fd0bc621
Merge branch 'main' into cn/l2-file-size
2022-10-14 15:26:13 +00:00
Carol (Nichols || Goulding)
475db15a51
fix: Compact one L1 file even if it is over the file size limit
2022-10-14 10:55:34 -04:00
Andrew Lamb
8021b8be0b
fix: Use Display rather than Debug when logging errors ( #5859 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-14 14:43:11 +00:00
Carol (Nichols || Goulding)
b2df492558
feat: Limit L1 -> L2 compaction based on file size
2022-10-13 16:20:22 -04:00
Andrew Lamb
9134ccd6c3
chore: Update datafusion again ( #5855 )
...
* chore: Update datafusion
* chore: Updates for changes in datafusion
* chore: more updates
* fix: update doc example
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-13 19:18:57 +00:00
Carol (Nichols || Goulding)
082d045633
fix: Update test compactor limit values
2022-10-13 14:25:10 -04:00
Carol (Nichols || Goulding)
cdd01eb3fc
test: Verify L1 files chosen for compaction are limited by the memory budget
2022-10-13 14:15:39 -04:00
Carol (Nichols || Goulding)
3cdf2556ec
test: Verify L1 files in a group by themselves get upgraded to L2
2022-10-13 14:15:39 -04:00
Nga Tran
fab3cd845c
feat: add memory need for output streams into our estimation ( #5847 )
...
* feat: add memory need for output streams into our estimation
* test: modify tests to have better coverage
* refactor: use constants instead of numbers
* chore: address review comments
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-13 14:31:19 +00:00
Nga Tran
1400bf99e4
refactor: split memory estimation into bytes to store and bytes to stream ( #5845 )
...
* refactor: split memory estimation into bytes to store and bytes to stream
* chore: cleanup
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-12 16:59:51 +00:00
Andrew Lamb
d57c99638c
chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0 ( #5792 )
...
* chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0
* fix: Update for coercion, fix explain plans for change in column name display
* chore: Update datafusion lock
* fix: Update for other API changes
* chore: Update to latest datafusion pin
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-12 16:19:14 +00:00
Nga Tran
f05ca867a5
feat: add file size into estimated memory ( #5837 )
...
* feat: add file size into estimated memory
* chore: cleanup
* chore: fmt
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: run fmt after applying review suggestion
* fix: fix tests to work with the change for review suggestion
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-12 14:42:53 +00:00
Nga Tran
b7153862b0
refactor: due to the limit on the size uploaded to S3, we need to split the output file of cold compaction, too ( #5834 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-11 17:22:19 +00:00
dependabot[bot]
933493fab3
chore(deps): Bump object_store from 0.5.0 to 0.5.1
...
Bumps [object_store](https://github.com/apache/arrow-rs ) from 0.5.0 to 0.5.1.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md )
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.0...object_store_0.5.1 )
---
updated-dependencies:
- dependency-name: object_store
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-10-11 01:19:10 +00:00
Marco Neumann
c4c83e0840
fix: query error propagation ( #5801 )
...
- treat OOM protection as "resource exhausted"
- use `DataFusionError` in more places instead of opaque `Box<dyn Error>`
- improve conversion from/into `DataFusionError` to preserve more
semantics
Overall, this improves our error handling. DF can now return errors like
"resource exhausted" and gRPC should now automatically generate a
sensible status code for it.
Fixes #5799 .
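A rough sketch of the error-to-status mapping described above follows. The enums are simplified stand-ins, not the real `DataFusionError` or a real gRPC library type, and every mapping other than "resource exhausted" is an assumption made for illustration:

```rust
/// Simplified stand-in for a query engine error type.
#[derive(Debug)]
enum QueryError {
    /// A memory or other resource limit was hit (e.g. OOM protection).
    ResourcesExhausted(String),
    /// The query could not be planned.
    Plan(String),
    /// An opaque error from elsewhere.
    External(Box<dyn std::error::Error + Send + Sync>),
}

/// Simplified stand-in for a subset of gRPC status codes.
#[derive(Debug, PartialEq)]
enum StatusCode {
    ResourceExhausted,
    InvalidArgument,
    Internal,
}

/// Pick a status code from the error variant rather than from its text.
fn status_for(err: &QueryError) -> StatusCode {
    match err {
        QueryError::ResourcesExhausted(_) => StatusCode::ResourceExhausted,
        // Assumed mapping: planning failures are treated as caller errors.
        QueryError::Plan(_) => StatusCode::InvalidArgument,
        // Opaque errors carry no semantics, so they fall back to Internal.
        QueryError::External(_) => StatusCode::Internal,
    }
}
```

The fallback arm shows why an opaque `Box<dyn Error>` is less useful than a typed error: once the variant is erased, there is nothing left to map to a specific status.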
2022-10-06 08:54:01 +00:00
Nga Tran
2f08a64f16
feat: not split output files in the first step of cold compaction ( #5781 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-30 16:08:03 +00:00
Dom Dwyer
cd4087e00d
style: add no todo!() or dbg!() lints
...
Some crates had them, some not - let's be consistent and have the
compiler spot dbg!() and todo!() macro calls - they should never be in
prod code!
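As a sketch of what such lints look like: clippy provides `clippy::todo` and `clippy::dbg_macro`. In a crate they would normally be enabled with an inner attribute (`#![warn(...)]`) at the top of `lib.rs` or `main.rs`. The example below applies them to a module instead so that it is self-contained, and the `warn` level and module contents are assumptions:

```rust
// With these lints active, `cargo clippy` flags any `todo!()` or `dbg!()`
// call inside the module.
#[warn(clippy::todo, clippy::dbg_macro)]
mod checked {
    /// Ordinary code is unaffected; only the two macros are reported.
    pub fn double(x: u32) -> u32 {
        x * 2
    }
}
```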
2022-09-29 13:10:07 +02:00
Andrew Lamb
66dbb9541f
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0, `thrift` to 0.16.0 ( #5694 )
...
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0
* chore: Update thrift / remove parquet_format
* fix: Update APIs
* chore: Update lock + Run cargo hakari tasks
* fix: use patched version of arrow-rs to work around https://github.com/apache/arrow-rs/issues/2779
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-27 12:50:54 +00:00
Nga Tran
75ff805ee2
feat: instead of adding num_files and memory budget into the reason text column, let us create different columns for them. We will be able to filter them easily ( #5742 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-26 20:14:04 +00:00
Nga Tran
b11da1d98b
fix: a silly bug that did not capture the file limit when there are a lot of L0 files and very few or no overlapping L1 files ( #5736 )
2022-09-23 21:03:29 +00:00
Nga Tran
c4542d6b21
chore: more verbose about the memory budget inserted into the catalog table skipped_compaction ( #5735 )
2022-09-23 18:40:09 +00:00
Nga Tran
bb7df22aa1
chore: always use a fixed number of rows (8192) per batch to estimate memory ( #5733 )
2022-09-23 15:51:25 +00:00
Nga Tran
da697815ff
chore: add more info about memory budget at the time of over-file-limit into skipped_compaction for us to see if we should increase the file limit ( #5731 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-23 13:34:38 +00:00
Nga Tran
61075d57e2
chore: turn full cold compaction on ( #5728 )
2022-09-22 17:07:35 +00:00
Nga Tran
aaec5104d6
chore: turn compaction cold partition step 1 on to work with our new … ( #5726 )
...
* chore: turn compaction cold partition step 1 on to work with our new memory budget that considers the num_files limitation
* chore: run fmt
2022-09-22 14:59:27 +00:00
Nga Tran
e3deb23bcc
feat: add minimum row_count per file in estimating compacting memory… ( #5715 )
...
* feat: add minimum row_count per file in estimating compacting memory budget and limit number files per compaction
* chore: cleanup
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* test: add test per review comments
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* test: add one more test that has limit num files larger than total input files
* fix: make the L1 files in tests not overlapped
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-22 14:37:39 +00:00
Carol (Nichols || Goulding)
aa822a40cf
refactor: Move config in with the relevant assertions
...
Now that only one hot test is using a CompactorConfig, move it into that
test to avoid spooky action at a distance.
2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding)
f0bf3bd21c
test: Clarify descriptions for the remaining assertion
...
The assertion remaining in this test is now important because of having
multiple shards and showing which partition per shard is chosen.
2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding)
7c7b058276
refactor: Extract unit test for case 5
2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding)
f5bd81ff3c
refactor: Extract unit test for case 4
2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding)
765feaa4d8
refactor: Extract a unit test for case 3
2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding)
a7a480c1ba
refactor: Extract a unit test for case 2
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
d95f252a8e
refactor: Extract a unit test for case 1
...
Also add coverage for when there are no *partitions* in addition to the
test for when there are no *parquet files*.
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
9372290ec9
refactor: Use iox_test helpers to simplify test setup
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
f22627a97f
test: Move an integration test of hot compact_one_partition to lib
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
a7bb0398e6
test: Move an integration test of compact_candidates_with_memory_budget to the same file
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
316ebfa8c1
test: Call the smaller inner hot_partitions_for_shard when only one shard is involved
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
fcf9a9d589
refactor: Move fetching of config from compactor inside hot_partitions_to_compact
...
But still pass them to hot_partitions_for_shard.
And make the order of the arguments the same as for
recent_highest_throughput_partitions because I've already messed the
order up. And make the names the same throughout.
This makes the closure passed to get_candidates_with_retry simpler.
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
48b7876174
refactor: Extract a function for computing query nanoseconds ago
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
7dcaf5bd3d
refactor: Extract a function for getting hot partitions for one shard
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
b557c30fd3
refactor: Move hot compaction candidates to the hot module
2022-09-21 11:57:55 -04:00
Carol (Nichols || Goulding)
fa11031a36
refactor: Extract a shared function to retry fetching of compaction candidates
2022-09-21 11:57:55 -04:00
Nga Tran
1d306061b9
chore: disable cold compaction again since its step 1 is the culprit ( #5700 )
2022-09-20 20:34:28 +00:00
Nga Tran
34bc02b59b
chore: turn cold compaction on but only compact L0s and their overlapped L1s ( #5698 )
2022-09-20 18:44:36 +00:00
Nga Tran
578ce1854d
chore: temporarily turn off cold compaction to investigate an oom ( #5696 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-20 14:17:22 +00:00
Carol (Nichols || Goulding)
414b0f02ca
fix: Use time helper methods in more places
2022-09-19 13:24:08 -04:00
Carol (Nichols || Goulding)
c0c0349bc5
fix: Use typed Time values rather than ns
2022-09-19 12:59:20 -04:00
Carol (Nichols || Goulding)
0e23360da1
refactor: Add helper methods for computing times to TimeProvider
2022-09-19 11:34:43 -04:00
kodiakhq[bot]
eed31bec4e
Merge branch 'main' into cn/share-code-with-full-compaction
2022-09-16 21:15:44 +00:00
Carol (Nichols || Goulding)
20f5f205bc
fix: ChunkOrder should be either max_seq or 0, not min_time
2022-09-16 16:57:31 -04:00
Carol (Nichols || Goulding)
d85e959820
fix: Sort l1 files by min_time rather than max_sequence_number
2022-09-16 16:15:18 -04:00
Carol (Nichols || Goulding)
50ddd588b1
test: Add a case of L1+L2 files being compacted into L2
2022-09-16 16:15:18 -04:00
Carol (Nichols || Goulding)
a8d817c91a
test: Explain expected value
2022-09-16 16:15:18 -04:00
Carol (Nichols || Goulding)
1ab250dfac
fix: Sort chunks taking into account what level compaction is targeting
2022-09-16 16:15:18 -04:00
Carol (Nichols || Goulding)
ca4c5d65e7
docs: Clarify comments on sort order of input/output of filtering
2022-09-16 16:15:17 -04:00
Nga Tran
346ef1c811
chore: reduce number of histogram buckets ( #5661 )
2022-09-16 19:44:22 +00:00
Carol (Nichols || Goulding)
cde0a94fd5
fix: Re-enable full compaction to level 2
...
This will work the same way that compacting level 0 -> level 1 does
except that the resulting files won't be split into potentially multiple
files. It will be limited by the memory budget bytes, which should limit
the groups more than the max_file_size_bytes would.
2022-09-15 14:53:12 -04:00
Carol (Nichols || Goulding)
e05657e8a4
feat: Make filter_parquet_files more general with regards to compaction level
2022-09-15 14:53:08 -04:00
Carol (Nichols || Goulding)
9b99af08e4
fix: Level 1 files need to be sorted by max sequence number for full compaction
2022-09-15 14:53:07 -04:00
Carol (Nichols || Goulding)
dc64e494bd
docs: Update comment to what we'd like this code to do
2022-09-15 14:53:07 -04:00
Carol (Nichols || Goulding)
f5497a3a3d
refactor: Extract a conversion for convenience in tests
2022-09-15 12:48:36 -04:00
Carol (Nichols || Goulding)
dcab9d0ffc
refactor: Combine relevant data with the FilterResult state
...
This encodes the result directly and has the FilterResult hold only the
relevant data to the state. So no longer any need to create or check for
empty vectors or 0 budget_bytes. Also creates a new type after checking
the filter result state and handling the budget, as actual compaction
doesn't need to care about that.
This could still use more refactoring to become a clearer pipeline of
different states, but I think this is a good start.
2022-09-15 11:13:18 -04:00
Carol (Nichols || Goulding)
e57387b8e4
refactor: Extract an inner function so partition isn't needed in tests
2022-09-15 11:10:14 -04:00
Carol (Nichols || Goulding)
a284cebb51
refactor: Store estimated bytes on the CompactorParquetFile
2022-09-15 11:10:14 -04:00
Carol (Nichols || Goulding)
70094aead0
refactor: Make estimating bytes a responsibility of the Partition
...
Table columns for a partition don't change, so rather than carrying
around table columns for the partition and parquet files to look up
repeatedly, have the `PartitionCompactionCandidateWithInfo` keep track
of its column types and be able to estimate bytes given a number of rows
from a parquet file.
2022-09-15 11:10:14 -04:00
Nga Tran
7c4c918636
chore: add partition id into panic message ( #5641 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-15 02:21:13 +00:00
kodiakhq[bot]
08e2523295
Merge branch 'main' into cn/always-get-extra-info
2022-09-14 17:01:59 +00:00
Nga Tran
44e12aa512
feat: add needed budget and memory budget into the message for us to diagnose and increase our memory budget as needed ( #5640 )
2022-09-14 16:06:19 +00:00
Carol (Nichols || Goulding)
e16306d21c
refactor: Move fetching of extra partition info into the method because it's always needed
2022-09-14 11:14:17 -04:00
kodiakhq[bot]
85641efa6f
Merge branch 'main' into cn/infallible-estimated-bytes
2022-09-14 01:00:10 +00:00
Nga Tran
f21cb43624
feat: add a few more buckets for the histograms ( #5621 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-13 13:52:23 +00:00
Andrew Lamb
f86d3e31da
chore: Update datafusion + object_store ( #5619 )
...
* chore: Update datafusion pin
* chore: update object_store to 0.5.0
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-13 12:34:54 +00:00
Carol (Nichols || Goulding)
d971980fd3
fix: Box a source error to please clippy
2022-09-12 17:38:40 -04:00
Carol (Nichols || Goulding)
c3937308f4
fix: Make estimate_arrow_bytes_for_file infallible
2022-09-12 16:50:25 -04:00
Andrew Lamb
1fd31ee3bf
chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 ( #5591 )
...
* chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0
* fix: enable dynamic comparison flag
* chore: derive Eq for clippy
* chore: update explain plans
* chore: Update sizes for ReadBuffer encoding
* chore: update more tests
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-12 17:45:03 +00:00
Carol (Nichols || Goulding)
e7a3f15ecf
test: Remove outdated description
2022-09-12 13:13:30 -04:00
Carol (Nichols || Goulding)
8981cbbd84
test: Reduce time from 18 to 9 hours
2022-09-12 13:13:29 -04:00
Carol (Nichols || Goulding)
2ceb779c28
test: Correct a comment that I missed in the 24 hr -> 8 hr switch
2022-09-12 13:13:29 -04:00
Carol (Nichols || Goulding)
baec40a313
test: Correct and expand assertions and descriptions
2022-09-12 13:13:29 -04:00
Carol (Nichols || Goulding)
2aef7c7936
feat: Temporarily disable cold full compaction
2022-09-12 13:13:29 -04:00
Carol (Nichols || Goulding)
743b67f0e9
fix: Re-enable full cold compaction, in serial for now
2022-09-12 13:13:29 -04:00
Carol (Nichols || Goulding)
6e1b06c435
fix: Work with Arc of PartitionCompactionCandidateWithInfo
2022-09-12 13:13:29 -04:00
Carol (Nichols || Goulding)
dfd7255c46
fix: Remove now-unused cold_input_file_count_threshold
2022-09-12 13:13:28 -04:00