550 Commits (1ddc64d68db906c6490f36d4aecde7ccd5bff945)

Author SHA1 Message Date
wiedld 02088995b2
feat(idpe 17789): compactor to scheduler communication. `update_job_status()` and `end_job()` (#8216)
* feat(idpe-17789): scheduler job_status() (#8121)

This block of work moves into the scheduler some of the downstream actions associated with compaction outcomes. Which responsibilities stay in the compactor, versus move to the scheduler, roughly follows three heuristics: (a) whether the action has an impact on global catalog state (a.k.a. commits and partition skipping), (b) whether it is logging associated with compactor health (e.g. PartitionDoneSink logging outcomes) versus system health (e.g. logging commits), and (c) whether it reports to the scheduler on any errors encountered during compaction. This boundary is subject to change as we move forward.

Also, a noted caveat (TODO) on this commit. We have a CompactionJob which is used to track work handed off to each compactor. Currently it still uses the partition_id for tracking, but the followup PR will start moving the compactor to have more CompactionJob uuid awareness.

* fix(idpe-17789): need to remove partition from uniqueness tracking, so it becomes available again

* refactor(idpe-17789): split up the single-use end_job() from the multi-use update_job_status()

* feat(idpe-17789): Commit is now a scheduler trait, only used externally in the compactor_test_utils

* feat(idpe-17789): Propagate errors pertaining to commit, in both the scheduler and the compactor.

* feat(idpe-17789): PartitionDoneSink should have different crate-private traits for scheduler versus compactor.

* feat(idpe-17789): PartitionDoneSink should propagate errors

* test(idpe-17789): integration tests suite

* test(idpe-17789): test documenting what skip request does (as outcome)

* refactor(idpe-17789): make the validation of the upgrade commit, versus the replacement commit, more explicit.

* feat(idpe-17789): switch to using parking_lot Mutex within the scheduler
2023-07-24 12:01:28 -07:00
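The split between a multi-use `update_job_status()` and a single-use `end_job()` described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the scheduler's actual API: `SketchScheduler`, `JobStatus`, `JobEnd`, `try_start`, and the `i64` partition id are all hypothetical names, and `std::sync::Mutex` stands in for the `parking_lot` mutex the commit mentions.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical stand-in for the partition identifier a job is keyed on.
pub type PartitionId = i64;

/// Hypothetical status updates; may be reported many times per job.
#[derive(Debug, Clone, PartialEq)]
pub enum JobStatus {
    InProgress,
    /// An error encountered during compaction, reported to the scheduler.
    Error(String),
}

/// Hypothetical terminal outcomes; reported exactly once per job.
#[derive(Debug, Clone, PartialEq)]
pub enum JobEnd {
    Complete,
    Skipped(String),
}

/// Minimal in-memory scheduler that tracks which partitions are leased out.
#[derive(Debug, Default)]
pub struct SketchScheduler {
    in_flight: Mutex<HashSet<PartitionId>>,
}

impl SketchScheduler {
    /// Lease a partition; returns false if it is already being compacted.
    pub fn try_start(&self, partition: PartitionId) -> bool {
        self.in_flight.lock().unwrap().insert(partition)
    }

    /// Multi-use: accept a status update without releasing the partition.
    pub fn update_job_status(
        &self,
        partition: PartitionId,
        status: JobStatus,
    ) -> Result<(), String> {
        if !self.in_flight.lock().unwrap().contains(&partition) {
            return Err(format!("no running job for partition {partition}"));
        }
        let _ = status; // a real scheduler would log or act on the status
        Ok(())
    }

    /// Single-use: finish the job and remove the partition from uniqueness
    /// tracking, so that it becomes available for scheduling again.
    pub fn end_job(&self, partition: PartitionId, end: JobEnd) -> Result<(), String> {
        let _ = end; // a real scheduler would commit or skip based on this
        if self.in_flight.lock().unwrap().remove(&partition) {
            Ok(())
        } else {
            Err(format!("no running job for partition {partition}"))
        }
    }
}
```

Keeping `end_job()` separate makes the release of the partition an explicit, one-time step; a second `end_job()` for the same partition returns an error instead of silently succeeding.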
Joe-Blount acf9da2336
fix: detect empty list in compactor before assert (#8323) 2023-07-24 15:02:47 +00:00
Joe-Blount 968a0fc574
Merge branch 'main' into jrb_69_smooth_rate_limiter 2023-07-24 08:52:55 -05:00
dependabot[bot] cd31492e5b
chore(deps): Bump async-trait from 0.1.71 to 0.1.72 (#8317)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.71 to 0.1.72.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.71...0.1.72)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-24 10:07:18 +00:00
Joe-Blount 7622358518 fix: avoid compacting 1 L0 to 1 L0 file (stuck looping) 2023-07-21 13:55:04 -05:00
Joe-Blount 1a62d2e4e7 chore: improve rate limiter accuracy 2023-07-21 11:22:36 -05:00
Joe-Blount 1bed99567c
chore: add DF metrics to compaction spans (#8270)
* chore: add DF metrics to compaction spans

* chore: update string for test verification

* chore: update comment

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-20 15:00:22 +00:00
Marco Neumann 0173c50ba1
fix: use correct error code when querier is shutting down (#8282)
When a long-running query is in progress and the querier is shutting
down, it might happen that the executor (= thread pool and tokio
executor responsible for the CPU-bound DataFusion execution) is shut
down while the query is running. From a "systems interaction" PoV I
think this is totally fine and I would like to avoid some weird
ref-counting. Or in other words: if the system is shutting down, shut it
down.

However the error was treated as "internal" which is not useful. The
client should rather be informed that its server was gone and that it is
OK (and desired) to retry. So as per
<https://grpc.github.io/grpc/core/md_doc_statuscodes.html> I think this
should signal "unavailable".

This change wires the error code in such a way that the gRPC service
layer can properly inspect it and then changes the error mapping.

Ref https://github.com/influxdata/idpe/issues/17917 .

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-20 12:08:22 +00:00
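The error mapping described in the commit above might look roughly like this. It is a sketch under stated assumptions: `ExecError`, `GrpcCode`, and `to_grpc_code` are hypothetical names defined locally so the example needs no gRPC crate; only the numeric values 13 (INTERNAL) and 14 (UNAVAILABLE) come from the gRPC status code list.

```rust
/// Hypothetical error type for the query execution layer.
#[derive(Debug, Clone, PartialEq)]
pub enum ExecError {
    /// The executor was shut down while the query was still running.
    ExecutorShutdown,
    /// Any other unexpected failure.
    Other(String),
}

/// Local subset of the gRPC status codes (assumption: a real service
/// would use the code type of its gRPC library instead).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GrpcCode {
    Internal = 13,
    Unavailable = 14,
}

/// Map an execution error to the status code the client should see.
pub fn to_grpc_code(err: &ExecError) -> GrpcCode {
    match err {
        // The server is going away: tell the client that a retry is fine.
        ExecError::ExecutorShutdown => GrpcCode::Unavailable,
        // Everything else remains an internal error.
        ExecError::Other(_) => GrpcCode::Internal,
    }
}
```

Carrying the shutdown case as its own variant is what lets the service layer inspect the error and pick "unavailable" instead of falling back to "internal".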
Joe-Blount 4c5c2473e6 chore: move scratchpad cleanup earlier 2023-07-19 10:16:28 -05:00
Joe-Blount 69e4f998dd fix: chunk up high plan counts in the compactor 2023-07-18 15:16:04 -05:00
Joe-Blount 85a9e13262
Merge branch 'main' into jrb_63_compactor_spans 2023-07-17 09:52:27 -05:00
kodiakhq[bot] ebba032399
Merge branch 'main' into cn/all-over-again 2023-07-17 14:46:48 +00:00
Carol (Nichols || Goulding) cf046d0b3e
refactor: Extract a `From` implementation for creating TransitionPartitionId 2023-07-17 10:34:01 -04:00
Carol (Nichols || Goulding) a9b788b58f
feat: Collate chunks based on their partition hash id if they have it 2023-07-17 10:34:01 -04:00
Joe-Blount bba0685c59 Merge remote-tracking branch 'origin/main' into jrb_64_add_test_case 2023-07-17 09:16:13 -05:00
Joe-Blount 190bc41ca4 chore: comment spelling 2023-07-17 09:14:55 -05:00
dependabot[bot] 4c0e5db3a5
chore(deps): Bump insta from 1.30.0 to 1.31.0 (#8242)
Bumps [insta](https://github.com/mitsuhiko/insta) from 1.30.0 to 1.31.0.
- [Changelog](https://github.com/mitsuhiko/insta/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mitsuhiko/insta/compare/1.30.0...1.31.0)

---
updated-dependencies:
- dependency-name: insta
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-17 14:01:21 +00:00
Carol (Nichols || Goulding) 10a0f8e3bf
fix: Remove ::default() when constructing unit structs
As recommended by https://rust-lang.github.io/rust-clippy/master/index.html#default_constructed_unit_structs
2023-07-14 10:50:55 -04:00
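For readers unfamiliar with the `default_constructed_unit_structs` lint referenced in the commit above, here is a small illustration. `NoopSink` is a hypothetical type, not one from this repository.

```rust
/// A unit struct: it has no fields, so the type name is also its only value.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct NoopSink;

/// The form the lint flags: `::default()` adds nothing for a unit struct.
#[allow(clippy::default_constructed_unit_structs)]
pub fn flagged() -> NoopSink {
    NoopSink::default()
}

/// The form the lint recommends: name the unit struct directly.
pub fn preferred() -> NoopSink {
    NoopSink
}
```

Both functions return the same value; the second simply drops the redundant call.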
Joe-Blount e1e1d5ab38 fix: compactor loop in highly backlogged case 2023-07-13 16:47:59 -05:00
Joe-Blount 823a878675 Merge remote-tracking branch 'origin/main' into jrb_63_compactor_spans 2023-07-13 10:37:21 -05:00
Joe-Blount 18ef39c5fe chore: remove env variable control of compactor spans 2023-07-13 09:05:22 -05:00
Joe-Blount 803122e3b4 Merge remote-tracking branch 'origin/main' into jrb_63_compactor_spans
# Conflicts:
#	compactor/src/driver.rs
2023-07-13 08:54:22 -05:00
Joe-Blount 67343210cd chore: address comments 2023-07-13 08:51:19 -05:00
Dom Dwyer 7f7d1f2ee7
fix(ingester): projection without time column
The ingester can project arbitrary columns at query time, and has no
special requirement that the "time" column be part of that projection.

Because the timestamp summary generation explicitly requires the time
column to exist, it panics when there's no "time" column in the
projection - this is a bit of a modelling mismatch more than anything.
2023-07-13 14:22:48 +02:00
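One way to model the mismatch described in the commit above is to make the timestamp summary optional, so that a projection without a "time" column yields no summary instead of a panic. This is only a sketch of that idea and not necessarily how the ingester fix was implemented; `TimestampMinMax`, `timestamp_summary`, and the representation of a batch as `(name, values)` pairs are assumptions for illustration.

```rust
/// Hypothetical min/max summary of the "time" column.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimestampMinMax {
    pub min: i64,
    pub max: i64,
}

/// Compute the summary for a projected batch, modeled here as
/// (column name, i64 values) pairs. Returns `None` when the projection
/// has no "time" column or the column is empty.
pub fn timestamp_summary(columns: &[(&str, Vec<i64>)]) -> Option<TimestampMinMax> {
    let (_, values) = columns.iter().find(|(name, _)| *name == "time")?;
    let min = values.iter().copied().min()?;
    let max = values.iter().copied().max()?;
    Some(TimestampMinMax { min, max })
}
```

Callers then decide explicitly what a missing summary means, rather than relying on the column always being present.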
kodiakhq[bot] e73116a122
Merge branch 'main' into cn/query-catalog-with-either-partition-identifier 2023-07-12 14:51:02 +00:00
Andrew Lamb f33891b9fe
fix(all-in-one): Run compactor in all-in-one mode (#8214)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-11 21:14:07 +00:00
wiedld d43300635e
Revert "feat(idpe-17789): scheduler job_status() (#8202)" (#8213)
This reverts commit 3dabccd84b.
2023-07-11 10:33:56 -07:00
Joe-Blount 23aff4afc4 chore: add more useful info to compactor tracing 2023-07-11 10:42:32 -05:00
wiedld 3dabccd84b
feat(idpe-17789): scheduler job_status() (#8202)
* feat(idpe-17789): scheduler job_status() (#8121)

This block of work moves into the scheduler some of the downstream actions associated with compaction outcomes. Which responsibilities stay in the compactor, versus move to the scheduler, roughly follows three heuristics: (a) whether the action has an impact on global catalog state (a.k.a. commits and partition skipping), (b) whether it is logging associated with compactor health (e.g. PartitionDoneSink logging outcomes) versus system health (e.g. logging commits), and (c) whether it reports to the scheduler on any errors encountered during compaction. This boundary is subject to change as we move forward.

Also, a noted caveat (TODO) on this commit. We have a CompactionJob which is used to track work handed off to each compactor. Currently it still uses the partition_id for tracking, but the followup PR will start moving the compactor to have more CompactionJob uuid awareness.
2023-07-11 08:41:12 -07:00
Andrew Lamb b24f9c81ba
chore: Update DataFusion pin, update for API changes (#8199) 2023-07-11 13:36:38 +00:00
Joe-Blount 16939c849d chore: add tracing to compactor 2023-07-10 16:36:24 -05:00
Carol (Nichols || Goulding) 22c17fb970
feat: Abstract over which partition ID type we're using to list Parquet files 2023-07-10 13:40:01 -04:00
Carol (Nichols || Goulding) eec31b7f00
feat: Abstract over which partition ID type we're using to get a partition from the catalog 2023-07-10 10:43:20 -04:00
Joe-Blount ec6a609f63
Merge branch 'main' into jrb_58_throttle_partition_processing 2023-07-07 08:46:22 -05:00
Joe-Blount 9f522bfd30
Revert "feat(idpe-17789): scheduler job_status() (#8121)" (#8175)
This reverts commit 5d19fa3635.
2023-07-06 18:52:25 +00:00
Joe-Blount 28eb3dcd92 feat: throttle partition evaluation in the compactor 2023-07-06 11:54:18 -05:00
wiedld 5d19fa3635
feat(idpe-17789): scheduler job_status() (#8121)
This block of work moves into the scheduler some of the downstream actions associated with compaction outcomes. Which responsibilities stay in the compactor, versus move to the scheduler, roughly follows three heuristics: (a) whether the action has an impact on global catalog state (a.k.a. commits and partition skipping), (b) whether it is logging associated with compactor health (e.g. PartitionDoneSink logging outcomes) versus system health (e.g. logging commits), and (c) whether it reports to the scheduler on any errors encountered during compaction. This boundary is subject to change as we move forward.

Also, a noted caveat (TODO) on this commit. We have a CompactionJob which is used to track work handed off to each compactor. Currently it still uses the partition_id for tracking, but the followup PR will start moving the compactor to have more CompactionJob uuid awareness.
2023-07-06 09:15:59 -07:00
dependabot[bot] 26a6113a37
chore(deps): Bump async-trait from 0.1.70 to 0.1.71 (#8163)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.70 to 0.1.71.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.70...0.1.71)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-06 09:58:51 +00:00
dependabot[bot] b5c9628f0f
chore(deps): Bump async-trait from 0.1.69 to 0.1.70 (#8148)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.69 to 0.1.70.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.69...0.1.70)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-05 09:05:13 +00:00
Marco Neumann ce6a2fb613
refactor: remove `QueryChunk::column_values` (#8111)
Similar to #8109.

This was once implemented by the RUB but as it stands right now, no
chunk implements this anymore.

If we ever want to bring this back, we should use the output of
`QueryChunk::data` instead (i.e. use a data-based implementation instead
of a per-chunk one).

Closes #8096.
2023-07-03 09:03:21 +00:00
Marco Neumann b982ee180e
refactor: remove `QueryChunk::column_names` (#8109)
This interface was once specially implemented by the RUB. The only
actual implementation of it is within the querier that just forwards it
to a simple schema scan. Lift this semantic to `iox_query_influxrpc`
instead so all the chunks can use it.

If we ever want to optimize this again, we should use `QueryChunk::data`
instead (i.e. instead of implementing it within the chunk it should use
the data method and do something smart based on that).

First half of #8096.
2023-06-29 13:43:10 +00:00
Marco Neumann dcb4a9bb5c
refactor: fuse `QueryChunk` and `QueryChunkMeta` (#8107)
Closes #8095.
2023-06-29 11:02:48 +00:00
Marco Neumann 4638b89d93
refactor: migrate retention to proper predicates (#8092)
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.

This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent with the time predicates that the user provides (e.g. via
`WHERE time > x`).

It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine to which data the predicate
should be applied.

Note that the lowering of the retention policy changed slightly from

```text
(time > (now() - retention)) AND (time < MAX)
```

to

```text
time > (now() - retention)
```

Since the `MAX` cut is just an artifact of the lowering and was unnecessary.

Closes #7409.
Closes #7410.
2023-06-29 08:36:37 +00:00
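The change in the retention lowering described in the commit above can be illustrated with plain integer timestamps. This is a sketch only: the function names are hypothetical, timestamps are assumed to be `i64` nanoseconds, and `MAX` is assumed to be `i64::MAX`, which the commit message does not state.

```rust
/// New lowering: time > (now() - retention)
pub fn passes_retention(time_ns: i64, now_ns: i64, retention_ns: i64) -> bool {
    time_ns > now_ns.saturating_sub(retention_ns)
}

/// Old lowering: (time > (now() - retention)) AND (time < MAX)
/// Assumption: MAX is the largest representable timestamp, i64::MAX.
pub fn passes_retention_old(time_ns: i64, now_ns: i64, retention_ns: i64) -> bool {
    time_ns > now_ns.saturating_sub(retention_ns) && time_ns < i64::MAX
}
```

Under that assumption the two forms agree for every timestamp below `i64::MAX`, which is why dropping the upper bound only simplifies the predicate.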
wiedld 3a8a8a153e
feat(idpe 17789): provide scheduler interface (#8057)
* feat: provide convenience methods to create Scheduler, and keep the scheduler implementations crate private. External crates can only create a Scheduler based upon configs.

* feat: provide Scheduler as a component to compactor. Specifically, the scheduler configs are present within the compactor run config, and the scheduler is created within the compactor hardcoded components.

* feat: within the compactor ScheduledPartitionsSource, utilize the dyn Scheduler and Scheduler.get_jobs()

* feat: CompactionJob should be per partition, and have a uniqueness characteristic independent of the partition

* feat: keep compactor_scheduler separate from clap_blocks. Only interface is within ioxd_compactor where the CLI configs are transformed into ShardConfig and PartitionsSourceConfig.

* chore: make IdOnlyPartitionFilter into only pub(crate)

* chore: update scheduler display to include any report information (a.k.a. shard_config, if present)
2023-06-28 15:04:00 -07:00
Joe-Blount ac9cc24315
fix: compactor shouldn't leave small L1s in non-overlap leading edge pattern (#8101)
* fix: compactor shouldn't leave tiny L1s with non-overlapped leading edge pattern

* chore: insta updates for prior commit
2023-06-28 17:02:21 +00:00
Joe-Blount 40865e011c
fix: compactor loop on L1 files (#8082)
* chore: suppress insta run output on some long tests

* fix: prevent L1 compaction looping

* chore: insta updates from prior commit

* chore: address comments
2023-06-26 21:21:24 +00:00
Joe-Blount 99d0530a21
fix: compactor stuck looping with unproductive compactions (needs vertical split) (#8056)
* chore: adjust with_max_num_files_per_plan to more common setting

This significantly increases write amplification (see change in `written` at the conclusion of the cases)

* fix: compactor looping with unproductive compactions

* chore: formatting cleanup

* chore: fix typo in comment

* chore: add test case that compacts too many files at once

* fix: enforce max file count for compaction

* chore: insta churn from prior commit

---------

Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 09:19:06 +00:00
dependabot[bot] 74a48a8f63
chore(deps): Bump itertools from 0.10.5 to 0.11.0 (#8060)
* chore(deps): Bump itertools from 0.10.5 to 0.11.0

Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.5 to 0.11.0.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.10.5...v0.11.0)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 08:11:56 +00:00
dependabot[bot] 6e7b838b52
chore(deps): Bump insta from 1.29.0 to 1.30.0 (#8059)
Bumps [insta](https://github.com/mitsuhiko/insta) from 1.29.0 to 1.30.0.
- [Changelog](https://github.com/mitsuhiko/insta/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mitsuhiko/insta/compare/1.29.0...1.30.0)

---
updated-dependencies:
- dependency-name: insta
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 07:45:41 +00:00
wiedld 62251e2323
refactor(idpe-17789): delineate btwn partitions_source within the scheduler versus compactor (#8028)
This is purely a movement of code, and does not yet define any of the interface methods. At best, it further solidifies the boundary of which partitions_source implementations live within the scheduler versus within the compactor.
2023-06-22 11:48:08 -07:00