Commit Graph

32 Commits (be9064c75fc912c93859f180a17afa260c6ac242)

Author SHA1 Message Date
dependabot[bot] 7094189004
chore(deps): Bump tokio from 1.31.0 to 1.32.0 (#8507)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.31.0 to 1.32.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.31.0...tokio-1.32.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-17 08:06:29 +00:00
dependabot[bot] 34b8585931
chore(deps): Bump tokio from 1.30.0 to 1.31.0 (#8482)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.30.0 to 1.31.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.30.0...tokio-1.31.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-14 06:32:34 +00:00
dependabot[bot] 4c63338354
chore(deps): Bump async-trait from 0.1.72 to 0.1.73 (#8481)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.72 to 0.1.73.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.72...0.1.73)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-14 06:25:33 +00:00
dependabot[bot] 3675043585
chore(deps): Bump tokio from 1.29.1 to 1.30.0 (#8464)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.29.1 to 1.30.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.29.1...tokio-1.30.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-10 07:50:18 +00:00
wiedld 81b5d80a91
feat(idpe-17935): move filtering of skipped partitions to the scheduler (#8358)
* catalog.get_in_skipped_compaction() should handle for multiple partitions

* add the ability to perform transformation on sets of partitions (rather than filtering one by one). Start with the transformation to remove skipped partitions, in the scheduler.

* move the env var and cli flag setting, for when to ignore skipped partitions, to the scheduler config.
2023-08-03 11:43:09 -07:00
wiedld 68ab2c97c1
feat(idpe 17789): provide job to CompactionJobDoneSink (formerly known as PartitionDoneSink) (#8368)
* rename PartitionDoneSink to CompactionJobSink. and change signature in trait
* update all trait implementations, including local variables and comments
* rename partition_done_sink in the components and driver, to be compaction_job_done_sink
2023-08-02 11:19:50 -07:00
Carol (Nichols || Goulding) 4a9e76b8b7
feat: Make parquet_file.partition_id optional in the catalog (#8339)
* feat: Make parquet_file.partition_id optional in the catalog

This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>

This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both that are used now.

* fix: Support transition partition ID in the catalog service

* fix: Use transition partition ID in import/export

This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.

The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.

* feat: Support looking up Parquet files by either kind of Partition id

Regardless of which is actually stored on the Parquet file record.

That is, say there's a Partition in the catalog with:

Partition {
    id: 3,
    hash_id: abcdefg,
}

and a Parquet file that has:

ParquetFile {
    partition_hash_id: abcdefg,
}

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.

This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.

* fix: Use and set new partition ID fields everywhere they want to be

---------

Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-31 12:40:56 +00:00
wiedld bab6f239ea feat(idpe-17789): move Compactor abstractions PartitionsSource and PartitionStream to use CompactionJob 2023-07-24 18:40:04 -07:00
wiedld 817bd595ca chore(idpe-17789): make PartitionsSource be crate private for scheduler 2023-07-24 14:44:40 -07:00
wiedld 02088995b2
feat(idpe 17789): compactor to scheduler communication. `update_job_status()` and `end_job()` (#8216)
* feat(idpe-17789): scheduler job_status() (#8121)

This block of work moves into the scheduler some of the specific downstream actions affiliated with compaction outcomes. Which responsibilities stay in the compactor, versus moved to the scheduler, roughly followed the heuristic of whether the action (a) had an impact on global catalog state (a.k.a. commits and partition skipping), (b) whether it's logging affiliated with compactor health (e.g. ParitionDoneSink logging outcomes) versus system health (e.g. logging commits), and (c) reporting to the scheduler on any errors encountered during compaction. This boundary is subject to change as we move forward.

Also, a noted caveat (TODO) on this commit. We have a CompactionJob which is used to track work handed off to each compactor. Currently it still uses the partition_id for tracking, but the followup PR will start moving the compactor to have more CompactionJob uuid awareness.

* fix(idpe-17789): need to remove partition from uniqueness tracking, so it becomes available again

* refactor(idpe-17789): split up the single-use end_job() from the multi-use update_job_status()

* feat(idpe-17789): Commit is now a scheduler trait, only used externally in the compactor_test_utils

* feat(idpe-17789): Propagate errors pertaining to commit, in both the scheduler and the compactor.

* feat(idpe-17789): PartitionDoneSink should have different crate-private traits for scheduler versus comactor.

* feat(idpe-17789): PartitionDoneSink should propagate errors

* test(idpe-17789): integration tests suite

* test(idpe-17789): test documenting what skip request does (as outcome)

* refactor(idpe-17789): make the validate of the upgrade commit, versus replacement commit, more explicit.

* feat(idpe-17789): switch to using parking_lot Mutex within the scheduler
2023-07-24 12:01:28 -07:00
wiedld bf1e28ba96
refactor: implementations of PartitionsSource in compactor_scheduler should use the parking lot mutex (#8309) 2023-07-24 11:29:20 -07:00
Marco Neumann edf77c73d8
fix: avoid panic when clock goes backwards (#8322)
I've seen at least one case in prod where the UTC clock goes backwards.
The `TimeProvider` and `Time` interface even warns about that. However
there was a `Sub` impl that would panic if that happens and even though
this was documented, I think we can do better and just not offer a
panicky interface at all.

So this removes the `Sub` impl. and replaces all uses with
`checked_duration_since`.
2023-07-24 12:10:41 +00:00
dependabot[bot] cd31492e5b
chore(deps): Bump async-trait from 0.1.71 to 0.1.72 (#8317)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.71 to 0.1.72.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.71...0.1.72)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-24 10:07:18 +00:00
wiedld d43300635e
Revert "feat(idpe-17789): scheduler job_status() (#8202)" (#8213)
This reverts commit 3dabccd84b.
2023-07-11 10:33:56 -07:00
wiedld 3dabccd84b
feat(idpe-17789): scheduler job_status() (#8202)
* feat(idpe-17789): scheduler job_status() (#8121)

This block of work moves into the scheduler some of the specific downstream actions affiliated with compaction outcomes. Which responsibilities stay in the compactor, versus moved to the scheduler, roughly followed the heuristic of whether the action (a) had an impact on global catalog state (a.k.a. commits and partition skipping), (b) whether it's logging affiliated with compactor health (e.g. ParitionDoneSink logging outcomes) versus system health (e.g. logging commits), and (c) reporting to the scheduler on any errors encountered during compaction. This boundary is subject to change as we move forward.

Also, a noted caveat (TODO) on this commit. We have a CompactionJob which is used to track work handed off to each compactor. Currently it still uses the partition_id for tracking, but the followup PR will start moving the compactor to have more CompactionJob uuid awareness.
2023-07-11 08:41:12 -07:00
Joe-Blount ec6a609f63
Merge branch 'main' into jrb_58_throttle_partition_processing 2023-07-07 08:46:22 -05:00
Joe-Blount 65f928f227
chore: unwrap option in log (#8178) 2023-07-06 23:01:39 +00:00
Joe-Blount afb899b204
chore: add logging when compactor fetches partitions (#8176) 2023-07-06 21:51:04 +00:00
Joe-Blount 9f522bfd30
Revert "feat(idpe-17789): scheduler job_status() (#8121)" (#8175)
This reverts commit 5d19fa3635.
2023-07-06 18:52:25 +00:00
Joe-Blount 28eb3dcd92 feat: throttle partition evaluation in the compactor 2023-07-06 11:54:18 -05:00
wiedld 5d19fa3635
feat(idpe-17789): scheduler job_status() (#8121)
This block of work moves into the scheduler some of the specific downstream actions affiliated with compaction outcomes. Which responsibilities stay in the compactor, versus moved to the scheduler, roughly followed the heuristic of whether the action (a) had an impact on global catalog state (a.k.a. commits and partition skipping), (b) whether it's logging affiliated with compactor health (e.g. ParitionDoneSink logging outcomes) versus system health (e.g. logging commits), and (c) reporting to the scheduler on any errors encountered during compaction. This boundary is subject to change as we move forward.

Also, a noted caveat (TODO) on this commit. We have a CompactionJob which is used to track work handed off to each compactor. Currently it still uses the partition_id for tracking, but the followup PR will start moving the compactor to have more CompactionJob uuid awareness.
2023-07-06 09:15:59 -07:00
dependabot[bot] 26a6113a37
chore(deps): Bump async-trait from 0.1.70 to 0.1.71 (#8163)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.70 to 0.1.71.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.70...0.1.71)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-06 09:58:51 +00:00
Joe-Blount 38cfc58cc0 fix: don't redundantly evaluate the same partitions for compaction 2023-07-05 09:27:31 -05:00
dependabot[bot] b5c9628f0f
chore(deps): Bump async-trait from 0.1.69 to 0.1.70 (#8148)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.69 to 0.1.70.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.69...0.1.70)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-05 09:05:13 +00:00
Joe-Blount 2dacd0e7df
feat: conditionally expand lookback for active partitions to avoid skipping any (#8112)
* feat: conditionally expand lookback for active partitions to avoid skipping any

* chore: comment adjustments

* chore: rust fmt
2023-06-29 20:49:35 +00:00
wiedld 3a8a8a153e
feat(idpe 17789): provide scheduler interface (#8057)
* feat: provide convenience methods to create Scheduler, and keep the scheduler implementations crate private. External crates can only create a Scheduler based upon configs.

* feat: provide Scheduler as a component to compactor. Specifically, the scheduler configs are present within the compactor run config, and the scheduler in created within the compactor hardcoded components.

* feat: within the compactor ScheduledPartitionsSource, utilize the dyn Scheduler and Scheduler.get_jobs()

* feat: CompactionJob should be per partition, and have a uniqueness characteristic independent of the partition

* feat: keep compactor_scheduler separate from clap_blocks. Only interface is within ioxd_compactor where the CLI configs are transformed into ShardConfig and PartitionsSourceConfig.

* chore: make IdOnlyPartitionFilter into only pub(crate)

* chore: update scheduler display to include any report information (a.k.a. shard_config, if present)
2023-06-28 15:04:00 -07:00
dependabot[bot] b15c6062a9
chore(deps): Bump tokio from 1.28.2 to 1.29.0 (#8100)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.28.2 to 1.29.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.28.2...tokio-1.29.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-28 13:18:08 +00:00
wiedld 62251e2323
refactor(idpe-17789): delineate btwn partitions_source within the scheduler versus compactor (#8028)
This is purely a movement of code, and not any definition of the interface methods yet. At best, it further solidifying the boundary of what partitions_source implementations are within the scheduler -- versus within the compactor.
2023-06-22 11:48:08 -07:00
wiedld f75736891a
refactor(idpe-17789): move mod id_only_partition_filter (#8027)
* chore: delineate scheduler logic boundary in code comments
* refactor: move id_only_partition_filter mod into local scheduler
* chore: add docs for each IdOnlyPartitionFilter implementation
2023-06-21 10:29:38 -07:00
wiedld e29b453e0d
refactor: move PartitionsSourceConfig into local scheduler (#8026) 2023-06-20 16:05:59 -07:00
wiedld 34b5fadde0
refactor: move scheduler related configs to compactor_scheduler (#8013) 2023-06-20 09:55:35 -07:00
wiedld 8b7ef69f6f
refactor: move partitions_source to scheduler (#8010)
* refactor: make compactor_scheduler crate

* refactor: move PartitionsSource into the compactor_scheduler

    The compactor currently uses PartitionsSource in two ways:
       * for the preparation of PartitionIds prior to the compactor pipeline.
       * for the abstraction which utilize the PartitionIds during the IO pipeline.
    This commit is a refactoring to enable us to delineate between these two utilizations.
    The former (preparation) utilization will now be done in the compactor_scheduler.
    Since the compactor is dependent on the compactor_scheduler, it made sense to move the trait to the scheduler.
2023-06-16 10:02:13 -07:00