Commit Graph

34 Commits (65230040083eeca3d8a91905567e7341bd671103)

Author SHA1 Message Date
Joe-Blount 53915f0653
feat: move vertical splitting & detect non-linear data (#8506)
* chore: test changes and additions in preparation for functional changes

* feat: move vertical splitting to RoundInfo calculation, align splits to L1 files

* chore: insta test churn

* feat: detect non-linear data distribution in vertical splitting

* chore: add tests for non-linear data distribution

* chore: insta churn

* chore: cleanup & comment additions

* chore: some variable renaming
2023-08-21 18:22:25 +00:00
Nga Tran 3e98f7ea5c
feat: fill sort_key_ids when partition is inserted and updated (#8517)
* feat: read null sort_key_ids

* chore: clearer explanation about test strategy

* chore: Apply suggestions from code review

Co-authored-by: Marco Neumann <marco@crepererum.net>

* test: tests that add partition with NULL sort_key_ids

* feat: set sort_key_ids to empty array {} during partition insertion

* feat: initial step to update sort_key_ids

* chore: address review comments

* chore: remove unecessary comments and tests

* fix: typos

* chore: remove unecessary tests

* feat: continue the work of updating sort_key_ids

* fix: chec duplicates for SortedColumnSet

* test: tests for sort ley ids

* test: fix a test

* chore: remove unused comments

* chore: address first half of review comments and removing tests of tests

* chore: address review commnets for fetching colums in ingester

---------

Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-21 14:26:57 +00:00
Joe-Blount 1cc0926a7f
feat: track why bytes are written in compactor simulator (#8493)
* feat: add tracking of why bytes are written in simulator

* chore: enable breakdown of why bytes are written in a few larger tests

* chore: enable writes breakdown in another test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-16 13:39:37 +00:00
Joe-Blount 964b2f6b97
fix: compactor simulator math error creates 0 byte files (#8478)
* fix: math error in simulator results in 0 byte files during simulations

* chore: insta churn from simulator file size fix
2023-08-14 20:00:19 +00:00
dependabot[bot] 4c63338354
chore(deps): Bump async-trait from 0.1.72 to 0.1.73 (#8481)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.72 to 0.1.73.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.72...0.1.73)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-14 06:25:33 +00:00
NGA-TRAN 9bf1c8c11c chore: revert fill sort_key_ids 2023-08-11 11:36:27 -04:00
Nga Tran da92a5c9e1
feat: fill catalog `sort_key_ids` for partitions with coming data (#8462)
* feat: fill catalog sort_key_ids for partition with coming data

* test: sort_key_ids has empty array for newly create partition

* test: name of non-existing column

* chore: add comments to ask Andrew about the code

* chore: make comments clearer

* chore: fix a comment to avoid failure in doc

* chore: add comment for the panic if column name of sort key not found

* fix: during import files the partition has to be created with empty sort key first. Then after its files are created, the partition will be uodated with sort key

* chore: remove no longer needed comments after the bug in build_catalog test is fixed

* chore: address review comments

* refactor: Use ColumnSet type

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* chore: fix a clippy

---------

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2023-08-10 18:12:40 +00:00
wiedld 81b5d80a91
feat(idpe-17935): move filtering of skipped partitions to the scheduler (#8358)
* catalog.get_in_skipped_compaction() should handle for multiple partitions

* add the ability to perform transformation on sets of partitions (rather than filtering one by one). Start with the transformation to remove skipped partitions, in the scheduler.

* move the env var and cli flag setting, for when to ignore skipped partitions, to the scheduler config.
2023-08-03 11:43:09 -07:00
Carol (Nichols || Goulding) 5db8ed677f
fix: Update compactor's use of QueryChunk to return TransitionPartitionId
Also rename PartitionInfo's transition_partition_id to be partition_id
so that it's consistent with the QueryChunk method. We might want to
rename the partition_id field to catalog_partition_id, but for now I
think the types will make compactor usage clear enough.

This gets compactor to compile and pass its tests.
2023-08-02 10:17:22 -04:00
Carol (Nichols || Goulding) 4a9e76b8b7
feat: Make parquet_file.partition_id optional in the catalog (#8339)
* feat: Make parquet_file.partition_id optional in the catalog

This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>

This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both that are used now.

* fix: Support transition partition ID in the catalog service

* fix: Use transition partition ID in import/export

This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.

The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.

* feat: Support looking up Parquet files by either kind of Partition id

Regardless of which is actually stored on the Parquet file record.

That is, say there's a Partition in the catalog with:

Partition {
    id: 3,
    hash_id: abcdefg,
}

and a Parquet file that has:

ParquetFile {
    partition_hash_id: abcdefg,
}

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.

This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.

* fix: Use and set new partition ID fields everywhere they want to be

---------

Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-31 12:40:56 +00:00
wiedld 02088995b2
feat(idpe 17789): compactor to scheduler communication. `update_job_status()` and `end_job()` (#8216)
* feat(idpe-17789): scheduler job_status() (#8121)

This block of work moves into the scheduler some of the specific downstream actions affiliated with compaction outcomes. Which responsibilities stay in the compactor, versus moved to the scheduler, roughly followed the heuristic of whether the action (a) had an impact on global catalog state (a.k.a. commits and partition skipping), (b) whether it's logging affiliated with compactor health (e.g. ParitionDoneSink logging outcomes) versus system health (e.g. logging commits), and (c) reporting to the scheduler on any errors encountered during compaction. This boundary is subject to change as we move forward.

Also, a noted caveat (TODO) on this commit. We have a CompactionJob which is used to track work handed off to each compactor. Currently it still uses the partition_id for tracking, but the followup PR will start moving the compactor to have more CompactionJob uuid awareness.

* fix(idpe-17789): need to remove partition from uniqueness tracking, so it becomes available again

* refactor(idpe-17789): split up the single-use end_job() from the multi-use update_job_status()

* feat(idpe-17789): Commit is now a scheduler trait, only used externally in the compactor_test_utils

* feat(idpe-17789): Propagate errors pertaining to commit, in both the scheduler and the compactor.

* feat(idpe-17789): PartitionDoneSink should have different crate-private traits for scheduler versus comactor.

* feat(idpe-17789): PartitionDoneSink should propagate errors

* test(idpe-17789): integration tests suite

* test(idpe-17789): test documenting what skip request does (as outcome)

* refactor(idpe-17789): make the validate of the upgrade commit, versus replacement commit, more explicit.

* feat(idpe-17789): switch to using parking_lot Mutex within the scheduler
2023-07-24 12:01:28 -07:00
dependabot[bot] cd31492e5b
chore(deps): Bump async-trait from 0.1.71 to 0.1.72 (#8317)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.71 to 0.1.72.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.71...0.1.72)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-24 10:07:18 +00:00
Joe-Blount 85a9e13262
Merge branch 'main' into jrb_63_compactor_spans 2023-07-17 09:52:27 -05:00
dependabot[bot] 4c0e5db3a5
chore(deps): Bump insta from 1.30.0 to 1.31.0 (#8242)
Bumps [insta](https://github.com/mitsuhiko/insta) from 1.30.0 to 1.31.0.
- [Changelog](https://github.com/mitsuhiko/insta/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mitsuhiko/insta/compare/1.30.0...1.31.0)

---
updated-dependencies:
- dependency-name: insta
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-17 14:01:21 +00:00
wiedld d43300635e
Revert "feat(idpe-17789): scheduler job_status() (#8202)" (#8213)
This reverts commit 3dabccd84b.
2023-07-11 10:33:56 -07:00
wiedld 3dabccd84b
feat(idpe-17789): scheduler job_status() (#8202)
* feat(idpe-17789): scheduler job_status() (#8121)

This block of work moves into the scheduler some of the specific downstream actions affiliated with compaction outcomes. Which responsibilities stay in the compactor, versus moved to the scheduler, roughly followed the heuristic of whether the action (a) had an impact on global catalog state (a.k.a. commits and partition skipping), (b) whether it's logging affiliated with compactor health (e.g. ParitionDoneSink logging outcomes) versus system health (e.g. logging commits), and (c) reporting to the scheduler on any errors encountered during compaction. This boundary is subject to change as we move forward.

Also, a noted caveat (TODO) on this commit. We have a CompactionJob which is used to track work handed off to each compactor. Currently it still uses the partition_id for tracking, but the followup PR will start moving the compactor to have more CompactionJob uuid awareness.
2023-07-11 08:41:12 -07:00
Joe-Blount 16939c849d chore: add tracing to compactor 2023-07-10 16:36:24 -05:00
Joe-Blount 9f522bfd30
Revert "feat(idpe-17789): scheduler job_status() (#8121)" (#8175)
This reverts commit 5d19fa3635.
2023-07-06 18:52:25 +00:00
wiedld 5d19fa3635
feat(idpe-17789): scheduler job_status() (#8121)
This block of work moves into the scheduler some of the specific downstream actions affiliated with compaction outcomes. Which responsibilities stay in the compactor, versus moved to the scheduler, roughly followed the heuristic of whether the action (a) had an impact on global catalog state (a.k.a. commits and partition skipping), (b) whether it's logging affiliated with compactor health (e.g. ParitionDoneSink logging outcomes) versus system health (e.g. logging commits), and (c) reporting to the scheduler on any errors encountered during compaction. This boundary is subject to change as we move forward.

Also, a noted caveat (TODO) on this commit. We have a CompactionJob which is used to track work handed off to each compactor. Currently it still uses the partition_id for tracking, but the followup PR will start moving the compactor to have more CompactionJob uuid awareness.
2023-07-06 09:15:59 -07:00
dependabot[bot] 26a6113a37
chore(deps): Bump async-trait from 0.1.70 to 0.1.71 (#8163)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.70 to 0.1.71.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.70...0.1.71)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-06 09:58:51 +00:00
dependabot[bot] b5c9628f0f
chore(deps): Bump async-trait from 0.1.69 to 0.1.70 (#8148)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.69 to 0.1.70.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.69...0.1.70)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-05 09:05:13 +00:00
wiedld 3a8a8a153e
feat(idpe 17789): provide scheduler interface (#8057)
* feat: provide convenience methods to create Scheduler, and keep the scheduler implementations crate private. External crates can only create a Scheduler based upon configs.

* feat: provide Scheduler as a component to compactor. Specifically, the scheduler configs are present within the compactor run config, and the scheduler in created within the compactor hardcoded components.

* feat: within the compactor ScheduledPartitionsSource, utilize the dyn Scheduler and Scheduler.get_jobs()

* feat: CompactionJob should be per partition, and have a uniqueness characteristic independent of the partition

* feat: keep compactor_scheduler separate from clap_blocks. Only interface is within ioxd_compactor where the CLI configs are transformed into ShardConfig and PartitionsSourceConfig.

* chore: make IdOnlyPartitionFilter into only pub(crate)

* chore: update scheduler display to include any report information (a.k.a. shard_config, if present)
2023-06-28 15:04:00 -07:00
Joe-Blount ac9cc24315
fix: compactor shouldn't leave small L1s in non-overlap leading edge pattern (#8101)
* fix: compactor shouldn't leave tiny L1s with non-overlapped leading edge pattern

* chore: insta updates for prior commit
2023-06-28 17:02:21 +00:00
dependabot[bot] 6e7b838b52
chore(deps): Bump insta from 1.29.0 to 1.30.0 (#8059)
Bumps [insta](https://github.com/mitsuhiko/insta) from 1.29.0 to 1.30.0.
- [Changelog](https://github.com/mitsuhiko/insta/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mitsuhiko/insta/compare/1.29.0...1.30.0)

---
updated-dependencies:
- dependency-name: insta
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 07:45:41 +00:00
Carol (Nichols || Goulding) 62ba18171a
feat: Add a new hash column on the partition and parquet file tables
This will hold the deterministic ID for partitions.

Until all existing partitions have this value, this is optional/nullable.

The row ID still exists and is used as the main foreign key in the
parquet_file and skipped_compaction tables.

The hash_id has a unique index so that we can look up records based on
it (if it's available).

If the parquet file record has a partition_hash_id value, use that to
generate the object storage path instead of the partition_id.
2023-06-22 09:01:22 -04:00
Dom Dwyer d1cbbd27b1
feat(compactor): config partition query rate limit
Allow the partition fetch queries to be (optionally) rate limited via
runtime config.
2023-06-21 15:50:12 +02:00
wiedld e29b453e0d
refactor: move PartitionsSourceConfig into local scheduler (#8026) 2023-06-20 16:05:59 -07:00
wiedld 7a1f54ac64
refactor: remove compactor type (#8011)
* refactor: remove cold compactions
* refactor: remove compaction_type
2023-06-16 09:40:13 -07:00
Joe-Blount 5d0bb68c5b
chore: add compactor option to disable scratchpad (#7995) 2023-06-15 14:35:55 +00:00
Andrew Lamb 1ff76b7bf2 chore: use workspace dependencies for `object_store` 2023-05-26 07:03:42 -04:00
Marco Neumann 103e814f22
refactor: clean up catalog `parquet_files` interface (#7853)
* feat: `ParquetFileRepo::list_all`

* refactor: remove `ParquetFileRepo::list_by_table`

* refactor: simlify `ParquetFileRepo::list_by_table`

* refactor: remove `ParquetFileRepo::count`

* refactor: remove `ParquetFileRepo::update_compaction_level`

* refactor: remove `ParquetFileRepo::exists`

* fix: test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-24 09:15:03 +00:00
Dom Dwyer 928a4d163e
build: remove unused dependencies from crates
This commit fixes loads of crates (47!) had unused dependencies, or
mis-configured dependencies (test deps as normal deps).

I added the "unused_crate_dependencies" to all crates to help prevent
this mess from growing again!

    https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html

This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
2023-05-23 14:55:43 +02:00
Andrew Lamb 6344fe8c3f
chore: Add rationale for `clippy::future_not_send` (#7822)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-18 16:58:56 +00:00
Carol (Nichols || Goulding) 9229ce5668
fix: Rename compactor2_test_utils to compactor_test_utils 2023-05-09 11:02:11 +02:00