* feat(idpe-17789): scheduler job_status() (#8121)
This block of work moves into the scheduler some of the specific downstream actions affiliated with compaction outcomes. Whether a responsibility stays in the compactor or moves to the scheduler roughly followed this heuristic: (a) does the action affect global catalog state (i.e. commits and partition skipping), (b) is the logging affiliated with compactor health (e.g. PartitionDoneSink logging outcomes) or with system health (e.g. logging commits), and (c) does it report to the scheduler any errors encountered during compaction. This boundary is subject to change as we move forward.
Also, a noted caveat (TODO) on this commit: we have a CompactionJob which is used to track work handed off to each compactor. Currently it still uses the partition_id for tracking, but the follow-up PR will start making the compactor more aware of the CompactionJob uuid.
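For orientation, a rough sketch of the shape this gives the scheduler API. The names, fields, and synchronous signatures below are illustrative assumptions, not the actual IOx traits:
```rust
use uuid::Uuid;

/// Illustrative only: a job handed from the scheduler to a compactor.
/// The real type still tracks by partition_id (see the TODO above); the
/// uuid is what the follow-up PR starts leaning on.
#[derive(Debug, Clone)]
pub struct CompactionJob {
    pub uuid: Uuid,
    pub partition_id: i64,
}

/// Outcomes the compactor reports back. Commits and partition skipping are
/// acted on by the scheduler because they touch global catalog state.
#[derive(Debug)]
pub enum JobStatus {
    /// Progress update, e.g. files committed so far.
    Update,
    /// An error encountered during compaction, reported to the scheduler.
    Error(String),
    /// Request that the partition be skipped.
    RequestSkip(String),
}

pub trait Scheduler: Send + Sync {
    /// Multi-use reporting while a job is running.
    fn update_job_status(&self, job: CompactionJob, status: JobStatus) -> Result<(), String>;
    /// Single-use call when a job finishes, which also releases the
    /// partition from uniqueness tracking so it becomes available again.
    fn end_job(&self, job: CompactionJob) -> Result<(), String>;
}
```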
* fix(idpe-17789): need to remove partition from uniqueness tracking, so it becomes available again
* refactor(idpe-17789): split up the single-use end_job() from the multi-use update_job_status()
* feat(idpe-17789): Commit is now a scheduler trait, only used externally in the compactor_test_utils
* feat(idpe-17789): Propagate errors pertaining to commit, in both the scheduler and the compactor.
* feat(idpe-17789): PartitionDoneSink should have different crate-private traits for scheduler versus compactor.
* feat(idpe-17789): PartitionDoneSink should propagate errors
* test(idpe-17789): integration tests suite
* test(idpe-17789): test documenting what skip request does (as outcome)
* refactor(idpe-17789): make the validation of the upgrade commit, versus the replacement commit, more explicit.
* feat(idpe-17789): switch to using parking_lot Mutex within the scheduler
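A minimal sketch of the parking_lot switch together with the uniqueness-tracking release mentioned above; the type and method names are made up for illustration:
```rust
use parking_lot::Mutex;
use std::collections::HashSet;

/// Scheduler-internal state guarded by a parking_lot::Mutex. Unlike
/// std::sync::Mutex, lock() does not return a Result (no poisoning), so the
/// hot path avoids `.unwrap()` noise.
#[derive(Debug, Default)]
struct InFlight {
    partitions: Mutex<HashSet<i64>>,
}

impl InFlight {
    /// Returns true if the partition was not already being compacted.
    fn try_claim(&self, partition_id: i64) -> bool {
        self.partitions.lock().insert(partition_id)
    }

    /// Release the partition so it becomes available again
    /// (cf. the uniqueness-tracking fix above).
    fn release(&self, partition_id: i64) {
        self.partitions.lock().remove(&partition_id);
    }
}
```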
When a long-running query is in progress and the querier is shutting
down, it might happen that the executor (= thread pool and tokio
executor responsible for the CPU-bound DataFusion execution) is shut
down while the query is running. From a "systems interaction" PoV I
think this is totally fine and I would like to avoid some weird
ref-counting. Or in other words: if the system is shutting down, shut it
down.
However, the error was treated as "internal", which is not useful. The
client should rather be informed that its server was gone and that it is
OK (and desired) to retry. So as per
<https://grpc.github.io/grpc/core/md_doc_statuscodes.html> I think this
should signal "unavailable".
This change wires the error code in such a way that the gRPC service
layer can properly inspect it and then changes the error mapping.
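A hedged sketch of what that wiring can look like at the gRPC service layer, using illustrative error variants rather than the real IOx types:
```rust
use tonic::Status;

/// Illustrative error type: the executor reports that it was shut down
/// while a query was still running (not the actual IOx error enum).
#[derive(Debug)]
enum QueryError {
    /// The DataFusion executor was shut down mid-query.
    ExecutorShuttingDown,
    /// Any other failure.
    Internal(String),
}

/// Map the error at the service layer: a shutdown is not an internal bug,
/// it is "unavailable" and the client should retry, per
/// https://grpc.github.io/grpc/core/md_doc_statuscodes.html.
fn to_grpc_status(e: &QueryError) -> Status {
    match e {
        QueryError::ExecutorShuttingDown => {
            Status::unavailable("server is shutting down, please retry")
        }
        QueryError::Internal(msg) => Status::internal(msg.clone()),
    }
}
```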
Ref https://github.com/influxdata/idpe/issues/17917.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
The ingester can project arbitrary columns at query time, and has no
special requirement that the "time" column be part of that projection.
Because the timestamp summary generation explicitly requires the time
column to exist, it panics when there's no "time" column in the
projection - this is a bit of a modelling mismatch more than anything.
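One possible shape of the fix, sketched with plain Arrow types; the function name and return type are assumptions, not the actual ingester code:
```rust
use arrow::array::TimestampNanosecondArray;
use arrow::record_batch::RecordBatch;

/// Only build a timestamp summary when the projection actually contains the
/// "time" column, instead of panicking when it is absent.
fn timestamp_min_max(batch: &RecordBatch) -> Option<(i64, i64)> {
    // Look the column up by name; a projection without "time" yields None.
    let idx = batch.schema().index_of("time").ok()?;
    let col = batch
        .column(idx)
        .as_any()
        .downcast_ref::<TimestampNanosecondArray>()?;
    let mut min = i64::MAX;
    let mut max = i64::MIN;
    for v in col.iter().flatten() {
        min = min.min(v);
        max = max.max(v);
    }
    (min <= max).then_some((min, max))
}
```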
Similar to #8109.
This was once implemented by the RUB but as it stands right now, no
chunk implements this anymore.
If we ever want to bring this back, we should use the output of
`QueryChunk::data` instead (i.e. use a data-based implementation instead
of a per-chunk one).
Closes #8096.
This interface was once specially implemented by the RUB. The only
actual implementation of it is within the querier, which just forwards it
to a simple schema scan. Lift this semantic to `iox_query_influxrpc`
instead so all the chunks can use it.
If we ever want to optimize this again, we should use `QueryChunk::data`
instead (i.e. instead of implementing it within the chunk it should use
the data method and do something smart based on that).
First half of #8096.
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.
This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent with the time predicates that the user provides (e.g. via
`WHERE time > x`).
It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per-chunk predicates (and "is processed" markers) but would
instead use the timestamp / chunk order to determine which data the
predicate should be applied to.
Note that the lowering of the retention policy changed slightly from
```text
(time > (now() - retention)) AND (time < MAX)
```
to
```text
time > (now() - retention)
```
The `MAX` cut was just an artifact of the lowering and was unnecessary.
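As a sketch of the new lowering in DataFusion terms, with the cutoff shown as a bare i64 nanosecond value for brevity (the helper name here is made up):
```rust
use datafusion::prelude::{col, lit, Expr};

/// Single per-table retention predicate: `time > now() - retention`.
/// Note there is no upper `time < MAX` bound anymore.
fn retention_predicate(now_ns: i64, retention_ns: i64) -> Expr {
    let cutoff_ns = now_ns - retention_ns;
    col("time").gt(lit(cutoff_ns))
}
```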
Closes #7409.
Closes #7410.
* feat: provide convenience methods to create a Scheduler, and keep the scheduler implementations crate-private. External crates can only create a Scheduler based upon configs (see the sketch at the end of this list).
* feat: provide the Scheduler as a component to the compactor. Specifically, the scheduler configs are present within the compactor run config, and the scheduler is created within the compactor hardcoded components.
* feat: within the compactor ScheduledPartitionsSource, utilize the dyn Scheduler and Scheduler.get_jobs()
* feat: CompactionJob should be per partition, and have a uniqueness characteristic independent of the partition
* feat: keep compactor_scheduler separate from clap_blocks. Only interface is within ioxd_compactor where the CLI configs are transformed into ShardConfig and PartitionsSourceConfig.
* chore: make IdOnlyPartitionFilter pub(crate) only
* chore: update scheduler display to include any report information (a.k.a. shard_config, if present)
* chore: adjust with_max_num_files_per_plan to a more common setting
This significantly increases write amplification (see change in `written` at the conclusion of the cases)
* fix: compactor looping with unproductive compactions
* chore: formatting cleanup
* chore: fix typo in comment
* chore: add test case that compacts too many files at once
* fix: enforce max file count for compaction
* chore: insta churn from prior commit
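A rough sketch of the "create only from configs" pattern described in the first few items above; all type names are simplified stand-ins, not the real compactor_scheduler API:
```rust
use std::sync::Arc;

/// Hypothetical config types standing in for the real ShardConfig /
/// PartitionsSourceConfig carried on the compactor run config.
#[derive(Debug, Clone, Default)]
pub struct ShardConfig {
    pub n_shards: usize,
    pub shard_id: usize,
}

#[derive(Debug, Clone, Default)]
pub struct SchedulerConfig {
    pub shard_config: Option<ShardConfig>,
}

/// The only things external crates see: a trait object plus a constructor.
pub trait Scheduler: std::fmt::Debug + Send + Sync {
    // get_jobs(), update_job_status(), end_job(), ...
}

/// Crate-private implementation; never exported directly.
#[derive(Debug)]
struct LocalScheduler {
    config: SchedulerConfig,
}

impl Scheduler for LocalScheduler {}

/// Convenience constructor: external crates can only obtain a Scheduler
/// through configs, never by naming the concrete type.
pub fn create_scheduler(config: SchedulerConfig) -> Arc<dyn Scheduler> {
    Arc::new(LocalScheduler { config })
}
```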
---------
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This is purely a movement of code, not any definition of the interface methods yet. At best, it further solidifies the boundary between the partitions_source implementations that belong within the scheduler versus those within the compactor.
This will hold the deterministic ID for partitions.
Until all existing partitions have this value, this is optional/nullable.
The row ID still exists and is used as the main foreign key in the
parquet_file and skipped_compaction tables.
The hash_id has a unique index so that we can look up records based on
it (if it's available).
If the parquet file record has a partition_hash_id value, use that to
generate the object storage path instead of the partition_id.
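Sketched with hypothetical names, the path selection described above looks roughly like this:
```rust
/// Build the partition segment of the object store path from the
/// deterministic hash ID when the parquet file record carries one, falling
/// back to the numeric catalog row ID for partitions that predate it.
fn partition_path_segment(partition_id: i64, partition_hash_id: Option<&str>) -> String {
    match partition_hash_id {
        Some(hash_id) => hash_id.to_string(),
        None => partition_id.to_string(),
    }
}

// e.g. ".../<namespace>/<table>/<partition_path_segment>/<uuid>.parquet"
```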
* chore: delineate scheduler logic boundary in code comments
* refactor: move id_only_partition_filter mod into local scheduler
* chore: add docs for each IdOnlyPartitionFilter implementation
* refactor: make compactor_scheduler crate
* refactor: move PartitionsSource into the compactor_scheduler
The compactor currently uses PartitionsSource in two ways:
* for the preparation of PartitionIds prior to the compactor pipeline.
* for the abstractions which utilize the PartitionIds during the IO pipeline.
This commit is a refactoring to enable us to delineate between these two utilizations.
The former (preparation) utilization will now be done in the compactor_scheduler.
Since the compactor is dependent on the compactor_scheduler, it made sense to move the trait to the scheduler.
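A simplified, synchronous sketch of that split; the real trait is async and the names below are not exact:
```rust
use std::fmt::Debug;

/// The "preparation" side that moved into compactor_scheduler: decide which
/// partitions should be compacted next.
pub trait PartitionsSource: Debug + Send + Sync {
    /// Fetch the next batch of partition IDs to hand to the compactor.
    fn fetch(&self) -> Vec<i64>;
}

/// The compactor-side "IO pipeline" use keeps its own abstraction and simply
/// consumes the IDs the scheduler produced.
#[derive(Debug)]
pub struct ScheduledPartitionsSource<S: PartitionsSource> {
    scheduler_source: S,
}

impl<S: PartitionsSource> ScheduledPartitionsSource<S> {
    pub fn partitions(&self) -> Vec<i64> {
        self.scheduler_source.fetch()
    }
}
```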
This adds 4 small test cases intended to test how compaction decisions affect the final size of L1/L2 files.
The assumption is that when a steady stream of small L0 files is arriving, the compactor needs to be rewriting L1s so they grow to a reasonable size instead of being left small.
* feat(garbage-collector): batch parquet existence checks to catalog
The core feature of this PR is batching the existence checks of parquet
files in object store against the catalog. Before, there was one catalog
query per parquet file in object store. This can be a lot of
requests.
This PR instead performs a single catalog query for batches of at most 100
parquet file uuids. A hundred seems like a decent starting place.
The batch may not reach 100 because there is also a timeout on receiving
object store meta objects from the object store lister thread. That
timeout is set to 100 milliseconds. If more than 100 are received, they
are batched into groups of 100 for the catalog.
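A hedged sketch of that batching loop; `catalog_contains`, the channel wiring, and the constants' names are hypothetical stand-ins for the real garbage-collector code:
```rust
use std::time::Duration;
use tokio::sync::mpsc::Receiver;
use tokio::time::timeout;
use uuid::Uuid;

const MAX_BATCH: usize = 100;
const RECV_TIMEOUT: Duration = Duration::from_millis(100);

/// Collect up to 100 parquet file uuids from the object store lister, or
/// whatever has arrived when the 100 ms receive timeout fires, then issue a
/// single catalog existence query for the whole batch.
async fn check_batches(mut rx: Receiver<Uuid>) {
    let mut done = false;
    while !done {
        let mut batch = Vec::with_capacity(MAX_BATCH);
        while batch.len() < MAX_BATCH {
            match timeout(RECV_TIMEOUT, rx.recv()).await {
                Ok(Some(uuid)) => batch.push(uuid),
                Ok(None) => {
                    done = true; // lister thread finished / shut down
                    break;
                }
                Err(_) => break, // timeout: flush whatever we have so far
            }
        }
        if !batch.is_empty() {
            // One catalog query for the whole batch instead of one per file.
            let existing = catalog_contains(&batch).await;
            // ...uuids in `batch` but not in `existing` are deletion candidates...
            let _ = existing;
        }
    }
}

/// Hypothetical stand-in for the batched catalog query.
async fn catalog_contains(_uuids: &[Uuid]) -> Vec<Uuid> {
    unimplemented!("placeholder for the real catalog call")
}
```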
Additionally, this PR includes surrounding code changes to make it more
idiomatic (but not perfect). It follows up some suggested work from
#7652 for watching for shutdown on the threads.
* fixes #7784
* use hashset instead of vec to test for contains
* chore: add test for db failure path
* remove ParquetFileExistsByOSID and other single-field structs that are
just for sql deserialization; map to uuid explicitly
* fix the sqlite query by using a blob literal X'<hex>' for uuids (see the sketch after this list)
* comment clarifications
* adjust logging from debug to warn for expected rare events
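A tiny, hypothetical helper illustrating the blob-literal fix for the sqlite query:
```rust
use uuid::Uuid;

/// Uuids are compared as blobs in the sqlite query, so the batched
/// `IN (...)` list needs blob literals of the form X'<hex>' rather than
/// string literals.
fn uuid_blob_literal(id: &Uuid) -> String {
    // `simple()` renders the uuid as 32 hex characters without hyphens.
    format!("X'{}'", id.simple())
}

// e.g. WHERE object_store_id IN (X'67e5504410b1426f9247bb680e5fe0c8', ...)
```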
Many thanks to Carol for help implementing this!
Nothing gets the partition ID out of the metadata. The parts of the code
interacting with object storage that need the ID to create the object
store path were using the partition ID from the metadata out of
convenience, but I changed those places to pass in the partition ID in a
separate argument instead.
This will make the transition to deterministic partition IDs a bit
smoother.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This is the major part of #7470. Additional clean ups (e.g. to remove
the actual types from `data_types`) will follow.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>