I meant to skip partitions w/ timeouts when I designed the
functionality but forgot to adjust the error filter accordingly. To avoid
running into this problem again (i.e. forgetting to adjust the filter), make
the code a bit more explicit.
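A minimal sketch of what "more explicit" could look like, assuming an exhaustive match over a hypothetical error-kind enum (none of these names are the actual compactor2 types):

```rust
// Hypothetical sketch: classify compaction errors explicitly so that a new
// error kind forces a decision instead of silently falling through the filter.
#[derive(Debug)]
enum CompactionErrorKind {
    Timeout,
    ObjectStore,
    DataFusion,
}

/// Decide whether an error should mark the partition as "skipped".
/// Matching exhaustively (no `_` arm) means adding a new kind will not
/// compile until this filter is adjusted as well.
fn should_skip_partition(kind: &CompactionErrorKind) -> bool {
    match kind {
        CompactionErrorKind::Timeout => true,
        CompactionErrorKind::ObjectStore => false,
        CompactionErrorKind::DataFusion => false,
    }
}

fn main() {
    assert!(should_skip_partition(&CompactionErrorKind::Timeout));
}
```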
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: introduce a new way of handling max_sequence_number for ingester, compactor and querier
* chore: cleanup
* feat: new column max_l0_created_at to order files for deduplication
* chore: cleanup
* chore: debug info for changing cpu.parquet
* fix: update test parquet file
Co-authored-by: Marco Neumann <marco@crepererum.net>
* feat: introduce scratchpad store for compactor
Use an intermediate in-memory store (can be a disk later if we want) to
stage all inputs and outputs of the compaction. The reasons are:
- **fewer IO ops:** DataFusion's streaming IO requires slightly more
IO requests (at least 2 per file) due to the way it is optimized to
read as little as possible. It first reads the metadata and then
decides which content to fetch. In the compaction case this is (esp.
w/o delete predicates) EVERYTHING, so in contrast to the querier
there is no advantage to this approach. On the contrary, it easily adds
100ms of latency to every single input file.
- **less traffic:** For divide&conquer partitions (i.e. when we need to
run multiple compaction steps to deal with them) it is kinda pointless
to upload an intermediate result just to download it again. The
scratchpad avoids that.
- **higher throughput:** We want to limit the number of concurrent
DataFusion jobs because we don't want to blow up the whole process by
having too much in-flight Arrow data at the same time. However, while
we performed the actual computation, we were also waiting for object
store IO, and this was limiting our throughput substantially.
- **shadow mode:** De-coupling the stores in this way makes it easier to
implement #6645.
Note that we assume here that the input parquet files are WAY SMALLER
than the uncompressed Arrow data during compaction itself.
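As an illustration only, a scratchpad could look roughly like the following in-memory sketch; the type and method names are hypothetical and not the actual compactor2 API:

```rust
// Minimal sketch: a scratchpad stages input files locally, lets the compaction
// read them without extra object-store round trips, and uploads only the
// final outputs.
use std::collections::HashMap;

#[derive(Default)]
struct Scratchpad {
    /// Staged file contents, keyed by object-store path.
    files: HashMap<String, Vec<u8>>,
}

impl Scratchpad {
    /// Stage an input file that was downloaded from the object store once.
    fn load_to_scratchpad(&mut self, path: &str, bytes: Vec<u8>) {
        self.files.insert(path.to_owned(), bytes);
    }

    /// Read a staged file; repeated reads cost no additional IO requests.
    fn read(&self, path: &str) -> Option<&[u8]> {
        self.files.get(path).map(|b| b.as_slice())
    }

    /// Drain the outputs that must be uploaded; intermediate results can
    /// simply stay in the scratchpad and never hit the object store.
    fn take_outputs(&mut self) -> HashMap<String, Vec<u8>> {
        std::mem::take(&mut self.files)
    }
}

fn main() {
    let mut pad = Scratchpad::default();
    pad.load_to_scratchpad("partition/l0/file1.parquet", vec![0u8; 16]);
    assert!(pad.read("partition/l0/file1.parquet").is_some());
    let _final_outputs = pad.take_outputs();
}
```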
Closes #6650.
* fix: panic on shutdown
* refactor: remove shadow scratchpad (for now)
* refactor: make scratchpad safe to use
Allows compactor2 to run a fixed-point loop (until all work is done), and
in every iteration it can run multiple jobs.
The jobs are currently organized by "branches". This is because our
upcoming OOM handling may split a branch further if it doesn't complete.
Also note that the current config resembles the state prior to this PR.
So the FP-loop will only iterate ONCE and then run out of L0 files. A
more advanced setup can be built using the framework though.
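A rough sketch of the loop shape described above, with purely illustrative types (the real branch planning and job execution live in the framework's components):

```rust
// Hedged sketch of the fixed-point ("FP") loop; names are made up for this
// example and are not the real compactor2 code.
#[derive(Clone)]
struct ParquetFile {
    level: u8,
}

/// One "branch": a group of files that is compacted in a single job.
type Branch = Vec<ParquetFile>;

/// Split the remaining files into branches. Here: one branch with everything,
/// which resembles the pre-PR behavior (the loop effectively runs once).
fn plan_branches(files: &[ParquetFile]) -> Vec<Branch> {
    if files.is_empty() {
        vec![]
    } else {
        vec![files.to_vec()]
    }
}

/// Compact one branch; a real job would run DataFusion and return L1 files.
fn compact_branch(branch: Branch) -> Vec<ParquetFile> {
    vec![ParquetFile { level: 1 }; branch.len().min(1)]
}

fn main() {
    let mut files = vec![ParquetFile { level: 0 }, ParquetFile { level: 0 }];

    // Fixed-point loop: keep planning and running jobs until no L0 files remain.
    while files.iter().any(|f| f.level == 0) {
        files = plan_branches(&files)
            .into_iter()
            .flat_map(compact_branch)
            .collect();
    }

    assert!(files.iter().all(|f| f.level > 0));
}
```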
It seems that prod was hanging last night. This is pretty hard to debug
and in general we should protect the compactor against hanging /
malformed partitions that take forever. This is similar to the fact that
the querier also has a timeout for every query. Let's see if this shows
anything in prod (and if not, it's still a desired safety net).
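Assuming a tokio runtime, the safety net could be as simple as wrapping the per-partition future in a timeout; the names and the 30-minute limit below are illustrative, not the actual config:

```rust
// Sketch of the per-partition safety net; `compact_partition` is a placeholder.
use std::time::Duration;

use tokio::time::timeout;

async fn compact_partition(partition_id: i64) -> Result<(), String> {
    // Placeholder for the real compaction work.
    let _ = partition_id;
    Ok(())
}

#[tokio::main]
async fn main() {
    let partition_id = 42;

    // Abort partitions that hang instead of blocking the whole compactor.
    match timeout(Duration::from_secs(1800), compact_partition(partition_id)).await {
        Ok(Ok(())) => println!("partition {partition_id} compacted"),
        Ok(Err(e)) => eprintln!("partition {partition_id} failed: {e}"),
        Err(_elapsed) => eprintln!("partition {partition_id} timed out, skipping"),
    }
}
```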
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
With the upcoming divide-and-conquer approach, we may have multiple
commits per partition since we can divide it into multiple compaction
jobs. For metrics (and logs), however, it is important to track the
overall process, so we shall also monitor the number of completed
partitions.
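A toy sketch of the distinction, using plain atomic counters in place of the real metric types (all names are made up for illustration):

```rust
// Illustrative only: several commits per partition, but the partition itself
// is counted once when all of its jobs are done.
use std::sync::atomic::{AtomicU64, Ordering};

static COMMITS: AtomicU64 = AtomicU64::new(0);
static PARTITIONS_COMPLETED: AtomicU64 = AtomicU64::new(0);

fn commit_job() {
    // With divide-and-conquer there may be several commits per partition...
    COMMITS.fetch_add(1, Ordering::Relaxed);
}

fn finish_partition(jobs: usize) {
    for _ in 0..jobs {
        commit_job();
    }
    // ...but the overall progress is tracked per completed partition.
    PARTITIONS_COMPLETED.fetch_add(1, Ordering::Relaxed);
}

fn main() {
    finish_partition(3);
    assert_eq!(COMMITS.load(Ordering::Relaxed), 3);
    assert_eq!(PARTITIONS_COMPLETED.load(Ordering::Relaxed), 1);
}
```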
* refactor: planner as a component
Now everything except for the core algorithm structure is a component.
This also means that the driver no longer needs the whole config
structure.
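For illustration, the rough shape of such a component boundary could look like this; the trait and struct names are hypothetical, not the actual compactor2 interfaces:

```rust
// Hypothetical sketch: the driver depends on small components, not on the
// whole compactor config structure.
use std::sync::Arc;

#[derive(Debug, Clone)]
struct ParquetFile {
    id: i64,
}

/// Something that turns the files of a partition into an executable plan.
trait PartitionPlanner: Send + Sync {
    fn plan(&self, files: &[ParquetFile]) -> String;
}

/// A trivial planner used here only to keep the sketch runnable.
struct NoopPlanner;

impl PartitionPlanner for NoopPlanner {
    fn plan(&self, files: &[ParquetFile]) -> String {
        format!("compact {} files", files.len())
    }
}

/// The driver only sees the component trait.
struct Driver {
    planner: Arc<dyn PartitionPlanner>,
}

impl Driver {
    fn run(&self, files: &[ParquetFile]) -> String {
        self.planner.plan(files)
    }
}

fn main() {
    let driver = Driver { planner: Arc::new(NoopPlanner) };
    let plan = driver.run(&[ParquetFile { id: 1 }, ParquetFile { id: 2 }]);
    assert_eq!(plan, "compact 2 files");
}
```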
* docs: explain V1
* chore: address review comment of previous PR
* refactor: execute compact plan
* refactor: we will now compact all L0 and L1 files of a partition and split them as needed
* chore: comments
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Filters can now inspect ALL files for a partition, which may be useful
for limiters. This also moves the "is not empty" check into a filter.
Note that we can still only run ONE compaction job per partition for the
time being, so splitting the files into multiple sub-groups and running a
per-group DataFusion job is currently not possible. It should be a rather easy
addition if we ever want that (it probably needs another semaphore or
something to limit the overall job count).
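As an illustration of the filter idea (not the actual compactor2 trait), a filter that sees all files of a partition might look like this:

```rust
// Hedged sketch: a filter decides, given ALL files of a partition, whether
// the partition should be compacted; limiters are just another filter.
#[derive(Debug)]
struct ParquetFile {
    size_bytes: i64,
}

trait PartitionFilesFilter {
    fn apply(&self, files: &[ParquetFile]) -> bool;
}

/// The former special case "partition has no files" expressed as a filter.
struct NonEmptyFilter;

impl PartitionFilesFilter for NonEmptyFilter {
    fn apply(&self, files: &[ParquetFile]) -> bool {
        !files.is_empty()
    }
}

/// A limiter that rejects partitions whose total input exceeds a budget.
struct MaxBytesFilter {
    max_bytes: i64,
}

impl PartitionFilesFilter for MaxBytesFilter {
    fn apply(&self, files: &[ParquetFile]) -> bool {
        files.iter().map(|f| f.size_bytes).sum::<i64>() <= self.max_bytes
    }
}

fn main() {
    let files = vec![ParquetFile { size_bytes: 1024 }, ParquetFile { size_bytes: 2048 }];
    let filters: Vec<Box<dyn PartitionFilesFilter>> =
        vec![Box::new(NonEmptyFilter), Box::new(MaxBytesFilter { max_bytes: 10_000 })];
    assert!(filters.iter().all(|f| f.apply(&files)));
}
```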
Sets up the crate and wires up the main binary. No tests yet, no algorithm
framework, just the bare minimum.
Also I decided to not offer a gRPC server in `compactor2` at the moment
and hence did not implement any handle/delegate infrastructure. We can add
this later if we need it. This also means compactor2 does NOT provide a
catalog service for now.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>