docs: high-level documentation for `compactor2` (#6818)
* docs: high-level documentation for `comapctor2` * docs: typos & clarification Co-authored-by: Nga Tran <nga-tran@live.com> Co-authored-by: Andrew Lamb <alamb@influxdata.com> * docs: improve wording * fix: typo --------- Co-authored-by: Nga Tran <nga-tran@live.com> Co-authored-by: Andrew Lamb <alamb@influxdata.com>pull/24376/head
parent
6f4e287a3a
commit
cd6d69c9b1
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 210 KiB |
|
@ -1,4 +1,162 @@
|
|||
//! The compactor.
|
||||
//!
|
||||
//!
|
||||
//! # Mission statement
|
||||
//!
|
||||
//! This processes parquet files produced by the ingesters so that queriers can read them more efficiently. More precisely it
|
||||
//! tries to meet the following -- partially conflicting -- objectives:
|
||||
//!
|
||||
//! - **O1 -- duplicates:** Duplicate across files should be removed[^duplicates_within_files]. Removing duplicates
|
||||
//! during query is costly.
|
||||
//! - **O2 -- overlaps:** The key-space (tags + timestamp) within parquet files should be strictly distinct. Without this
|
||||
//! attribute the querier has now way to detect the inter-file key uniqueness and hence must run costly
|
||||
//! de-duplications during query time.
|
||||
//! - **O3 -- minimize number of files:** There should be as few files as possible. Every file comes with
|
||||
//! overhead both within the catalog and on the querier side.
|
||||
//! - **O4 -- max file size:** Parquet files must have a maximum (configurable) size (with a certain margin on top due
|
||||
//! for technical reasons[^max_parquet_size]). Files that are too large will cause memory issues with querier
|
||||
//! - **O5 -- minimal changes:** The changes to the set of parquet files should be as minimal as possible to increase cache
|
||||
//! efficiency on the querier side, reduce write amplification (rewriting the same row over and over again) and
|
||||
//! decrease the catalog load (each file creation / deletion requires catalog updates).
|
||||
//! - **O6 -- catch-up:** The compactor should be able to not "fall behind" the ingest tier.
|
||||
//! - **O7 -- costs:** The compactor shall be cost-efficient.
|
||||
//!
|
||||
//! From these objectives, we can derive the following additional requirements -- some more and some less obvious ones:
|
||||
//!
|
||||
//! - **R1 -- avoid re-compaction:** Avoid re-processing the same data multiple times. This can be derived from *O5*,
|
||||
//! *O7*, and potentially *O6*.
|
||||
//! - **R2 -- parallelization:** Be able to run multiple compaction jobs at the same time. Derived from *O6* and *O7*.
|
||||
//! - **R3 -- scale-out:** Be able to scale to multiple nodes. Derived from *O6* but may very much depend on *R2*.
|
||||
//! - **R4 -- concurrent compute & IO:** Be able to decouple IO (and the involved latencies) from the actual
|
||||
//! computation. Derived from *O6* and *O7*.
|
||||
//!
|
||||
//!
|
||||
//! # Crate Layout
|
||||
//!
|
||||
//! This crate tries to decouple "when to do what" from "how to do what". The "when" is described by the [driver] which
|
||||
//! provides an abstract framework. The "how" part is described by [components] which act as puzzle pieces that are
|
||||
//! plugged into the [driver].
|
||||
//!
|
||||
//! The concrete selection of components and their setup is currently [hardcoded](crate::components::hardcoded). This
|
||||
//! setup process take a [config] and creates a [component setup](crate::components::Components).
|
||||
//!
|
||||
//! The final flow graph looks like this:
|
||||
//!
|
||||
//! ```text
|
||||
//! (CLI/Env) --> (Config) --[hardcoded]--> (Components) --> (Driver)
|
||||
//! ```
|
||||
//!
|
||||
//! ## Driver
|
||||
//! The driver is the heart of the compactor and describes and executes the framework.
|
||||
//!
|
||||
//! This is the data and control flow of driver:
|
||||
#![doc = include_str!("../img/driver.svg")]
|
||||
//!
|
||||
//! In general modification of the driver should be rare. Most changes should be implement by using components.
|
||||
//!
|
||||
//! ## Components
|
||||
//! Components can be thought as puzzle pieces used by the driver. Technically there are two parts to it: the driver
|
||||
//! interfaces (traits) and the implementations.
|
||||
//!
|
||||
//! **Note: The code samples used in the "components" section are simplified and may not reflect actual components. They are
|
||||
//! mostly illustrations.**
|
||||
//!
|
||||
//! ### Interfaces
|
||||
//! Component interfaces should describe one single aspect of the framework. For example:
|
||||
//!
|
||||
//! ```
|
||||
//! # use std::fmt::{Debug, Display};
|
||||
//! # use data_types::ParquetFile;
|
||||
//! /// Calculate importance of a file.
|
||||
//! pub trait FileImportants: Debug + Display + Send + Sync {
|
||||
//! /// Rates file by importance.
|
||||
//! ///
|
||||
//! /// Higher ratings mean "more important".
|
||||
//! fn rate(&self, file: &ParquetFile) -> u64;
|
||||
//! }
|
||||
//! ```
|
||||
//!
|
||||
//! Looking this interface you will note the following aspects:
|
||||
//!
|
||||
//! - **`Debug`:** Allows to use the component in our structs that implement [`Debug`].
|
||||
//! - **`Display`:** Allows to produce easy human (and machine) readable dumps of the current component setup. Also
|
||||
//! allows other components to implement [`Display`] more easily.
|
||||
//! - **immutable:** All methods take `&self`. Interior mutability (e.g. for caching) is allowed. This allows easy parallelization.
|
||||
//! - **not-consuming:** No method takes `self`. This ensures that the trait is object-safe and can be put into an [`Arc`].
|
||||
//! - **`Send + Sync`:** Together with the the two previous points this allows the usage within an [`Arc`].
|
||||
//! - **single method:** Most interfaces will only have a single method. This is because they have one dedicated job.
|
||||
//!
|
||||
//! The above interface will not allow any IO because it is not async. It is important to keep the interfaces that
|
||||
//! perform IO and the ones that don't clearly apart, so it is visible to the user and allows a proper driver design. An
|
||||
//! interface that performs IO looks like this:
|
||||
//!
|
||||
//! ```
|
||||
//! # use std::fmt::{Debug, Display};
|
||||
//! # use async_trait::async_trait;
|
||||
//! # use data_types::{Partition, PartitionId};
|
||||
//! /// Fetches partition info.
|
||||
//! #[async_trait]
|
||||
//! pub trait PartitionSource: Debug + Display + Send + Sync {
|
||||
//! /// Fetches partition info.
|
||||
//! ///
|
||||
//! /// This method retries internally.
|
||||
//! async fn fetch(&self, id: PartitionId) -> Partition;
|
||||
//! }
|
||||
//! ```
|
||||
//!
|
||||
//! For IO interfaces, all the points of the normal interface apply. In addition, the following points are important:
|
||||
//!
|
||||
//! - **error & retries:** Most IO interfaces are NOT allowed to return an error. Instead they shall implement internal
|
||||
//! retries (e.g. using [`backoff`]).
|
||||
//!
|
||||
//! ### Implementations
|
||||
//! The implementations of the aforementioned interfaces roughly fall into the following categories:
|
||||
//!
|
||||
//! - **true sources/sinks:** These are the actual (mostly object-store and catalog backed) implementations used in
|
||||
//! production. There may be multiple variants that are used depending on the config. NO metrics or logging is
|
||||
//! implemented directly within these (see "wrappers" below).
|
||||
//! - **assemblies:** Assembles components of other types into this type. E.g. a filter for a set of files
|
||||
//! (`[file] -> [file]`) can be assembled by a filter for a single file (`file -> bool`).
|
||||
//! - **mocks:** Mocks can be controlled during initialization or later and may record the interfaction with them. They
|
||||
//! are mostly used for testing or development purposes.
|
||||
//! - **wrappers:** Wrap a component of the same type but add functionality on top. This can be logging, metrics,
|
||||
//! assertions, data dumping, filtering, randomness, etc.
|
||||
//!
|
||||
//! This modular approach allows easy testing of components as well consistency. E.g. swapping out a data source for
|
||||
//! another one will preserve the same logs and metrics (given that the same wrappers are used).
|
||||
//!
|
||||
//! ## Configuration
|
||||
//! The [config] allows the compactor to set up a wide range of component setups. The config options fall roughly into
|
||||
//! the following categories:
|
||||
//!
|
||||
//! - **parameters:** Information required to run the compactor, e.g. the object store connection or sharding information.
|
||||
//! - **tuning knobs:** Options like "maximum desired file size", or "number of concurrent jobs" allow deployments to
|
||||
//! fine-tune the compactor behavior. They do NOT fundamentally change the behavior though.
|
||||
//! - **behavior switches:** Switches like "ignore that partitions where flagged", "run once and exit", or "use this
|
||||
//! pre-defined set of partitions" change the behavior of the system. This may be used for emergency situations or to
|
||||
//! debug / develop the compactor.
|
||||
//! - **algorithm versions:** This sets [`AlgoVersion`] and fundamentally changes the behavior, e.g. from a simple demo
|
||||
//! approach to some experimental new system. Note that this only changes the setup of the components, it does NOT
|
||||
//! change the driver.
|
||||
//!
|
||||
//!
|
||||
//! [^duplicates_within_files]: The compactor design would not fundamentally change if the ingester would produce output
|
||||
//! that contains duplicates within in single file. The current implementation however assumes that this is NOT the case.
|
||||
//!
|
||||
//! [^max_parquet_size]: It is technically hard to determine the size of a parquet size before writing it. It may be
|
||||
//! estimated by assuming that "sum size of input files >= sum size of output files". However there are certain
|
||||
//! cases where this does NOT hold due to the way the data within the output parquet file is split into data pages,
|
||||
//! is then potentially dictionary- and RLE-encoded and [ZSTD] compressed.
|
||||
//!
|
||||
//!
|
||||
//! [`AlgoVersion`]: crate::config::AlgoVersion
|
||||
//! [`Arc`]: std::sync::Arc
|
||||
//! [components]: crate::components
|
||||
//! [config]: crate::config
|
||||
//! [`Debug`]: std::fmt::Debug
|
||||
//! [`Display`]: std::fmt::Display
|
||||
//! [driver]: crate::driver
|
||||
//! [ZSTD]: https://github.com/facebook/zstd
|
||||
#![deny(rustdoc::broken_intra_doc_links, rust_2018_idioms)]
|
||||
#![warn(
|
||||
missing_copy_implementations,
|
||||
|
@ -10,6 +168,7 @@
|
|||
clippy::todo,
|
||||
clippy::dbg_macro
|
||||
)]
|
||||
#![allow(rustdoc::private_intra_doc_links)]
|
||||
|
||||
pub mod compactor;
|
||||
mod components;
|
||||
|
|
Loading…
Reference in New Issue