This commit implements a PartitionCache decorator over the PartitionProvider abstraction. When an ingester starts up, its internal data structures are empty and are lazily initialised for each namespace / table / partition as they are observed in the stream of DML ops. This lazy initialisation includes resolving the partition ID and the last persisted sequence number offset from the catalog for each partition in each table in each namespace for which an op is observed. This occurs in the hot path, blocking ingest for a shard while it completes. Resolving each partition causes a catalog query, which can produce a spike in queries against the catalog and unnecessarily slow ingester recovery: we're effectively lazily warming a cache of PartitionData in the hot path! Instead, this cache can be used to pre-warm the N most recently created partitions (which are likely to have ongoing writes) at startup, eliminating the hot-path overhead and associated catalog queries for those partitions. NOTE: unlike most of the other hot-path queries, partition persist offset resolution cannot be eliminated by changes to the Kafka wire format. |
||
---|---|---|
src | ||
Cargo.toml |
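
The decorator pattern described in the commit message can be sketched roughly as follows. This is a minimal, synchronous illustration and not the code in this commit: only the names `PartitionCache`, `PartitionProvider` and `PartitionData` come from the message, while `TableId`, `PartitionKey`, `CatalogResolver`, the field names and the method signatures are hypothetical stand-ins. The namespace level is omitted for brevity, and the real implementation is presumably async and thread-safe.

```rust
use std::collections::HashMap;

// Hypothetical identifier types; the real crate will have its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TableId(pub i64);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PartitionKey(pub String);

// Assumed shape of the per-partition state resolved from the catalog.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PartitionData {
    pub partition_id: i64,
    pub max_persisted_sequence_number: Option<i64>,
}

// Assumed abstraction: anything that can resolve a partition.
pub trait PartitionProvider {
    fn get_partition(&mut self, table: TableId, key: &PartitionKey) -> PartitionData;
}

// Stand-in for the catalog-backed resolver. It only counts queries so the
// effect of the cache is observable; it does not talk to a real catalog.
#[derive(Debug, Default)]
pub struct CatalogResolver {
    pub queries: usize,
}

impl PartitionProvider for CatalogResolver {
    fn get_partition(&mut self, _table: TableId, _key: &PartitionKey) -> PartitionData {
        self.queries += 1;
        PartitionData {
            partition_id: self.queries as i64,
            max_persisted_sequence_number: None,
        }
    }
}

// Decorator: answers from pre-warmed entries, otherwise delegates to `inner`.
pub struct PartitionCache<T> {
    inner: T,
    entries: HashMap<(TableId, PartitionKey), PartitionData>,
}

impl<T> PartitionCache<T> {
    // `warm` would be filled at startup with the N most recently created
    // partitions, fetched outside of the hot path.
    pub fn new<I>(inner: T, warm: I) -> Self
    where
        I: IntoIterator<Item = (TableId, PartitionKey, PartitionData)>,
    {
        let entries = warm.into_iter().map(|(t, k, d)| ((t, k), d)).collect();
        Self { inner, entries }
    }

    pub fn inner(&self) -> &T {
        &self.inner
    }
}

impl<T: PartitionProvider> PartitionProvider for PartitionCache<T> {
    fn get_partition(&mut self, table: TableId, key: &PartitionKey) -> PartitionData {
        // Sketch-level choice: a warmed entry is removed on hit, on the
        // assumption that each partition is initialised only once.
        match self.entries.remove(&(table, key.clone())) {
            Some(data) => data,                           // hit: no catalog query
            None => self.inner.get_partition(table, key), // miss: delegate
        }
    }
}
```

Because `PartitionCache<T>` itself implements `PartitionProvider`, callers in the hot path do not need to know whether they hold the cache or the underlying resolver. A lookup for a pre-warmed partition returns without touching the inner provider, while any other partition falls through and costs one query, as before.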