influxdb/influxdb3_server
wayne c765d50d39
feat: introduce TableIndexSnapshot, TableIndex, and TableIndexCache (#26636)
This commit brings over `TableIndexCache` support from the enterprise
repo. It primarily focuses on efficient automatic cleanup of expired
gen1 parquet files based on retention policies and hard deletes. It

- Adds purge operations for tables and retention period expired data.
- Integrates `TableIndexCache` into `PersistedFiles` for the sake of
  parquet data deletion handling in `ObjectDeleter` impl.
- Introduces a new background loop for applying data retention polices
  with a 30m default interval.
- Includes comprehensive test coverage for cache operations, concurrent
  access, persisted snapshot to table index snapshot splits, purge
  scenario, object store path parsing, etc.

\## New Types

- `influxdb3_write::table_index::TableIndex`:
  - A new trait that tracks gen1 parquet file metadata on a per-table
    basis.

- `influxdb3_write::table_index::TableIndexSnapshot`:
  - An incremental snapshot of added and removed gen1 parquet files.
  - Created by splitting a `PersistedSnapshot` (ie a whole-database
    snapshot) into individual table snapshots.
  - Uses the existing snapshot sequence number.
  - Removed from object store after successful aggregation into
    `CoreTableIndex`.

- `influxdb3_write::table_index::CoreTableIndex`:
  - Implements of `TableIndex` trait.
  - Aggregation of `TableIndexSnapshot`s.
  - Not versioned -- assumes that we will migrate away from Parquet in
    favor of PachaTree in the medium/long term.

- `influxdb3_write::table_index_cache::TableIndexCache`
  - LRU cache
  - Configurable via CLI parameters:
    - Concurrency of object store operations.
    - Maximum number of `CachedTableIndex` to allow before evicting
      oldest entries.
  - Entrypoint for handling conversion of `PersistedSnapshot` to
    `TableIndexSnapshot` to `TableIndex`

- `influxdb3_write::table_index_cache::CachedTableIndex`
  - Implements `TableIndex` trait
  - Accessing ParquetFile or TableIndex causes last access time to be
    updated.
  - Stores a mutable `CoreTableIndex` as implementation detail.

- `influxdb3_write::retention_period_handler::RetentionPeriodHandler`
  - Runs a top-level background task that periodically applies retention
    periods to gen1 files via the `TableIndexCache`.
  - Configurable via CLI parameters:
    - Retention period handling interval

\## Updated Types

- `influxdb3_write::persisted_files::PersistedFiles`
  - Now holds an `Arc` reference to `TableIndexCache`
  - Uses its `TableIndexCache` to apply hard deletion to all historical
    gen1 files and update associated `CoreTableIndex` in the object
    store.
2025-07-28 13:23:56 -06:00
..
src feat: introduce TableIndexSnapshot, TableIndex, and TableIndexCache (#26636) 2025-07-28 13:23:56 -06:00
tests feat: introduce TableIndexSnapshot, TableIndex, and TableIndexCache (#26636) 2025-07-28 13:23:56 -06:00
Cargo.toml refactor: Use iox_http_util to make updating to hyper 1 easier (#26436) 2025-06-02 14:57:55 -04:00