influxdb

Commit Graph

Author	SHA1	Message	Date
wayne	c765d50d39	feat: introduce TableIndexSnapshot, TableIndex, and TableIndexCache (#26636) This commit brings over `TableIndexCache` support from the enterprise repo. It primarily focuses on efficient automatic cleanup of expired gen1 parquet files based on retention policies and hard deletes. It - Adds purge operations for tables and retention period expired data. - Integrates `TableIndexCache` into `PersistedFiles` for the sake of parquet data deletion handling in `ObjectDeleter` impl. - Introduces a new background loop for applying data retention policies with a 30m default interval. - Includes comprehensive test coverage for cache operations, concurrent access, persisted snapshot to table index snapshot splits, purge scenario, object store path parsing, etc. ## New Types - `influxdb3_write::table_index::TableIndex`: - A new trait that tracks gen1 parquet file metadata on a per-table basis. - `influxdb3_write::table_index::TableIndexSnapshot`: - An incremental snapshot of added and removed gen1 parquet files. - Created by splitting a `PersistedSnapshot` (i.e. a whole-database snapshot) into individual table snapshots. - Uses the existing snapshot sequence number. - Removed from object store after successful aggregation into `CoreTableIndex`. - `influxdb3_write::table_index::CoreTableIndex`: - Implements the `TableIndex` trait. - Aggregation of `TableIndexSnapshot`s. - Not versioned -- assumes that we will migrate away from Parquet in favor of PachaTree in the medium/long term. - `influxdb3_write::table_index_cache::TableIndexCache` - LRU cache - Configurable via CLI parameters: - Concurrency of object store operations. - Maximum number of `CachedTableIndex` to allow before evicting oldest entries. - Entrypoint for handling conversion of `PersistedSnapshot` to `TableIndexSnapshot` to `TableIndex` - `influxdb3_write::table_index_cache::CachedTableIndex` - Implements `TableIndex` trait - Accessing ParquetFile or TableIndex causes last access time to be updated. - Stores a mutable `CoreTableIndex` as implementation detail. - `influxdb3_write::retention_period_handler::RetentionPeriodHandler` - Runs a top-level background task that periodically applies retention periods to gen1 files via the `TableIndexCache`. - Configurable via CLI parameters: - Retention period handling interval ## Updated Types - `influxdb3_write::persisted_files::PersistedFiles` - Now holds an `Arc` reference to `TableIndexCache` - Uses its `TableIndexCache` to apply hard deletion to all historical gen1 files and update associated `CoreTableIndex` in the object store.	2025-07-28 13:23:56 -06:00
Stuart Carnie	01c907de0e	fix: handle corrupt WAL files during replay without panic (#26556) Add bounds checking to prevent panic when WAL files are empty or truncated. Introduces `--wal-replay-fail-on-error` flag to control behavior when encountering corrupt WAL files during replay. - Add WalFileTooSmall error for files missing required header bytes - Validate minimum file size (12 bytes) before attempting deserialization - Make WAL replay configurable: skip corrupt files by default or fail on error - Add comprehensive tests for empty, truncated, and header-only files Closes #26549	2025-06-24 12:57:40 +01:00
wayne	8e0688912f	feat: allow users to specify lookback duration for PersistedFiles buffer (#26528)	2025-06-18 09:41:05 -06:00
praveen-influx	a67b50dac5	feat: add concurrency limit for WAL replay (#26483) WAL replay currently loads _all_ WAL files concurrently, which can run into OOM. This commit adds a CLI parameter `--wal-replay-concurrency-limit` that allows the user to set a lower limit and run WAL replay again. closes: https://github.com/influxdata/influxdb/issues/26481	2025-06-03 16:34:31 +01:00
Trevor Hilton	be25c6f52b	test: deduplication across memory and parquet chunks (#26477)	2025-05-29 16:27:32 -04:00
Trevor Hilton	5bf3a1aef8	test: add integration tests to influxdb3_server (#26474)	2025-05-28 21:39:40 -04:00

6 Commits (tjh/3-3-0-install-script)
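
Commit 01c907de0e describes validating a 12-byte minimum file size before deserializing a WAL file, and either skipping corrupt files (the default) or failing on the first error. A minimal sketch of that control flow follows. It is not the repository's code: only the `WalFileTooSmall` name and the 12-byte minimum come from the commit message, while `WalError`, `check_wal_len`, `replay`, and the `fail_on_error` parameter are hypothetical names chosen for illustration.

```rust
// Illustrative sketch only; not taken from the influxdb source tree.
// The 12-byte minimum and the `WalFileTooSmall` name come from the commit
// message. Every other name here is an assumption.
const MIN_WAL_FILE_SIZE: usize = 12;

#[derive(Debug, PartialEq)]
enum WalError {
    /// The file is shorter than the required header bytes.
    WalFileTooSmall { len: usize },
}

/// Bounds check performed before any attempt to deserialize the file.
fn check_wal_len(bytes: &[u8]) -> Result<&[u8], WalError> {
    if bytes.len() < MIN_WAL_FILE_SIZE {
        return Err(WalError::WalFileTooSmall { len: bytes.len() });
    }
    Ok(bytes)
}

/// Returns the files that passed validation. With `fail_on_error` set,
/// the first corrupt file aborts replay; otherwise it is skipped.
fn replay<'a>(files: &[&'a [u8]], fail_on_error: bool) -> Result<Vec<&'a [u8]>, WalError> {
    let mut accepted = Vec::new();
    for &file in files {
        match check_wal_len(file) {
            Ok(bytes) => accepted.push(bytes),
            Err(err) if fail_on_error => return Err(err),
            Err(_) => continue, // default behavior: skip the corrupt file
        }
    }
    Ok(accepted)
}
```

Here `fail_on_error` stands in for the behavior the commit attributes to the `--wal-replay-fail-on-error` flag. Passing an empty file, a 5-byte file, and a 12-byte file with `fail_on_error = false` keeps only the last one, whereas `fail_on_error = true` returns `WalFileTooSmall` for the first undersized file.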