influxdb/compactor at e49f2ca5c7e5a427190f15cda99c003480d0f113 - influxdb - Gitea: ArmstrongLabs


Marco Neumann 45b3984aa3 refactor: simplify `QueryChunk` data access (#6015)		2022-11-02 08:18:33 +00:00

* refactor: simplify `QueryChunk` data access

  We have only two types for chunks (now that the RUB is gone):

  1. In-memory RecordBatches
  2. Parquet files

  Loads of logic is duplicated in the different `read_filter` implementations. Also `read_filter` hides a solid amount of logic from DataFusion, which will prevent certain (future) optimizations.

  To enable #5897 and to simplify the interface, let the chunks return the data (batches or metadata for parquet files) directly and let `iox_query` perform the actual heavy-lifting.

* docs: improve

  Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* docs: improve

  Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
src	refactor: simplify `QueryChunk` data access (#6015)	2022-11-02 08:18:33 +00:00
Cargo.toml	feat: Add the catalog service to ingester, querier, and compactor	2022-10-28 10:49:26 -04:00
README.md	docs: add consensus for the desired final output of the compactor (#5069)	2022-07-07 19:11:16 +00:00

README.md

After a partition of a table has not received any writes for some amount of time, the compactor will ensure it is stored in object store as N parquet files which:

- have non-overlapping time ranges
- each do not exceed the size specified by the config parameter `max_desired_file_size_bytes`
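
The two conditions above can be read as an invariant over the set of files in a partition. The following is a minimal sketch of such a check, not the compactor's actual code: `FileSummary`, its fields, and `meets_compaction_goal` are hypothetical names introduced for illustration, and the sketch assumes that both time bounds are inclusive, so two files that share a boundary timestamp count as overlapping. Only the parameter name `max_desired_file_size_bytes` comes from the README.

```rust
/// Hypothetical summary of one parquet file in a partition.
/// This is an illustrative type, not the IOx catalog type.
#[derive(Debug, Clone, Copy)]
pub struct FileSummary {
    /// Earliest timestamp in the file (assumed inclusive).
    pub min_time: i64,
    /// Latest timestamp in the file (assumed inclusive).
    pub max_time: i64,
    /// Size of the file in object store, in bytes.
    pub size_bytes: u64,
}

/// Returns true if the files satisfy both conditions from the README:
/// no file exceeds `max_desired_file_size_bytes`, and no two files
/// have overlapping time ranges.
pub fn meets_compaction_goal(
    files: &[FileSummary],
    max_desired_file_size_bytes: u64,
) -> bool {
    // Condition 2: every file is within the configured size limit.
    if files
        .iter()
        .any(|f| f.size_bytes > max_desired_file_size_bytes)
    {
        return false;
    }

    // Condition 1: after sorting by start time, each file must end
    // strictly before the next one begins.
    let mut sorted: Vec<&FileSummary> = files.iter().collect();
    sorted.sort_by_key(|f| f.min_time);
    sorted.windows(2).all(|w| w[0].max_time < w[1].min_time)
}
```

Sorting by start time means only adjacent pairs need to be compared, so the check runs in O(N log N) rather than comparing every pair of files. An empty partition trivially satisfies both conditions.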