influxdb/compactor
Marco Neumann 45b3984aa3
refactor: simplify `QueryChunk` data access (#6015)
* refactor: simplify `QueryChunk` data access

We have only two types for chunks (now that the RUB is gone):

1. In-memory RecordBatches
2. Parquet files

Loads of logic is duplicated in the different `read_filter`
implementations. Also `read_filter` hides a solid amount of logic from
DataFusion, which will prevent certain (future) optimizations. To enable #5897
and to simplify the interface, let the chunks return the data (batches
or metadata for parquet files) directly and let `iox_query` perform the
actual heavy-lifting.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 08:18:33 +00:00
..
src refactor: simplify `QueryChunk` data access (#6015) 2022-11-02 08:18:33 +00:00
Cargo.toml feat: Add the catalog service to ingester, querier, and compactor 2022-10-28 10:49:26 -04:00
README.md docs: add consensus for the desired final output of the compactor (#5069) 2022-07-07 19:11:16 +00:00

README.md

After a partition of a table has not received any writes for some amount of time, the compactor will ensure it is stored in object store as N parquet files which:

  • have non overlapping time ranges
  • each does not exceed a size specified by config param max_desired_file_size_bytes.