influxdb/compactor
Marco Neumann 064f0e9b29
refactor: use DataFusion to read parquet files (#5531)
Remove our own hand-rolled logic and let DataFusion read the parquet
files.

As a bonus, this now supports predicate pushdown to the deserialization
step, so we can use parquet files as an in-memory buffer.

Note that this currently uses a "nested" DataFusion hack due to the
way the `QueryChunk` interface works. Mid-term I'll change the interface
so that the `ParquetExec` nodes are directly visible to DataFusion
instead of being hidden behind an opaque `SendableRecordBatchStream`.
2022-09-05 09:25:04 +00:00
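For context, a minimal sketch of what "let DataFusion read the parquet files" looks like. This is illustrative, not the IOx code: the file path, table name, and timestamp literal are made up; the calls shown (`SessionContext`, `register_parquet`, `sql`, `collect`) are DataFusion's public API.

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();

    // Register the parquet file as a table. DataFusion plans the scan
    // itself (a `ParquetExec` node), replacing hand-rolled reader logic.
    ctx.register_parquet(
        "chunk",
        "/path/to/chunk.parquet", // hypothetical path
        ParquetReadOptions::default(),
    )
    .await?;

    // The `time > ...` predicate can be pushed down into the parquet
    // deserialization step, so row groups that cannot match are skipped.
    let df = ctx
        .sql("SELECT * FROM chunk WHERE time > 1662336000000000000")
        .await?;
    let batches = df.collect().await?;
    println!("read {} record batches", batches.len());
    Ok(())
}
```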
src refactor: use DataFusion to read parquet files (#5531) 2022-09-05 09:25:04 +00:00
Cargo.toml chore: Update datafusion + arrow/parquet to `21.0.0` (#5519) 2022-08-31 13:30:47 +00:00
README.md docs: add consensus for the desired final output of the compactor (#5069) 2022-07-07 19:11:16 +00:00

README.md

After a partition of a table has not received any writes for some amount of time, the compactor will ensure it is stored in object store as N parquet files which (see the sketch after this list):

  • have non-overlapping time ranges
  • individually do not exceed the size specified by the config param `max_desired_file_size_bytes`.
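As a hedged illustration of those two invariants, here is a small check one could write over per-file metadata. The struct and function below are hypothetical, not IOx code; only the invariants themselves come from the README.

```rust
/// Hypothetical per-file metadata (not an IOx type).
struct ParquetFileMeta {
    min_time: i64, // inclusive nanosecond timestamp range
    max_time: i64,
    size_bytes: u64,
}

/// Returns true iff the files have pairwise non-overlapping time ranges
/// and each stays under `max_desired_file_size_bytes`.
fn is_fully_compacted(files: &mut [ParquetFileMeta], max_desired_file_size_bytes: u64) -> bool {
    // Sorting by min_time makes the overlap check linear: once sorted,
    // only adjacent files can possibly overlap.
    files.sort_by_key(|f| f.min_time);
    files.windows(2).all(|w| w[0].max_time < w[1].min_time)
        && files.iter().all(|f| f.size_bytes <= max_desired_file_size_bytes)
}
```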