Remove our own hand-rolled logic and let DataFusion read the parquet files directly. As a bonus, this now supports predicate pushdown down to the deserialization step, so we can use parquet files as an in-memory buffer. Note that this currently uses a "nested" DataFusion hack due to the way the `QueryChunk` interface works. Mid-term I'll change the interface so that the `ParquetExec` nodes are directly visible to DataFusion instead of being hidden behind an opaque `SendableRecordBatchStream`.
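To illustrate what predicate pushdown buys us, here is a minimal stdlib-only sketch (the type and function names are invented for illustration, not the actual DataFusion or IOx API): the predicate is applied while rows are being deserialized, so non-matching data is never materialized at all, instead of being filtered afterwards.

```rust
/// A fully materialized row (hypothetical stand-in for a record batch row).
#[derive(Debug, PartialEq)]
struct Row {
    time: i64,
    value: f64,
}

/// Deserialize only rows matching `pred`, mimicking how a pushed-down
/// predicate lets the reader skip non-matching data before materialization.
fn read_with_pushdown<P: Fn(i64) -> bool>(raw: &[(i64, f64)], pred: P) -> Vec<Row> {
    raw.iter()
        // Predicate runs on the raw encoded form, before a `Row` is built.
        .filter(|(time, _)| pred(*time))
        .map(|&(time, value)| Row { time, value })
        .collect()
}

fn main() {
    let raw = [(1, 10.0), (5, 20.0), (9, 30.0)];
    // Only rows with time >= 5 are ever materialized.
    let rows = read_with_pushdown(&raw, |t| t >= 5);
    println!("{rows:?}");
}
```

In the real code path, DataFusion pushes such predicates into the parquet reader, which can additionally skip entire row groups using parquet statistics.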
README.md
After a partition of a table has not received any writes for some amount of time, the compactor will ensure it is stored in object store as N parquet files which:
- have non-overlapping time ranges
- each do not exceed the size specified by the config param `max_desired_file_size_bytes`.
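The two invariants above can be sketched as a simple check. This is an illustrative stand-in, not the actual IOx compactor types; only the config param name `max_desired_file_size_bytes` comes from the text.

```rust
/// Metadata for one compacted parquet file (hypothetical struct).
struct CompactedFile {
    min_time: i64,
    max_time: i64,
    size_bytes: u64,
}

/// Check the compactor's output invariants: every file stays under the
/// configured size cap, and no two files' time ranges overlap.
fn invariants_hold(files: &[CompactedFile], max_desired_file_size_bytes: u64) -> bool {
    let sizes_ok = files
        .iter()
        .all(|f| f.size_bytes <= max_desired_file_size_bytes);

    // Sort ranges by start time, then require each range to end
    // strictly before the next one begins.
    let mut ranges: Vec<(i64, i64)> = files.iter().map(|f| (f.min_time, f.max_time)).collect();
    ranges.sort();
    let non_overlapping = ranges.windows(2).all(|w| w[0].1 < w[1].0);

    sizes_ok && non_overlapping
}
```

Non-overlapping time ranges matter because they let the query planner treat the files as disjoint partitions and deduplicate-free scan them independently.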