- this is what DataFusion is doing as well; it's also fast enough
because the number of chunks in a query is not THAT massive (it's not
like we are doing row-level dyn dispatching)
- it simplifies abstracting over different databases
- it allows us to drop our enum-based dispatching that we have for
`DbChunk` and that we would also need for the querier (e.g. depending
on if a chunk is backed by a parquet file or ingester data)
- it likely speeds up compile times because the `query` is no longer
contains massive amounts of generic code
For #3934.
This makes it way easier to dyn-type database implementations. The only
real change is that we make `QueryChunk::Error` opaque. Nobody is going
to inspect that anyways, it's just printed to the user.
This is a follow-up of #4053.
Ref #3934.
Changes all consumers of the object store to use the dynamically
dispatched DynObjectStore type, instead of using a hardcoded concrete
implementation type.
* feat: initial implementation of compact a given list of overlapped parquet files
* feat: Add QueryableParquetChunk and some refactoring
* feat: build queryable parquet chunks for parquet files with tombstones
* feat: second half the implementation for Compactor's compact. Tests will be next
* fix: comments for trait funnctions fof QueryChunkMeta
* test: add tests for compactor's compact function
* fix: typos
* refactor: address Jake's review comments
* refactor: address Andrew's comments and add one more test for files in different order in the vector
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: main function for finding and compacting parquet files
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: rename file and struct
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>