Min/max values and distinct counts are already optional, so let's make
the null counts optional as well. This will be helpful for NG to deal w/
partial statistics (e.g. when we only populate stats for the time column).
Note that the total count is still mandatory, but we normally have the
chunk/file-level row count at hand.
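A minimal sketch of the resulting shape (field names are illustrative, not the actual IOx statistics types):

```rust
/// Hypothetical column statistics mirroring the shape described above.
#[derive(Debug, Default)]
pub struct ColumnStatistics {
    /// Min/max were already optional.
    pub min: Option<i64>,
    pub max: Option<i64>,
    /// Distinct count was already optional.
    pub distinct_count: Option<u64>,
    /// Null count is now optional too, so NG can express
    /// "we never computed this" for partially populated stats.
    pub null_count: Option<u64>,
    /// Total count stays mandatory: the chunk/file-level row count
    /// is normally known anyway.
    pub total_count: u64,
}
```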
* refactor: make NG query test generation more flexible
* refactor: rename OG-specific query tests
* docs: explain chunk stage generation in NG query tests
* fix: typo
Namespaces are now created on demand and contain their full schema.
Tombstones/chunks are created on demand during the query.
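Schematically, the on-demand creation looks like this (all types and names here are hypothetical, not the real NG test types):

```rust
use std::collections::HashMap;

#[derive(Default, Clone)]
struct NamespaceSchema {
    tables: Vec<String>,
}

/// Hypothetical test-setup state: namespaces (with their full schema)
/// only come into existence the first time something touches them.
#[derive(Default)]
struct TestNamespaces {
    namespaces: HashMap<String, NamespaceSchema>,
}

impl TestNamespaces {
    /// Get-or-create, so scenario code never has to pre-register
    /// namespaces up front.
    fn namespace_mut(&mut self, name: &str) -> &mut NamespaceSchema {
        self.namespaces.entry(name.to_string()).or_default()
    }
}
```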
Closes #4123.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: inline function that is used once
* refactor: generalize multi-chunk creation for NG
* refactor: `TwoMeasurementsManyFieldsTwoChunks` is OG-specific
* refactor: generalize `OneMeasurementTwoChunksDifferentTagSet`
* refactor: port `OneMeasurementFourChunksWithDuplicates` to NG
* refactor: `TwoMeasurementsManyFieldsLifecycle` is OG-specific
* refactor: simplify NG chunk generation
* refactor: port `ThreeDeleteThreeChunks` to NG
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Add the generic components to create two-chunk scenarios. Includes small
scenario fixes for things like system tables that are not identical
between OG and NG (also see #4111).
Ref #3934.
Some query test scenarios are duplicates and very OG-specific. Let's
use generic scenarios (i.e. the ones that contain all chunk stages
instead of a specific one) where applicable.
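A rough sketch of what "generic scenario" means here (the stage names are illustrative; the real OG/NG stage sets differ):

```rust
/// Illustrative chunk stages.
#[derive(Debug, Clone, Copy)]
enum ChunkStage {
    /// data still sits in an in-memory buffer
    Open,
    /// data has been persisted as a parquet file
    Persisted,
}

impl ChunkStage {
    fn all() -> Vec<Self> {
        vec![Self::Open, Self::Persisted]
    }
}

/// A "generic" scenario emits one concrete setup per chunk stage,
/// instead of pinning the test to a single stage.
fn generic_scenarios(table: &str) -> Vec<(String, ChunkStage)> {
    ChunkStage::all()
        .into_iter()
        .map(|stage| (table.to_string(), stage))
        .collect()
}
```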
For #3934.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This includes some type changes to dispatch between OG and NG and allows
some tests to be run against the NG querier. This only contains parquet
files though, so the scope is somewhat limited.
For #3934.
* refactor: dyn-dispatch database in query subsystem
This is similar to #4080 but concerns the database itself.
For #3934.
* docs: improve wording
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
- this is what DataFusion is doing as well; it's also fast enough
because the number of chunks in a query is not THAT massive (it's not
like we are doing row-level dyn dispatching)
- it simplifies abstracting over different databases
- it allows us to drop our enum-based dispatching that we have for
`DbChunk` and that we would also need for the querier (e.g. depending
on whether a chunk is backed by a parquet file or ingester data)
- it likely speeds up compile times because the `query` crate no longer
contains massive amounts of generic code (see the sketch below)
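A simplified illustration of the trade-off (trait and function names are stand-ins, not the real IOx API):

```rust
use std::sync::Arc;

/// Simplified stand-in for a query chunk.
trait QueryChunk {
    fn table_name(&self) -> &str;
}

/// Static dispatch: every concrete chunk type monomorphizes this
/// function, which bloats compile times across the crate graph.
fn plan_static<C: QueryChunk>(chunks: &[C]) -> usize {
    chunks.len()
}

/// Dynamic dispatch: one compiled copy, and heterogeneous chunk kinds
/// (parquet-backed, ingester-backed, ...) can share one list. The
/// vtable call is per chunk, not per row, so it is cheap.
fn plan_dyn(chunks: &[Arc<dyn QueryChunk>]) -> usize {
    chunks.len()
}
```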
For #3934.
This makes it way easier to dyn-type database implementations. The only
real change is that we make `QueryChunk::Error` opaque. Nobody is going
to inspect it anyway; it's just printed to the user.
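The opaque-error part might look roughly like this (a sketch, not the exact IOx definition):

```rust
/// Boxed, type-erased error: callers can only display it, which is all
/// anybody does with chunk errors anyway.
type ChunkError = Box<dyn std::error::Error + Send + Sync + 'static>;

trait QueryChunk {
    /// With the error type erased, `dyn QueryChunk` can cover both the
    /// OG and NG implementations.
    fn read(&self) -> Result<(), ChunkError>;
}

fn report_failure(chunk: &dyn QueryChunk) {
    if let Err(e) = chunk.read() {
        eprintln!("chunk read failed: {e}");
    }
}
```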
This is a follow-up of #4053.
Ref #3934.
To test the `db::Db` as well as the `querier` with the same test
framework, they require a shared interface. Ideally this interface is
dynamically typed instead of statically dispatched via generics because:
- `query_tests` already take ages to compile
- we often hold a list of scenarios and a single scenario should (in a
future PR) be able to represent both OG as well as NG
The vision here is that we basically keep the whole test setup but add
new scenarios which are NG-specific later on.
Now the issue w/ many query-related types is that they are NOT
object-safe, because they have methods that don't take `&self` or
associated types that we cannot specify in general for OG and NG at the
same time.
So we need a bunch of wrappers that make dynamic dispatch possible. They
mostly call into an internal "interface" crate which is the actual `dyn`
part. The interface is currently only implemented for OG.
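To make the object-safety problem concrete, here is a simplified sketch of a trait that cannot be turned into a `dyn` object as-is, plus the wrapper pattern around it (all names are stand-ins):

```rust
/// NOT object-safe as-is: `dyn QueryDatabase` is illegal without pinning
/// `Chunk` to one concrete type, and OG and NG need different ones.
trait QueryDatabase {
    type Chunk;
    fn chunks(&self, table: &str) -> Vec<Self::Chunk>;
}

/// Object-safe "interface" trait: the associated type is erased, so
/// `dyn QueryDatabaseInterface` is legal.
trait QueryDatabaseInterface {
    fn chunk_names(&self, table: &str) -> Vec<String>;
}

/// Blanket wrapper delegating from the dyn-friendly interface to the
/// statically typed trait (via a simplified string projection here).
impl<D> QueryDatabaseInterface for D
where
    D: QueryDatabase,
    D::Chunk: ToString,
{
    fn chunk_names(&self, table: &str) -> Vec<String> {
        self.chunks(table).iter().map(|c| c.to_string()).collect()
    }
}
```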
The scenarios currently also only contain OG databases. However,
creating a dynamic interface that can be used in all `query_tests` is
already a huge step.
Note that there are two places where we downcast the dynamic/abstract
database to `db::Db` again (a sketch follows the list):
1. To create one scenario based on another, where we need to
manipulate `db::Db` with OG-specific semantics.
2. `server_benchmarks`. These contain OG databases only and there is no
point in benchmarking through the dynamic dispatch interface because
prod (`influxdb_ioxd`) also uses static dispatch.
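The downcast escape hatch can be sketched like this, assuming an `as_any` hook (a common Rust pattern, not necessarily the exact IOx API):

```rust
use std::any::Any;

trait AbstractDb {
    /// Escape hatch for tests/benchmarks that need the concrete type.
    fn as_any(&self) -> &dyn Any;
}

/// Stand-in for the OG `db::Db`.
struct Db;

impl AbstractDb for Db {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn og_only_manipulation(db: &dyn AbstractDb) {
    // Quietly does nothing for non-OG databases.
    if let Some(og) = db.as_any().downcast_ref::<Db>() {
        // ... apply OG-specific semantics to `og` ...
        let _ = og;
    }
}
```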
Ref #3934.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
For OG we can determine the chunks w/o any IO; for NG, however, this
might require a few catalog queries.
This is likely not the last change of this sort, e.g. the whole schema
handling is currently sync as well.
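In trait form, the change is roughly sync to async; a sketch using the `async-trait` crate (names simplified):

```rust
use async_trait::async_trait;

struct Chunk;

#[async_trait]
trait QueryDatabase {
    /// Async because NG may need a few catalog queries (i.e. IO) to
    /// determine the chunks; OG can still answer without blocking.
    async fn chunks(&self, table_name: &str) -> Vec<Chunk>;
}
```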
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Changes all consumers of the object store to use the dynamically
dispatched DynObjectStore type, instead of using a hardcoded concrete
implementation type.
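Schematically, consumers now hold the type-erased store instead of naming a concrete one (the alias, trait method, and consumer type here are illustrative):

```rust
use std::sync::Arc;

/// Stand-in trait; the real one lives in the object store crate.
trait ObjectStore: Send + Sync {
    fn list_all(&self) -> Vec<String>;
}

/// Type-erased alias: consumers name this instead of a concrete
/// implementation (in-memory, file system, S3, ...).
type DynObjectStore = dyn ObjectStore;

/// A consumer after the change: it only sees the dyn-dispatched store.
struct ParquetStorage {
    object_store: Arc<DynObjectStore>,
}

impl ParquetStorage {
    fn new(object_store: Arc<DynObjectStore>) -> Self {
        Self { object_store }
    }

    fn files(&self) -> Vec<String> {
        self.object_store.list_all()
    }
}
```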
- This is not used by the query engine at all.
- The query engine should not care about ALL chunks but only about the
chunks it gets via `QueryDatabase::chunks` (which includes a table
name and a predicate; see the sketch below).
- All other users of that API are NOT really query-related.
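For contrast, a sketch of the API the query engine does rely on (signatures simplified):

```rust
/// Stand-ins for the real predicate and chunk types.
struct Predicate;
struct Chunk;

trait QueryDatabase {
    /// The query engine only ever asks for the chunks of one table,
    /// narrowed by a predicate; it never needs ALL chunks.
    fn chunks(&self, table_name: &str, predicate: &Predicate) -> Vec<Chunk>;
}
```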
* refactor: wire execution context to Deduplicator
* feat: example trace to chunk read_filter
* refactor: make execution context required
* refactor: expose metadata API
* refactor: more span context for chunk read_filter
* refactor: fix build
* refactor: push context into result stream
* refactor: make executor optional
* chore: update datafusion
* fix: Update to use new datafusion api
* chore: update expected plans
* fix: support zero output partitions
* fix: update test
* fix: Update for new DataFusion API
* fix: newly added system table
* fix: update cargo lock