* refactor: remove New from a test scenario setup
* test: add explain for 2 different chunk stages
* test: expplain for several chunks from both parquet and ingester
* refactor: ChunkStageNew to ChunkStage and DeleteTimeNew to DeleteTime
* refactor: PredNew -> Pred
* refactor: ChunkDataNew -> ChunkData
* refactor: new name functions to remove _new
These were found by iterating over all of the dependencies of each
Cargo.toml, then grepping that crate for the dependency's name. If it
didn't show up, I attempted to remove it.
I left a few dependencies that this process flagged:
* generated_types
- `pbjson`,`serde`. Apparently used by the generated code.
* grpc-router-test-gen
- `prost`. Apparently used by the generated code.
* influxdb_iox
- `heappy`. Doesn't appear used, but is behind enough feature
flags that I don't care to reason about and it's already optional.
- `tikv_jemalloc_sys`. Appears to be setting a feature flag of an
indirect dependency.
* iox_gitops_adapter
- `k8s_openapi`. Appears to be setting a feature flag of an indirect
dependency.
* chore: Tool for automating arrow version update
* chore: Update datafusion and arrow/parquet/arrow-flight
* fix: update for changes in Arrow API
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: port a few OG to NG tests and remove many more that already ported
* chore: Apply suggestions from code review
* chore: address review comments
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: document and improve `MockIngesterConnection`
* refactor: split `OldOneMeasurementFourChunksWithDuplicates` for `EXPLAIN` queries
* fix: mark "IngsterPartition" chunks as unsorted
* fix: "group by" queries may require sorted comparison
* refactor: re-export a few more types from querier
* fix: ensure that test parquet files are de-duped
* test: chunks in ingester stage
* docs: explain test code
* feat: Add basic Querier <--> Ingester "Service Configuration"
* docs: update comments in test
* refactor: cleanup tests a little
* refactor: make trait more consistent
* docs: improve comments in IngesterPartition
Min/max values and distinct counts are already optional, so let's make
the null counts optional as well. This will be helpful for NG to deal w/
partial statistics (e.g. we only populate stats for the time column).
Note that the total count is still mandatory, but we normally have the
chunk/file-level row count at hand.
* refactor: make NG query test generation more flexible
* refactor: rename OG-specfic query tests
* docs: explain chunk stage generation in NG query tests
* fix: typo
Namespaces are now created on demand and contain their full schema.
Tombstones/chunks are created on demand during the query.
Closes#4123.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: inline function that is used once
* refactor: generalize multi-chunk creation for NG
* refactor: `TwoMeasurementsManyFieldsTwoChunks` is OG-specific
* refactor: generalize `OneMeasurementTwoChunksDifferentTagSet`
* refactor: port `OneMeasurementFourChunksWithDuplicates` to NG
* refactor: `TwoMeasurementsManyFieldsLifecycle` is OG-specific
* refactor: simplify NG chunk generation
* refactor: port `ThreeDeleteThreeChunks` to NG
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Add the generic components to create two-chunk scenarios. Includes small
scenario fixes for things like system tables that are not identical
between OG and NG (also see #4111.)
Ref #3934.
Some query test scenarios are duplicates and are very OG specific. Let's
use generic scenarios (i.e. the ones that contain all chunk stages
instead of a specific one) where applicable.
For #3934.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This includes some type changes to dispatch between OG and NG and allows
some tests to be run against the NG querier. This only contains parquet
files though, so it's somewhat a limited scope.
For #3934.
* refactor: dyn-dispatch database in query subsystem
This is similar to #4080 but concerns the database itself.
For #3934.
* docs: improve wording
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
- this is what DataFusion is doing as well; it's also fast enough
because the number of chunks in a query is not THAT massive (it's not
like we are doing row-level dyn dispatching)
- it simplifies abstracting over different databases
- it allows us to drop our enum-based dispatching that we have for
`DbChunk` and that we would also need for the querier (e.g. depending
on if a chunk is backed by a parquet file or ingester data)
- it likely speeds up compile times because the `query` is no longer
contains massive amounts of generic code
For #3934.
This makes it way easier to dyn-type database implementations. The only
real change is that we make `QueryChunk::Error` opaque. Nobody is going
to inspect that anyways, it's just printed to the user.
This is a follow-up of #4053.
Ref #3934.
To test the `db::Db` as well as the `querier` with the same test
framework, they require a shared interface. Ideally this interface is
dynamically typed instead of static dispatched via generics because:
- `query_tests` already take ages to compile
- we often hold a list of scenarios and a single scenario should (in a
future PR) be able to represent both OG as well as NG
The vision here is that we basically keep the whole test setup but add
new scenarios which are NG-specific later on.
Now the issue w/ many query-related types is that they are NOT
object-safe because methods that don't take `&self` or they have
associated types that we cannot specify in general for OG and NG at the
same time.
So we need a bunch of wrappers that make dynamic dispatch possible. They
mostly call to an internal "interface" crate which is the actual `dyn`
part. The interface is currently only implemented for OG.
The scenarios currently also only contain OG databases. However,
creating a dynamic interface that can be used in all `query_tests` is
already a huge step.
Note that there are two places where we downcast the dynamic/abstract
database to `db::Db` again:
1. To create one scenario based on another and where we need to
manipulate `db::Db` with OG-specific semantics.
2. `server_benchmarks`. These contain OG databases only and there is no
point in benchmarking throw the dynamic dispatch interface because
prod (`influxdb_ioxd`) also uses static dispatch.
Ref #3934.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
For OG we can determine the chunks w/o any IO, for NG however this might
require a few catalog queries.
This is likely not the last change of this sort, i.e. the whole schema
handling is currently sync as well.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Changes all consumers of the object store to use the dynamically
dispatched DynObjectStore type, instead of using a hardcoded concrete
implementation type.
- This is not used by the query engine at all.
- The query engine should not care about ALL chunks but only about the
chunks it gets via `QueryDatabase::chunks` (which includes a table
name and a predicate).
- All other users of that API are NOT really query-related.