Min/max values and distinct counts are already optional, so let's make
the null counts optional as well. This will be helpful for NG to deal w/
partial statistics (e.g. we only populate stats for the time column).
Note that the total count is still mandatory, but we normally have the
chunk/file-level row count at hand.
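A minimal sketch of what optional null counts could look like. The `ColumnStats` type and its field names are hypothetical, made up for illustration; they are not the actual IOx statistics types. Only the total count is mandatory, everything else may be absent:

```rust
/// Hypothetical per-column statistics (illustrative names, not the IOx types).
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
    distinct_count: Option<u64>,
    /// Optional: `None` means "unknown", not "zero nulls".
    null_count: Option<u64>,
    /// Mandatory: usually known from the chunk/file-level row count.
    total_count: u64,
}

impl ColumnStats {
    /// Stats for a column where only the row count is known.
    fn unknown(total_count: u64) -> Self {
        Self {
            min: None,
            max: None,
            distinct_count: None,
            null_count: None,
            total_count,
        }
    }

    /// Number of non-null values, if the null count is known.
    fn non_null_count(&self) -> Option<u64> {
        self.null_count.map(|n| self.total_count.saturating_sub(n))
    }
}

fn main() {
    // Partial statistics: only the time column is fully populated.
    let time = ColumnStats {
        min: Some(10),
        max: Some(20),
        distinct_count: None,
        null_count: Some(0),
        total_count: 3,
    };
    let tag = ColumnStats::unknown(3);
    println!("{:?} {:?}", time.non_null_count(), tag.non_null_count());
}
```

Consumers then have to treat a missing null count as "unknown" rather than assuming zero.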
* refactor: make NG query test generation more flexible
* refactor: rename OG-specific query tests
* docs: explain chunk stage generation in NG query tests
* fix: typo
* refactor: inline function that is used once
* refactor: generalize multi-chunk creation for NG
* refactor: `TwoMeasurementsManyFieldsTwoChunks` is OG-specific
* refactor: generalize `OneMeasurementTwoChunksDifferentTagSet`
* refactor: port `OneMeasurementFourChunksWithDuplicates` to NG
* refactor: `TwoMeasurementsManyFieldsLifecycle` is OG-specific
* refactor: simplify NG chunk generation
* refactor: port `ThreeDeleteThreeChunks` to NG
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Add the generic components to create two-chunk scenarios. Includes small
scenario fixes for things like system tables that are not identical
between OG and NG (also see #4111).
Ref #3934.
This includes some type changes to dispatch between OG and NG and allows
some tests to be run against the NG querier. This only contains parquet
files though, so the scope is somewhat limited.
For #3934.
* chore: update datafusion
* fix: Update to use new datafusion api
* chore: update expected plans
* fix: support zero output partitions
* fix: update test
* fix: Update for new DataFusion API
* fix: newly added system table
* fix: update cargo lock
* chore: Update DataFusion pin
* fix: Update for new DF API
* fix: update plan output
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Do not rebuild query_tests if .sql or .expected change
* feat: Add CI check
* refactor: move some sql tests to .sql files
* tests: port tests / expected results to data files
* fix: restore old name check-flatbuffers
The query processing was implicitly relying on the order provided by the
catalog. This had two issues:
- this ordering was not defined in the API contract (neither via docs
nor via typing)
- the order was based on chunk IDs which is not adequate in some cases
(e.g. when chunks are created while a persistence operation is in
progress)
Now we explicitly sort chunks by `(order, ID)`.
Fixes #1963.
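As a rough illustration of the explicit ordering, assuming a simplified, hypothetical `Chunk` type with numeric `order` and `id` fields (the real chunk types differ):

```rust
/// Hypothetical, simplified chunk descriptor (not the real IOx type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Chunk {
    id: u32,
    order: u32,
}

/// Sort by `(order, id)` so the result does not depend on the order
/// in which the catalog happened to return the chunks.
fn sort_chunks(chunks: &mut [Chunk]) {
    chunks.sort_by_key(|c| (c.order, c.id));
}

fn main() {
    let mut chunks = vec![
        Chunk { id: 3, order: 1 },
        Chunk { id: 1, order: 2 },
        Chunk { id: 2, order: 1 },
    ];
    sort_chunks(&mut chunks);
    let ids: Vec<u32> = chunks.iter().map(|c| c.id).collect();
    println!("{:?}", ids);
}
```

Here the chunk with ID 1 sorts last despite having the lowest ID, because its order is higher; the ID only breaks ties between chunks with the same order.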
* chore: Update datafusion deps to pre-release
* refactor: Update IOx to use new datafusion Statistics
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: do not use DataFrame DataFusion API
* fix: update output to reflect not running optimizer twice
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
No longer scan chunks during query planning to determine the schema
(except for the lifetime jobs where we have a good reason to do so).
Instead pass the schema down from whoever is triggering the query.
For real SQL queries, we then just use the table-wide schemas
introduced in #1913.
Apart from avoiding schema merges we now also don't crash any longer
when no chunks are left in the table (aka columns are present but all
rows are gone).
Fixes #1768.
Fixes #1884.
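A simplified sketch of the idea, under the assumption that planning boils down to "take a schema plus a list of chunks". All names here (`Schema`, `Chunk`, `QueryOutput`, `plan_scan`) are hypothetical and only illustrate why an explicitly passed schema keeps the empty-table case working:

```rust
/// Hypothetical table-wide schema: just a list of column names.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    columns: Vec<String>,
}

/// Hypothetical chunk holding rows of integer values.
struct Chunk {
    rows: Vec<Vec<i64>>,
}

/// Hypothetical query result.
struct QueryOutput {
    schema: Schema,
    rows: Vec<Vec<i64>>,
}

/// The caller provides the schema, so nothing has to be derived (or merged)
/// from the chunks, and an empty chunk list is not a special case.
fn plan_scan(schema: &Schema, chunks: &[Chunk]) -> QueryOutput {
    let rows = chunks
        .iter()
        .flat_map(|c| c.rows.iter().cloned())
        .collect();
    QueryOutput {
        schema: schema.clone(),
        rows,
    }
}

fn main() {
    let schema = Schema {
        columns: vec!["time".to_string(), "value".to_string()],
    };
    // Columns are known, but all rows (and chunks) are gone.
    let out = plan_scan(&schema, &[]);
    println!("{} columns, {} rows", out.schema.columns.len(), out.rows.len());
}
```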
Beforehand:
```text
❯ env CARGO_LOG=cargo::core::compiler::fingerprint=info cargo test -p query_tests
[2021-07-05T08:52:13Z INFO cargo::core::compiler::fingerprint] stale: changed "/home/mneumann/src/influxdb_iox/query_tests/cases"
[2021-07-05T08:52:13Z INFO cargo::core::compiler::fingerprint] (vs) "/home/mneumann/src/influxdb_iox/target/debug/build/query_tests-0e8f741dfb84437f/output"
[2021-07-05T08:52:13Z INFO cargo::core::compiler::fingerprint] FileTime { seconds: 1625474716, nanos: 436081357 } != FileTime { seconds: 1625474752, nanos: 52625167 }
[2021-07-05T08:52:13Z INFO cargo::core::compiler::fingerprint] fingerprint error for query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)/Test/TargetInner { ..: lib_target("query_tests", ["lib"], "/home/mneumann/src/influxdb_iox/query_tests/src/lib.rs", Edition2018) }
[2021-07-05T08:52:13Z INFO cargo::core::compiler::fingerprint] err: current filesystem status shows we're outdated
[2021-07-05T08:52:13Z INFO cargo::core::compiler::fingerprint] fingerprint error for query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)/RunCustomBuild/TargetInner { ..: custom_build_target("build-script-build", "/home/mneumann/src/influxdb_iox/query_tests/build.rs", Edition2018) }
[2021-07-05T08:52:13Z INFO cargo::core::compiler::fingerprint] err: current filesystem status shows we're outdated
[2021-07-05T08:52:13Z INFO cargo::core::compiler::fingerprint] fingerprint error for query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)/Build/TargetInner { ..: lib_target("query_tests", ["lib"], "/home/mneumann/src/influxdb_iox/query_tests/src/lib.rs", Edition2018) }
[2021-07-05T08:52:13Z INFO cargo::core::compiler::fingerprint] err: current filesystem status shows we're outdated
Compiling query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)
```
The issue is that both the input and the test output files are located
under `cases/`. `build.rs` used `cargo:rerun-if-changed=cases` which per
Cargo doc will scan ALL files in that directory. Note that the normal
`exclude` directive in `Cargo.toml` does NOT work, see
https://github.com/rust-lang/cargo/issues/4587 .
So we need to split input and output files into separate directories
(`cases/{in,out}`).
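With that split, a `build.rs` along these lines would only be re-triggered by input changes. This is a sketch rather than the actual build script: the `rerun_directive` helper is hypothetical, and watching exactly `cases/in` is an assumption based on the directory layout described above.

```rust
// build.rs (sketch)

/// Hypothetical helper that formats Cargo's `rerun-if-changed` directive.
fn rerun_directive(path: &str) -> String {
    format!("cargo:rerun-if-changed={}", path)
}

fn main() {
    // Watch only the inputs. Test output written under `cases/out`
    // no longer marks the build script (and thus the crate) as stale.
    println!("{}", rerun_directive("cases/in"));
}
```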
* feat: Implement data driven query testing and port explain tests
* fix: do not fmt the auto generated cases
* refactor: split setup and parser into separate modules
* refactor: Add log to runner, add end to end tests
* docs: fix comments