This adds a command to `influxdb_iox` that can take a WAL segment file
and regenerate all write operation entries, writing to stdout or namespaced
files within a target directory, using table ID as the measurement name
in the case where there is no catalog access at point of regeneration.
WAL read errors encountered by the new WAL WriteOpDecoder were being
discarded and presented as a "happy path" end of file to callers due
to a bug in handling a nested result type. This moves the test for
decoding a tail-corrupt WAL into the crate itself and asserts the
error is reported correctly.
The replication flag defines the total number of copies of each write -
slightly less confusing than the additional copies it was previously,
and matches with the actual code.
If the test setup calls `Step::Persist` to persist on-demand, that
means it shouldn't be used with `ChunkStage::Parquet`, which tries to
persist as fast as possible. This will fail the test with a hopefully
helpful message to prevent this.
* refactor: Change catalog configuration so it is entirely dsn based / support end to end testing without postgres
Restores code from https://github.com/influxdata/influxdb_iox/pull/7708
Revert "revert: PR #7708"
This reverts commit c9cfe05f8d.
* fix: merge
* fix: Update new test
I was going back and forth on this, but the MVP is tags only. If we
expand it to be the more general "columns" in the future, we can change
the proto to reflect the more generalised implementation and have a more
descriptive field name now!
* test: add dedup test for multiple partitions and ranges
* refactor: remove `RedudantSort` optimizer pass
Similar to #7807 this is now covered by DataFusion, as demonstrated by
the fact that all query tests (incl. explain tests) still pass.
The good thing is: passes that are no longer required don't require any
upstreaming, so this also closes#7411.
DataFusion is now smart enough to do that using the builtin passes. No
`EXPLAIN` tests regressed.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: reproducer for idpe_17556
* fix: `ParquetSortness` and partial opt
1. correctly handle cases where `ParquetSortness` would optimize one
child branch but not the other
2. handle cases where `ParquetSortness` recusion should stop a bit
clearer (using `TreeNodeRewriter`)
3. rename query tests to be a bit clearer
4. add test case with many (but not too many) duplicate files and an
ingester (basically a prod use case where the compactor is slightly
behind)
---------
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This is somewhat a left-over from the old phys. plan construction where
we tried to fold in the sorts at the right place. Now the optimizer
takes care of that, so we can just express this as a standard logical
node (the same as SQL and InfluxQL). This makes the plan construction a
bit cleaner since the actual scan provider only performs the minimal
work that is required by DataFusion and the users (SQL, InfluxQL, reorg)
request what they actually need.
The tests in `iox_query::frontend::reorg::tests` that assert the tests
still pass and proof that the actual physical plans are identical w/
this approach.
Closes#7785.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Any `VarRef`s in the `Select` where the `data_type` field is `None`
indicates there is no field or tag found in the schema of any children
in the `FROM` clause
* Use `VarRef` for `Tag`, to ensure a consistent representation of a
column reference across `GROUP BY`, `SELECT` projection and `WHERE`
clause.
* Rename `tags` to `tag_names`
* `tags` now returns `VarRef`
* test: add tests for the desired contract for parsing measurements from line protocol
* fix: restrict null chars in measurement
* chore: make an explicit Measurement type
* refactor: have iox lp parser match influxdb contract, for acceptance of eq in measurements
* test: create end_to_end test to confirm same write-then-read behavior with `=` in measurements, is the same as influxdb