* chore: Normalise name of Call expression to lowercase
Simplifies matching functions in planner, as they are guaranteed to be
lowercase.
This also ensures compatibility with InfluxQL when generating column
alias names, which are reflected in updated tests.
* chore: Ensure aggregate functions fail gracefully.
* feat: GROUP BY tag support
* feat: Ensure schema-level metadata is propagated
Requires: https://github.com/apache/arrow-rs/issues/3779
* chore: Add some tests to validate GROUP BY output
* chore: Add clarifying comment
* chore: Declare message in flight.proto
The metadata is public API, so best practice is to encode this in a way
that is most compatible for clients in other languages, and will also
document the history of schema changes.
Added tests to validate the metadata is encoded correctly.
* chore: Placate linters
* chore: Use correct column in test cases
* chore: Add `is_projected` to the TagKeyColumn message
`is_projected` is necessary to inform a client whether the tag key is
used exclusively for the group key (false) or is also projected in the
`SELECT` column list (true).
* refactor: Move constants to `schema` crate per PR feedback
* chore: rustfmt 🙄
* chore: Update docs for InfluxQlMetadata
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
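A minimal sketch of how a client might act on `is_projected` when rendering a `GROUP BY` result. The struct only loosely mirrors the `TagKeyColumn` message: the `tag_key` and `column_index` fields and the `split_row` helper are hypothetical names chosen for illustration, not the definitions in `flight.proto`.

```rust
/// Hypothetical mirror of the `TagKeyColumn` message. Only `is_projected`
/// is named in the commit message; the other fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct TagKeyColumn {
    pub tag_key: String,
    pub column_index: usize,
    pub is_projected: bool,
}

/// Splits one result row into (group key pairs, columns to display).
/// A tag column with `is_projected == false` feeds the group key only and
/// is dropped from the displayed columns.
/// Panics if a `column_index` is out of range for `row`.
pub fn split_row<'a>(
    row: &[&'a str],
    tags: &[TagKeyColumn],
) -> (Vec<(String, &'a str)>, Vec<&'a str>) {
    let group_key = tags
        .iter()
        .map(|t| (t.tag_key.clone(), row[t.column_index]))
        .collect();
    let output = row
        .iter()
        .enumerate()
        .filter(|(i, _)| {
            // Keep the column unless it is a group-only tag column.
            !tags
                .iter()
                .any(|t| t.column_index == *i && !t.is_projected)
        })
        .map(|(_, v)| *v)
        .collect();
    (group_key, output)
}
```

With a `region` tag at column 1 and `is_projected = false`, the row `["2023-01-01", "west", "0.5"]` yields the group key `region=west` and the displayed columns `["2023-01-01", "0.5"]`.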
* feat: split start-level files that overlap with many files
* test: split files and their split times
* test: split test for L1 and L2 files
* feat: full implementation that supports large-size overlapped files
* chore: modify comments to reflect the changes
* fix: typo
* chore: update test output
* docs: clearer comments
* chore: remove empty test files. Will add in a separate PR
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: address review comments
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* refactor: add a knob to turn large-size overlaps on and off
* fix: typo
* chore: update test output after merging main
* fix: split_times should not include the max_time of the file
* fix: fix an overlap bug while limiting number of files to compact
* test: unit tests for different overlap cases of limit files to compact
* chore: increase time range of the tests to let the split files work correctly
* fix: skip compacting files of tiny ranges
* test: add tests for time range 1
* chore: address review comments
* chore: remove enable_large_size_overlap_files knob
* fix: fix a bug that sorted L1 files by their min_time instead of l0_max_created_at
* refactor: use the same order_files function after merging main into branch
* chore: typos and clearer comments
* chore: remove obsolete comments
* chore: add asserts per review suggestion
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
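The `split_times` fix above can be illustrated with a small sketch. It assumes a split time `t` means "rows with time <= t go to the earlier output file", so a split at `max_time` would leave the later file empty. The function name and signature are hypothetical and are not the compactor's actual API.

```rust
/// Filters candidate split times down to those usable for a file covering
/// `[min_time, max_time]` (inclusive).
///
/// Assumption: a split time `t` sends rows with `time <= t` to the earlier
/// output, so `t == max_time` is rejected because nothing would remain for
/// the later output.
pub fn valid_split_times(min_time: i64, max_time: i64, candidates: &[i64]) -> Vec<i64> {
    let mut times: Vec<i64> = candidates
        .iter()
        .copied()
        .filter(|t| *t >= min_time && *t < max_time)
        .collect();
    // Return a sorted, duplicate-free list.
    times.sort_unstable();
    times.dedup();
    times
}
```

Under this assumption a file whose `min_time` equals its `max_time` has no valid split time at all, which is consistent with skipping files of tiny ranges.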
* feat(compactor2): Verify invariants for compactor2 always
* fix: update tests
* fix: update actual time range and test output
---------
Co-authored-by: NGA-TRAN <nga-tran@live.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
In one prod case the majority of this was NOT spent on creating the
child chunks. I suspect that the summary creation and the string cloning
involved in there are quite slow. So let's have slightly more detailed
tracing and see.
When combining sort keys, we have to check the schema of the chunk to
differentiate between "column does not exist within this chunk" and
"column exists but is not sorted".
This is unlikely to be an issue in prod at the moment (if there is no bug in
the ingester or compactor), but this was found while working on tests
for #6098. Overall this should improve robustness.
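A minimal sketch of the distinction described above, assuming a chunk exposes its schema as a set of column names and its sort key as an ordered list. `ColumnState` and `column_state` are hypothetical names for illustration and are not the types used in the code base.

```rust
use std::collections::HashSet;

/// What a single chunk can tell us about a column when combining sort keys.
#[derive(Debug, PartialEq, Eq)]
pub enum ColumnState {
    /// The column is not part of the chunk's schema at all.
    Missing,
    /// The column exists in the chunk but is not part of its sort key.
    Unsorted,
    /// The column is part of the sort key, at this position.
    Sorted(usize),
}

/// Consults the sort key first and falls back to the schema, so that
/// "absent from the sort key" is not mistaken for "absent from the chunk".
pub fn column_state(schema: &HashSet<&str>, sort_key: &[&str], column: &str) -> ColumnState {
    match sort_key.iter().position(|c| *c == column) {
        Some(pos) => ColumnState::Sorted(pos),
        None if schema.contains(column) => ColumnState::Unsorted,
        None => ColumnState::Missing,
    }
}
```

Without the schema lookup, the `Unsorted` and `Missing` cases would be indistinguishable, which is the ambiguity the fix removes.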
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: L1 files must be sorted by their min_time if they need to be split before compacting
* chore: clearer comments
* chore: Apply suggestions from code review
Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>
* chore: run fmt after applying review suggestions
---------
Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>
Adds `skopeo` and gcloud CLI to CI image. This should eventually replace
our manual installation during the `deploy_releases` step:
d8d097c183/.circleci/config.yml (L493-L527)
This step costs us 20min during CD, which is ridiculous. A follow-up PR
later this week will use the CI image instead of an ad hoc installation.
Also see https://github.com/influxdata/idpe/issues/17098 .
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test(compactor): Add test for large amounts of data with a single timestamp
* fix: Update compactor2/tests/layouts/single_timestamp.rs
Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>
---------
Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>
* refactor: remove unused `ColumnSort`
* refactor: remove invalid assertion
It is true that time SHOULD be the last sort key, but we absolutely
don't require that, esp. not in the query tier. The ingester will
currently always produce sort keys where time is last, but if we are ever
going to deal w/ external data sources like bulk loaded parquet files,
this may not always be the case.
Found while constructing some edge case tests.
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Ensure that a write that is added to the WAL is always attempted to be
applied to the BufferTree.
This covers off the case of a user submitting a write, waiting long
enough for it to be added to the WAL buffer, and then disconnecting
before it is added to the BufferTree (and before they get a response).
This is a minor issue on its own, but fixing it is necessary for correct
reference counting of WAL files:
https://github.com/influxdata/influxdb_iox/issues/6566
This also documents a low-risk opportunity for the WAL contents &
BufferTree to diverge, potentially leading to a crash-loop at startup:
https://github.com/influxdata/influxdb_iox/issues/7111
In practice a crash loop is unlikely, as it would require broken
invariants elsewhere (no schema validation being applied).
Closes https://github.com/influxdata/influxdb_iox/issues/6281.
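The idea can be sketched with standard-library threads: the WAL append and the buffer apply run in a worker that does not depend on the caller still waiting for the response. This is only an analogy under stated assumptions. The real ingester is async, and `State`, `submit_write`, and the `Vec`-backed "WAL" and "buffer" here are hypothetical stand-ins.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the WAL and the BufferTree.
#[derive(Default)]
pub struct State {
    pub wal: Vec<String>,
    pub buffer: Vec<String>,
}

/// Submits a write. The returned receiver plays the role of the client's
/// pending response; dropping it models a client disconnect.
pub fn submit_write(
    state: Arc<Mutex<State>>,
    op: String,
) -> (mpsc::Receiver<()>, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // Step 1: append to the WAL.
        state.lock().unwrap().wal.push(op.clone());
        // Step 2: always attempt the buffer apply once the WAL append
        // happened, whether or not anyone is still waiting.
        state.lock().unwrap().buffer.push(op);
        // Step 3: notify the caller; a send error only means the caller
        // went away, so it is ignored.
        let _ = tx.send(());
    });
    (rx, handle)
}
```

Dropping the receiver immediately after `submit_write` and then joining the worker still leaves the same entry in both `wal` and `buffer`, which is the property the change is after.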
* fix(compactor2): fix off by one error in time ranges of simulator
* chore: update a test that was added recently and is fixed by this PR
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Co-authored-by: NGA-TRAN <nga-tran@live.com>