Commit Graph

11185 Commits (07b7107f9aa50965c262b3a40585c05be81a6848)

Author SHA1 Message Date
Dom Dwyer f3caf604b5
refactor(wal): last batch length for preallocation
There's no need to sub 1 from the batch length to shrink the buffer over
time - the capacity of the new batch will be the length of the last. A
large batch followed by a small batch will cause the pre-allocated next
batch to be small too.
2023-03-02 11:40:38 +01:00
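Illustration for f3caf604b5: a minimal sketch of sizing the next batch from the length of the last one. `BatchBuffer` and its `u64` entries are hypothetical stand-ins, not the actual WAL types.

```rust
/// Hypothetical write-batch buffer (an assumption for illustration only).
pub struct BatchBuffer {
    ops: Vec<u64>,
}

impl BatchBuffer {
    pub fn new() -> Self {
        Self { ops: Vec::new() }
    }

    /// Append one entry to the batch currently being built.
    pub fn push(&mut self, op: u64) {
        self.ops.push(op);
    }

    /// Hand the filled batch to the caller and leave behind an empty buffer
    /// pre-allocated for as many entries as the batch just taken.
    ///
    /// No "- 1" is needed to shrink over time: a small batch yields a small
    /// pre-allocation for the batch that follows it.
    pub fn take(&mut self) -> Vec<u64> {
        let next = Vec::with_capacity(self.ops.len());
        std::mem::replace(&mut self.ops, next)
    }

    /// Capacity reserved for the batch currently being built.
    pub fn capacity(&self) -> usize {
        self.ops.capacity()
    }
}
```

After a 100-entry batch is taken, the next buffer has room for at least 100 entries; after a 2-entry batch, the following buffer is only guaranteed room for 2.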
Dom Dwyer 0b40e0d17c
feat(wal): SequenceNumberSet for rotated file
Changes Wal::rotate() to return the SequenceNumberSet containing the IDs
of all writes in the segment file that is rotated out.
2023-03-02 10:58:03 +01:00
Dom Dwyer b22643350f
refactor(wal): track segment sequence numbers
Changes the WAL to maintain a SequenceNumberSet containing every ID
written to the currently open segment file.

The sets are derived from batched data for efficiency, rather than
recorded per write, to prevent any overhead in the hot path. The batch
set is merged with the file set off the hot path, in a separate I/O
thread (not the async runtime).
2023-03-02 10:58:02 +01:00
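Illustration for b22643350f and 0b40e0d17c: a rough sketch of merging per-batch ID sets into a per-segment set on a dedicated OS thread, and handing the segment's set back on rotation. The `Msg` enum, the channel protocol, and the use of `BTreeSet<u64>` in place of SequenceNumberSet are all assumptions made for this example, not the WAL's real design.

```rust
use std::collections::BTreeSet;
use std::sync::mpsc;
use std::thread;

/// Hypothetical messages sent from the write path to the I/O thread.
pub enum Msg {
    /// IDs of all writes in one batch, computed once per batch.
    Batch(BTreeSet<u64>),
    /// Rotate the segment and reply with the IDs it contained.
    Rotate(mpsc::Sender<BTreeSet<u64>>),
}

/// Spawn a plain OS thread (not an async task) that owns the segment's set.
pub fn spawn_io_thread() -> mpsc::Sender<Msg> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut segment_ids: BTreeSet<u64> = BTreeSet::new();
        // The loop ends when every sender has been dropped.
        for msg in rx {
            match msg {
                // Merge the batch set into the file set, off the hot path.
                Msg::Batch(ids) => segment_ids.extend(ids),
                // Return the rotated-out set and start an empty one.
                Msg::Rotate(reply) => {
                    let rotated = std::mem::take(&mut segment_ids);
                    let _ = reply.send(rotated);
                }
            }
        }
    });
    tx
}
```

The caller only pays for a channel send per batch; the set union happens on the spawned thread.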
Dom Dwyer 6aa33ef380
feat: impl FromIterator for SequenceNumberSet
Allow a SequenceNumberSet to be instantiated from an iterator of
SequenceNumber.
2023-03-02 10:58:02 +01:00
Dom Dwyer 6532fb752b
feat: impl Extend for SequenceNumberSet
Allow a SequenceNumberSet to be efficiently extended from any iterator
of SequenceNumber instances.
2023-03-02 10:58:01 +01:00
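Illustration for 6aa33ef380 and 6532fb752b: a minimal sketch of the two trait implementations on a stand-in type. The `BTreeSet` backing store, the `new`, `contains` and `len` helpers, and the `u64` payload are assumptions for this example; the real SequenceNumberSet may be stored differently.

```rust
use std::collections::BTreeSet;

/// Stand-in for the real SequenceNumber type (assumed to wrap an integer).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct SequenceNumber(u64);

impl SequenceNumber {
    pub fn new(v: u64) -> Self {
        Self(v)
    }
}

/// Stand-in set type, backed by a BTreeSet purely for illustration.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct SequenceNumberSet(BTreeSet<SequenceNumber>);

impl SequenceNumberSet {
    pub fn contains(&self, n: SequenceNumber) -> bool {
        self.0.contains(&n)
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }
}

/// Add every item of an iterator to an existing set.
impl Extend<SequenceNumber> for SequenceNumberSet {
    fn extend<T: IntoIterator<Item = SequenceNumber>>(&mut self, iter: T) {
        self.0.extend(iter);
    }
}

/// Build a set with `.collect()`, reusing the Extend implementation.
impl std::iter::FromIterator<SequenceNumber> for SequenceNumberSet {
    fn from_iter<T: IntoIterator<Item = SequenceNumber>>(iter: T) -> Self {
        let mut set = Self::default();
        set.extend(iter);
        set
    }
}
```

With both traits in place, `(1..=3).map(SequenceNumber::new).collect::<SequenceNumberSet>()` builds a set, and `set.extend(...)` merges further IDs into it.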
dependabot[bot] c538cac4ef
chore(deps): Bump tokio from 1.25.0 to 1.26.0 (#7107)
* chore(deps): Bump tokio from 1.25.0 to 1.26.0

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.25.0 to 1.26.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.25.0...tokio-1.26.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 09:50:39 +00:00
dependabot[bot] 06b2c7a329
chore(deps): Bump sqlparser from 0.30.0 to 0.31.0 (#7108)
Bumps [sqlparser](https://github.com/sqlparser-rs/sqlparser-rs) from 0.30.0 to 0.31.0.
- [Release notes](https://github.com/sqlparser-rs/sqlparser-rs/releases)
- [Changelog](https://github.com/sqlparser-rs/sqlparser-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sqlparser-rs/sqlparser-rs/compare/v0.30.0...v0.31.0)

---
updated-dependencies:
- dependency-name: sqlparser
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
2023-03-02 09:43:43 +00:00
Marco Neumann c95d078e46
feat: add `NestedUnion` opt (#7092)
* docs: typo

* feat: add `NestedUnion` opt

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 09:09:05 +00:00
kodiakhq[bot] f3267f992a
Merge pull request #7102 from influxdata/cn/share-normalization
fix: Use the same normalization code for explain tests as e2e tests do
2023-03-02 03:18:51 +00:00
kodiakhq[bot] f730991602
Merge branch 'main' into cn/share-normalization 2023-03-02 03:12:16 +00:00
Nga Tran 04ee075a73
chore: remove folder that was accidentally added by VS Code (#7106) 2023-03-02 02:43:13 +00:00
Christopher M. Wolff d1a54cf0d4
feat: allow no lower bound gap fill implementation (#7104)
* feat: allow no lower bound gap fill implementation

* chore: clippy

* refactor: code review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-01 23:32:57 +00:00
Nga Tran c8b3827b20
test(compactor2): end-to-end data-tests with large overlap files (#7103)
* test: end-to-end data-tests with large overlap files

* chore: address review comments

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-01 21:48:33 +00:00
Carol (Nichols || Goulding) 3bf0f2779e
refactor: Move query plan normalizer to arrow_util 2023-03-01 15:44:22 -05:00
Andrew Lamb 525c48de2c
feat(compactor2): add invariant checks to the compactor tests (#7096)
* feat(compactor2): adding invariant checks to the compactor tests

* fix: Update tests

* fix: remove unneeded change

* fix: filter out deleted files from invariant checks
2023-03-01 19:52:04 +00:00
Carol (Nichols || Goulding) bbfff8699c
fix: Use the same normalization code for explain tests as e2e tests do
The regex for replacing UUIDs needed to be changed like the normalizer's
regex did, so keep them in sync by using the same code.

This might point to the normalizer needing to be moved somewhere else,
or changing these tests to be e2e?
2023-03-01 13:00:04 -05:00
Nga Tran ce215c2c67
test(compactor2): layout tests for scenarios of large size overlap files (#7097)
* test: layout tests for scenarios of large size overlap files

* fix: Update compactor2/tests/layouts/large_overlaps.rs

Co-authored-by: Nga Tran <nga-tran@live.com>

* fix: Update compactor2/tests/layouts/large_overlaps.rs

Co-authored-by: Nga Tran <nga-tran@live.com>

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-01 16:43:15 +00:00
kodiakhq[bot] 59c14fc6bb
Merge pull request #7074 from influxdata/cn/more-querier-tests-to-kafkaless
test: Change more querier tests to only use Kafkaless
2023-03-01 16:12:43 +00:00
kodiakhq[bot] b7170e41fb
Merge branch 'main' into cn/more-querier-tests-to-kafkaless 2023-03-01 16:05:41 +00:00
Dom 36d85bf712
Merge pull request #7091 from influxdata/dom/ingester-docs
docs: ingester overview + subsystem docs
2023-03-01 15:31:34 +00:00
Dom Dwyer a55bbebbee
perf(wal): avoid batch buffer reallocations
This change causes the WAL to pre-allocate the write batch buffer,
reducing the reallocations & copies that occur in the hot path (this
buffer can grow to be moderately large).

This should automatically size to the correct capacity and (slowly)
reduce buffer overrun.
2023-03-01 15:58:45 +01:00
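Illustration for a55bbebbee: a small sketch of why pre-allocating a buffer removes reallocations from the push path. It uses only `Vec` from the standard library; the helper name `count_reallocations` is made up for this example and nothing here is taken from the WAL code.

```rust
/// Count how many times the Vec's capacity changes while `n` bytes are
/// pushed. Each capacity change corresponds to a reallocation.
pub fn count_reallocations(mut buf: Vec<u8>, n: usize) -> usize {
    let mut reallocs = 0;
    let mut cap = buf.capacity();
    for i in 0..n {
        buf.push(i as u8);
        if buf.capacity() != cap {
            reallocs += 1;
            cap = buf.capacity();
        }
    }
    reallocs
}
```

Calling `count_reallocations(Vec::with_capacity(4096), 4096)` reports no reallocations, because the capacity is reserved up front, whereas starting from `Vec::new()` forces the buffer to grow (and copy) as it fills.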
Dom Dwyer 79f9411e11
fix: wal flusher task / memory leak
Although not a problem in conventional usage, leaking this task prevents
the memory used by the wal (which can be substantial) from ever being
deallocated. In turn, this prevents the WAL writer I/O thread from
stopping too.
2023-03-01 15:32:33 +01:00
Dom Dwyer bbd471718d
docs: hot partition persistence
Document the hot partition persistence configuration values.
2023-03-01 14:27:07 +01:00
Dom Dwyer cfd377d12a
refactor: use as_ for cheap conversion methods
This follows Rust's naming conventions: cheap conversions are prefixed
"as_", not "to_".
2023-03-01 14:27:06 +01:00
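Illustration for cfd377d12a: a minimal sketch of the "as_" versus "to_" convention. `NamespaceName` and its methods are hypothetical and are not the types touched by the commit.

```rust
/// Hypothetical newtype used only to illustrate the naming convention.
pub struct NamespaceName(String);

impl NamespaceName {
    pub fn new(s: &str) -> Self {
        Self(s.to_string())
    }

    /// "as_": cheap, returns a borrowed view without allocating.
    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// "to_": more expensive, allocates and returns a new owned value.
    pub fn to_lowercase(&self) -> String {
        self.0.to_lowercase()
    }
}
```

A caller reading `name.as_str()` can assume the call is essentially free, while `name.to_lowercase()` signals that work is being done.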
Dom Dwyer e4089acbae
docs(persist): fix-up handle usage
Previously a PersistHandle was cloned for sharing; now it is not, and is
instead shared by wrapping it in an Arc.

This fixes the documentation to reflect the change in expected usage.
2023-03-01 14:27:06 +01:00
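Illustration for e4089acbae: a rough sketch of sharing a non-Clone handle between threads by wrapping it in an Arc. This `PersistHandle` is a hypothetical stand-in that only counts enqueued jobs; its fields and methods are assumptions and do not reflect the real type.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical handle that deliberately does not implement Clone.
pub struct PersistHandle {
    enqueued: AtomicUsize,
}

impl PersistHandle {
    pub fn new() -> Self {
        Self { enqueued: AtomicUsize::new(0) }
    }

    /// Record one enqueued job; takes `&self` so it works through an Arc.
    pub fn enqueue(&self) {
        self.enqueued.fetch_add(1, Ordering::Relaxed);
    }

    pub fn enqueued(&self) -> usize {
        self.enqueued.load(Ordering::Relaxed)
    }
}

/// Share one handle across `threads` workers by cloning the Arc,
/// not the handle itself.
pub fn enqueue_from_threads(handle: Arc<PersistHandle>, threads: usize) {
    let workers: Vec<_> = (0..threads)
        .map(|_| {
            let handle = Arc::clone(&handle);
            thread::spawn(move || handle.enqueue())
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
}
```

Every worker operates on the same underlying handle, so state held inside it is shared rather than duplicated per clone.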
Dom Dwyer be661890c5
docs: module-level overviews
Adds one-liner documentation of what each module contains - this is
helpful to understand what is where, when looking at the rendered docs.
2023-03-01 14:27:05 +01:00
Dom Dwyer 2a8731dd90
docs: ingester overview documentation
Adds "overview" documentation for the ingester, including the high-level
purpose & design. Each subsystem is briefly documented, with links for
jumping-off points into more specific documentation.
2023-03-01 14:27:05 +01:00
Andrew Lamb e19ce98407
chore: Update datafusion + arrow/arrow-flight/parquet to 34.0.0 (#7084)
* chore: Update datafusion + arrow/arrow-flight/parquet to 34.0.0

* chore: Run cargo hakari tasks

* chore: Update plans

* chore: Update querier expected output

* chore: Update querier tests to use insta

* fix: sort output too

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-01 11:25:01 +00:00
Dom f1efeae1c8
Merge pull request #7082 from influxdata/dom/router-docs
docs: update router diagrams for kafkaless
2023-03-01 10:42:14 +00:00
Dom 9ed7aa7257
Merge branch 'main' into dom/router-docs 2023-03-01 10:28:46 +00:00
dependabot[bot] 2b2c75e840
chore(deps): Bump clap from 4.1.7 to 4.1.8 (#7088)
Bumps [clap](https://github.com/clap-rs/clap) from 4.1.7 to 4.1.8.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v4.1.7...v4.1.8)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-01 10:22:52 +00:00
dependabot[bot] 90ceb2c896
chore(deps): Bump crossbeam-utils from 0.8.14 to 0.8.15 (#7089)
Bumps [crossbeam-utils](https://github.com/crossbeam-rs/crossbeam) from 0.8.14 to 0.8.15.
- [Release notes](https://github.com/crossbeam-rs/crossbeam/releases)
- [Changelog](https://github.com/crossbeam-rs/crossbeam/blob/master/CHANGELOG.md)
- [Commits](https://github.com/crossbeam-rs/crossbeam/compare/crossbeam-utils-0.8.14...crossbeam-utils-0.8.15)

---
updated-dependencies:
- dependency-name: crossbeam-utils
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
2023-03-01 10:16:09 +00:00
Marco Neumann 8f11372eac
feat: predicate pushdown phys. optimizer rule (#7083)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-01 09:44:57 +00:00
Marco Neumann 7fb562cb01
feat: "collect chunks" phys. optimizer rule (#7086)
* feat: "collect chunks" phys. optimizer rule

Required to clean up the plan a bit after all the dedup split and
removal passes.

For #6098.

* refactor: `collect` -> `combine`

* fix: submodule vis
2023-03-01 09:38:11 +00:00
Marco Neumann b85869778d
fix: `extract_chunks` schema handling (#7085)
I forgot that both `RecordBatchExec` and `ParquetExec` can have schemas
with more columns than the chunks they contain, i.e. both provide null
column creation. When extracting the schema for the chunks within a
plan, the full schemas should be preserved, otherwise the physical
optimizer rules will create invalid plan nodes (i.e. with missing
columns).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-01 09:17:31 +00:00
Christopher M. Wolff 24b9bfacb5
refactor: rewrite gap filling code to be more intuitive and extendable (#7076)
* refactor: rewrite gap filling code to be more intuitive and extendable

* chore: address clippy issue
2023-02-28 22:18:52 +00:00
Andrew Lamb f3a16a1221
feat(compactor2): add catalog upgrade information to tests (#7075)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-28 19:28:42 +00:00
Dom Dwyer 32ce2564dc
refactor: consistent validator/validation
Use a consistent name for the retention validation (matching the schema
validation module).
2023-02-28 14:45:45 +01:00
Dom Dwyer f6c9f9b0e9
docs: update router diagram / overview docs
Updates the router documentation to reflect the "kafkaless"
implementation we'll use going forward.
2023-02-28 14:45:32 +01:00
Marco Neumann 6d8fd37e26
feat: add "split dedup by time" optimizer rule (#7041)
* feat: add "split dedup by time" optimizer rule

For #6098.

* docs: fix typo

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* feat: add log messages for skipped optimizations

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2023-02-28 11:29:42 +00:00
Marco Neumann e8b6e66f75
feat: misc logging/tracing improvements (#7081)
* feat: print query language/variant in info log

* feat: allow overwriting trace header name in CLI
2023-02-28 11:14:42 +00:00
dependabot[bot] d474f9a8cc
chore(deps): Bump clap from 4.1.6 to 4.1.7 (#7080)
Bumps [clap](https://github.com/clap-rs/clap) from 4.1.6 to 4.1.7.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v4.1.6...v4.1.7)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-28 09:23:33 +00:00
Marco Neumann 04f3296d7b
feat: add "remove de-duplication" optimizer pass (#7042)
For #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-28 07:57:19 +00:00
Carol (Nichols || Goulding) 312a9bb56b
test: Change more querier tests to only use Kafkaless 2023-02-27 14:20:46 -05:00
Nga Tran 22fe629f54
refactor: rename files and function to remove target level (#7073)
* refactor: rename files and function to remove target level

* chore: update a comment

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-27 19:09:37 +00:00
Andrew Lamb 26b97482df
chore(compactor2): Split up tests into smaller modules (#7072) 2023-02-27 17:53:41 +00:00
Andrew Lamb 5194999d62
feat: Use ? for id of uncreated parquet files (#7066)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-27 16:35:51 +00:00
kodiakhq[bot] 63bf663b94
Merge pull request #7056 from influxdata/cn/test-rpc-write-in-querier
test: Switch most querier tests to use the RPC write path
2023-02-27 15:24:39 +00:00
kodiakhq[bot] 731a131a85
Merge branch 'main' into cn/test-rpc-write-in-querier 2023-02-27 15:17:51 +00:00
Andrew Lamb dd5d4f4435
chore(compactor2): document and test `split_percentage` and `percentage_max_file_size` knobs (#7026)
* chore: document and test split_percentage and percentage_max_file_size

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* chore: add test with both max file size and split percentage

* docs: whitespace engineering and small typo

---------

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2023-02-27 15:01:06 +00:00