Marco Neumann
c51548f28b
refactor: improve concurrency during parquet chunk creation (#6376)
...
* refactor: de-correlate parquet file processing
* refactor: increase concurrent chunk creation jobs to 100 (from 10)
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* refactor: use deterministic RNG
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-13 16:16:09 +00:00
kodiakhq[bot]
34edbae6d7
Merge pull request #6393 from influxdata/dom/remove-queue
...
fix(ingester2): persist deadlock
2022-12-13 16:08:01 +00:00
Dom Dwyer
8f0da90d76
docs: remove ref to PersistActor
...
Fix bad reflink to something that no longer exists.
2022-12-13 16:59:15 +01:00
Dom Dwyer
309386b828
chore: silence spurious lint
...
This is by design! Clippy just doesn't see the plan.
2022-12-13 16:59:14 +01:00
Dom Dwyer
1da9b63cce
fix(ingester2): persist deadlock
...
Removes the submission queue from the persist fan-out; instead, the
PersistHandle now carries the shared state internally (cheaply cloned
via ref counts).
This also resolves the persist deadlock when under load.
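A minimal sketch of the "shared state cloned via ref counts" idea described above, assuming an `Arc`-backed handle. Only the name `PersistHandle` comes from the commit message; `SharedState`, its field, and the methods are hypothetical and chosen purely for illustration.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical shared state; the real fields are not given in the commit message.
#[derive(Debug, Default)]
struct SharedState {
    /// Number of persist jobs submitted through any clone of the handle.
    submitted: Mutex<u64>,
}

/// Hypothetical handle. Cloning it copies only the `Arc` pointer and bumps a
/// reference count, so every clone observes the same `SharedState`.
#[derive(Debug, Clone, Default)]
struct PersistHandle {
    shared: Arc<SharedState>,
}

impl PersistHandle {
    /// Record one submission and return the running total across all clones.
    fn submit(&self) -> u64 {
        let mut n = self.shared.submitted.lock().expect("lock poisoned");
        *n += 1;
        *n
    }

    /// Number of live handles that point at the same shared state.
    fn handle_count(&self) -> usize {
        Arc::strong_count(&self.shared)
    }
}
```

With this shape, callers pass clones of the handle around instead of pushing work through a separate submission queue; whether the real implementation uses a mutex, atomics, or channels internally is not stated in the message.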
2022-12-13 16:47:45 +01:00
kodiakhq[bot]
e81d078f3c
Merge pull request #6377 from influxdata/dom/wal-bench
...
test(ingester2): WAL replay benchmark
2022-12-13 15:27:53 +00:00
kodiakhq[bot]
9e8ae1485f
Merge branch 'main' into dom/wal-bench
2022-12-13 15:19:32 +00:00
kodiakhq[bot]
d9c9865297
Merge pull request #6386 from influxdata/dom/persist-logging
...
feat(ingester2): log persist active & queue timings
2022-12-13 15:19:05 +00:00
kodiakhq[bot]
cff3d3528d
Merge branch 'main' into dom/persist-logging
2022-12-13 15:11:10 +00:00
kodiakhq[bot]
e5b813c84f
Merge pull request #6387 from influxdata/dom/editor-config
...
chore: editor config spacing for shell scripts
2022-12-13 10:41:15 +00:00
Dom Dwyer
65d45fbe91
chore: editor config spacing for shell scripts
...
Set .bash and .sh script indent size to 4.
2022-12-13 11:12:11 +01:00
Dom Dwyer
5fa4e49098
feat(ingester2): persist active & queue timings
...
Adds more debug logging to the persist code paths, as well as capturing
& logging (at INFO) timing information tracking the time a persist task
spends in the queue, the active time spent actually persisting the data,
and the total duration of time since the request was created (sum of
both durations).
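The three durations named above can be sketched as follows. This is an illustrative assumption rather than the ingester2 code: the `PersistTimings` type and its field names are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Hypothetical timing record for a single persist request.
#[derive(Debug, Clone, Copy)]
struct PersistTimings {
    /// When the request was created and placed in the queue.
    enqueued_at: Instant,
    /// When a worker picked the request up.
    started_at: Instant,
    /// When the data had been persisted.
    completed_at: Instant,
}

impl PersistTimings {
    /// Time spent waiting in the queue.
    fn queue_duration(&self) -> Duration {
        self.started_at.duration_since(self.enqueued_at)
    }

    /// Time spent actively persisting.
    fn active_duration(&self) -> Duration {
        self.completed_at.duration_since(self.started_at)
    }

    /// Total time since the request was created (queue + active).
    fn total_duration(&self) -> Duration {
        self.completed_at.duration_since(self.enqueued_at)
    }
}
```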
2022-12-13 11:06:09 +01:00
dependabot[bot]
e108a8b6c9
chore(deps): Bump paste from 1.0.9 to 1.0.10 (#6384)
...
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.9 to 1.0.10.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.9...1.0.10)
---
updated-dependencies:
- dependency-name: paste
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-13 06:03:05 +00:00
Stuart Carnie
f56b834438
chore: Implemented ZeroOrMore item container (#6373)
...
* chore: Implemented ZeroOrMore item container
Closes #6372
* chore: Use canonical names based on feedback
2022-12-12 22:01:30 +00:00
Carol (Nichols || Goulding)
fdbf9e112e
fix: Actually switch into rpc_write mode in querier
...
Only when the feature flag is set *and* --ingester-addresses is set. I
had documented that intention, but didn't actually implement it
correctly.
2022-12-12 16:37:11 -05:00
Carol (Nichols || Goulding)
44c3486db0
feat: Expire the querier's cache using info from ingester2
...
Fixes #6335.
For each table, keep track of the ingester UUIDs and associated
persisted Parquet file counts that we've seen from previous requests to
ingesters. When doing a query, determine if we should expire the Parquet
file catalog cache by looking at the new information from the ingesters.
If we see a new ingester UUID or if the number of persisted files for a
known ingester UUID is different than what we've stored, then we should
expire this table's Parquet file cache.
Either way, incorporate the new information into the saved values for
comparing with the next request.
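The expiry rule described above can be sketched like this. All names here are hypothetical; a `u128` stands in for a real UUID type and an `i64` for the table ID so the example needs no external crates.

```rust
use std::collections::HashMap;

/// Stand-in for a real UUID type (assumption, to avoid an external crate).
type IngesterUuid = u128;
/// Stand-in for the catalog's table identifier (assumption).
type TableId = i64;

/// Hypothetical per-table record of the persisted Parquet file counts last
/// reported by each ingester run.
#[derive(Debug, Default)]
struct PersistedCounts {
    seen: HashMap<TableId, HashMap<IngesterUuid, u64>>,
}

impl PersistedCounts {
    /// Returns `true` if the table's Parquet file cache should be expired:
    /// either an ingester UUID is new, or a known UUID reports a different
    /// count. The new information is stored either way.
    fn observe(&mut self, table: TableId, reported: &[(IngesterUuid, u64)]) -> bool {
        let known = self.seen.entry(table).or_default();
        let mut expire = false;
        for &(uuid, count) in reported {
            // `insert` returns the previously stored count, if any.
            match known.insert(uuid, count) {
                None => expire = true,
                Some(prev) if prev != count => expire = true,
                Some(_) => {}
            }
        }
        expire
    }
}
```

Because each ingester run gets a fresh UUID, a restart shows up as a new key and triggers expiry even if the restarted ingester again reports a count of zero.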
2022-12-12 15:53:39 -05:00
Carol (Nichols || Goulding)
b4b50d7dc1
feat: Collect the ingester UUIDs and persistence counts in the table
...
And pass them to the parquet file cache, which doesn't use them yet.
2022-12-12 15:52:56 -05:00
Carol (Nichols || Goulding)
b0ba171742
feat: Keep track of ingester UUIDs and counts in IngesterPartition
2022-12-12 15:52:08 -05:00
Carol (Nichols || Goulding)
9c8b55c5be
docs: Fix some wrapping/typos in comments
2022-12-12 14:30:52 -05:00
Carol (Nichols || Goulding)
1c7f322a4e
feat: Keep track of and report number of Parquet files persisted
...
Per partition and starting over each time the ingester restarts.
Fixes #6334.
2022-12-12 11:45:00 -05:00
Carol (Nichols || Goulding)
33886970ef
refactor: Extract a helper fn for test messages
...
Reduces duplication, makes it easier to see what's different between the
tests, and will make it easier to add another field in the next commit.
2022-12-12 11:45:00 -05:00
kodiakhq[bot]
e91d8998a8
Merge pull request #6357 from influxdata/cn/ingester2-uuid
...
feat: Identify each run of an ingester with a Uuid
2022-12-12 16:29:04 +00:00
kodiakhq[bot]
727efcbdee
Merge branch 'main' into cn/ingester2-uuid
2022-12-12 16:21:15 +00:00
Marco Neumann
e49ffc02f8
refactor: faster sort key calculation (#6375)
...
Avoid nasty string lookups to determine which columns make a parquet's
sort key.
For #6358.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-12 15:32:04 +00:00
Andrew Lamb
336ca761a3
chore: Update datafusion pin (to get sqlparser update) (#6378)
...
* chore: Update datafusion pin
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-12-12 14:39:42 +00:00
Marco Neumann
6b1c43f01e
refactor: use column IDs for partition cache invalidation (#6374)
...
This shall avoid a bunch of string hashing during query planning.
For #6358.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-12 14:22:28 +00:00
Dom Dwyer
7c28a30d1b
test(ingester2): WAL replay benchmark
...
This adds a simple WAL replay benchmark to ingester2 that executes a
replay of a single line of LP.
Unfortunately each file in the benches directory is compiled as its own
binary/crate, and as such is restricted to importing only "pub" types.
This sucks, as it requires you to either benchmark at a high level
(macro, not microbenchmarks - i.e. benchmarking the ingester startup,
not just the WAL replay) or you are forced to mark the reliant types &
functions as "pub", as well as all the other types/traits they reference
in their signatures. Because the performance sensitive code is usually
towards the lower end of the call stack, this can quickly lead to an
explosion of "pub" types causing a large amount of internal code to be
exported.
Instead this commit uses a middle-ground; benchmarked types & fns are
conditionally marked as "pub" iff the "benches" feature is enabled. This
prevents them from being visible by default, but allows the benchmark
function to call them.
The benchmark itself is also restricted to only run when this feature is
enabled.
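One way to get this "conditionally pub" behaviour is a small macro that emits the same item twice under opposite `cfg` gates. The sketch below is an assumption about how such a helper could look, not the code from this commit; the macro name `maybe_pub` and the function `replay_line_count` are hypothetical. Only the feature name "benches" comes from the message.

```rust
/// Hypothetical helper: the wrapped item is `pub` when the "benches" feature
/// is enabled and `pub(crate)` otherwise.
macro_rules! maybe_pub {
    ($(#[$meta:meta])* pub $($rest:tt)+) => {
        // Exported only when benchmarks are being built.
        #[cfg(feature = "benches")]
        $(#[$meta])*
        pub $($rest)+

        // Crate-internal in every other build.
        #[cfg(not(feature = "benches"))]
        $(#[$meta])*
        pub(crate) $($rest)+
    };
}

maybe_pub!(
    /// Illustrative stand-in for a low-level function worth benchmarking:
    /// counts the non-empty lines in a line protocol payload.
    pub fn replay_line_count(lp: &str) -> usize {
        lp.lines().filter(|l| !l.trim().is_empty()).count()
    }
);
```

Cargo can also skip a bench target unless a feature is enabled, via the `required-features` key of the `[[bench]]` section in `Cargo.toml`. The commit message does not say whether that mechanism is the one used here.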
2022-12-12 15:02:36 +01:00
Andrew Lamb
e5322b24b9
feat: Add --token CLI argument, improve update docs about writing (#6356)
...
* feat: Add --token CLI argument, improve update docs about writing
* fix: support environment tokens too
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-12 13:43:15 +00:00
Andrew Lamb
e0ecacf6cc
chore: Update DataFusion (get median fix and automatic string to timestamp coercion) (#6363)
...
* chore: Update DataFusion pin to get median fix
* chore: Update for new Expr node
* test: add test for median
* test: add test for coercion of strings to timestamps
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-12 12:14:00 +00:00
dependabot[bot]
95969ad24f
chore(deps): Bump base64 from 0.13.1 to 0.20.0 (#6371)
...
* chore(deps): Bump base64 from 0.13.1 to 0.20.0
Bumps [base64](https://github.com/marshallpierce/rust-base64) from 0.13.1 to 0.20.0.
- [Release notes](https://github.com/marshallpierce/rust-base64/releases)
- [Changelog](https://github.com/marshallpierce/rust-base64/blob/master/RELEASE-NOTES.md)
- [Commits](https://github.com/marshallpierce/rust-base64/compare/v0.13.1...v0.20.0)
---
updated-dependencies:
- dependency-name: base64
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: Run cargo hakari tasks
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-12 07:07:19 +00:00
dependabot[bot]
9305e4c566
chore(deps): Bump insta from 1.22.0 to 1.23.0 (#6370)
...
Bumps [insta](https://github.com/mitsuhiko/insta) from 1.22.0 to 1.23.0.
- [Release notes](https://github.com/mitsuhiko/insta/releases)
- [Changelog](https://github.com/mitsuhiko/insta/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mitsuhiko/insta/compare/1.22.0...1.23.0)
---
updated-dependencies:
- dependency-name: insta
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-12 06:48:35 +00:00
dependabot[bot]
a66895ecdc
chore(deps): Bump serde from 1.0.149 to 1.0.150 (#6369)
...
Bumps [serde](https://github.com/serde-rs/serde) from 1.0.149 to 1.0.150.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.149...v1.0.150)
---
updated-dependencies:
- dependency-name: serde
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-12 06:35:16 +00:00
Raphael Taylor-Davies
061d582a9b
chore: patch object_store to get apache#3274 (#6362)
...
* chore: patch object_store to get apache#3274
* chore: Run cargo hakari tasks
* fix: add issue breadcrumb
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-09 14:52:25 +00:00
Marco Neumann
db933c44b6
refactor: store reverse column ID map for cached tables (#6360)
2022-12-09 11:58:24 +00:00
Marco Neumann
450b452148
refactor: avoid string-hashing of parquet file column names (#6359)
2022-12-09 11:51:18 +00:00
Marco Neumann
0221820123
feat: rate-limit Jaeger UDP messages (#6354)
...
* feat: rate-limit Jaeger UDP messages
The Jaeger UDP protocol provides no way to signal backpressure /
overload. In certain situations, we emit so many tracing spans in a
short period of time that the OS, the network, or Jaeger drops them.
While a rate limit is not a perfect solution, it certainly helps a lot
(tested locally).
Note that the limiter does NOT lead to unlimited buffering because we
already have a limited outbox queue in place (see
`trace_exporters::export::CHANNEL_SIZE`).
Fixes #5446.
* fix: only warn once when the tracing channel is full
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
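A rough sketch of a sender-side rate limit of the kind described in the rate-limiting commit above, assuming a simple "minimum interval between messages" scheme. The commit message does not say which algorithm is used, so `RateLimiter` and its methods are hypothetical. Time is passed in explicitly to keep the example deterministic.

```rust
use std::time::{Duration, Instant};

/// Hypothetical limiter that spaces messages at least `min_interval` apart.
#[derive(Debug)]
struct RateLimiter {
    min_interval: Duration,
    /// Earliest instant at which the next message may be sent.
    next_slot: Instant,
}

impl RateLimiter {
    fn new(messages_per_second: u32, now: Instant) -> Self {
        assert!(messages_per_second > 0, "rate must be non-zero");
        Self {
            min_interval: Duration::from_secs(1) / messages_per_second,
            next_slot: now,
        }
    }

    /// Reserve the next send slot and return how long the caller must wait
    /// before sending. A zero duration means "send immediately".
    fn reserve(&mut self, now: Instant) -> Duration {
        let send_at = self.next_slot.max(now);
        self.next_slot = send_at + self.min_interval;
        send_at.duration_since(now)
    }
}
```

An exporter task would sleep for the returned duration before each UDP send. Because spans reach that task through a bounded queue, slowing the sender down caps memory use: once the queue is full, new spans are rejected rather than buffered.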
2022-12-09 09:46:07 +00:00
Carol (Nichols || Goulding)
c3a7575d46
feat: Enable rpc_write on the inner command if it's enabled for tests
...
And only run rpc_write specific tests if the feature is enabled when
running the tests.
2022-12-08 17:45:30 -05:00
Carol (Nichols || Goulding)
0a4df1f3fb
chore: Run tests in CI in both RPC write mode and not
2022-12-08 17:40:04 -05:00
Carol (Nichols || Goulding)
5141cba1db
fix: Only switch into querier RPC write path if ingester addresses specified
...
This enables testing of the querier using the old path with the
rpc_write feature turned on.
2022-12-08 17:40:04 -05:00
Carol (Nichols || Goulding)
b85130cb7c
fix: Make --ingester-addresses optional for the querier in RPC write mode
2022-12-08 17:22:52 -05:00
Carol (Nichols || Goulding)
2fd2d05ef6
feat: Identify each run of an ingester with a Uuid
...
And send that UUID in the Flight response for queries to that ingester
run.
Fixes #6333.
2022-12-08 17:22:52 -05:00
Carol (Nichols || Goulding)
6014c10866
test: Enable running ingester2/router RPC write servers in e2e tests
...
Add configuration and server types to be able to create server fixtures
for them.
2022-12-08 17:22:52 -05:00
Carol (Nichols || Goulding)
62db312a8f
feat: Switch to escargot to get more control over running Cargo bins
2022-12-08 15:29:44 -05:00
Carol (Nichols || Goulding)
619a2d0856
fix: Remove conflicting arguments from the RouterRpcWriteConfig (#6355)
...
These were added in
https://github.com/influxdata/influxdb_iox/pull/6346.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-08 20:21:37 +00:00
Marco Neumann
4ded68de62
test: "not found" end2end tests for querier (#6352)
...
I couldn't find any end2end tests for these cases and I was kinda
worried that our error codes were wrong. Turns out they are correct, but
let's have some nice tests for this behavior.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-08 18:17:53 +00:00
kodiakhq[bot]
64aae97ce7
Merge pull request #6337 from influxdata/cn/ingester2-querier
...
feat: Make a mode for the querier to use ingester2 instead, behind the rpc_write feature flag
2022-12-08 14:07:36 +00:00
kodiakhq[bot]
6f7cb5ccf0
Merge branch 'main' into cn/ingester2-querier
2022-12-08 14:00:49 +00:00
Marco Neumann
d4e321a2bd
refactor: add additional span around chunk spans (#6353)
...
* refactor: add additional span around chunk spans
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-12-08 13:57:32 +00:00
Andrew Lamb
9175f4a0b5
chore: Upgrade datafusion to get correct support for multi-part identifiers (#6349)
...
* test: add tests for periods in measurement names
* chore: Update Datafusion
* chore: Update for changed APIs
* chore: Update expected plan output
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-08 11:27:13 +00:00
Marco Neumann
c25afda6cc
fix: `GroupGenerator`/`Converter` panic (#6351)
...
Do not poll a ready future.
2022-12-08 11:08:21 +00:00