Commit Graph

10181 Commits (07772e8d2254fb734e7f826298559658a4964015)

Author SHA1 Message Date
Marco Neumann c51548f28b
refactor: improve concurrency during parquet chunk creation (#6376)
* refactor: de-correlate parquet file processing

* refactor: increase concurrent chunk creation jobs to 100 (from 10)

* docs: improve

* refactor: use deterministic RNG

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-13 16:16:09 +00:00
kodiakhq[bot] 34edbae6d7
Merge pull request #6393 from influxdata/dom/remove-queue
fix(ingester2): persist deadlock
2022-12-13 16:08:01 +00:00
Dom Dwyer 8f0da90d76
docs: remove ref to PersistActor
Fix bad reflink to something that no longer exists.
2022-12-13 16:59:15 +01:00
Dom Dwyer 309386b828
chore: silence spurious lint
This is by design! Clippy just doesn't see the plan.
2022-12-13 16:59:14 +01:00
Dom Dwyer 1da9b63cce
fix(ingester2): persist deadlock
Removes the submission queue from the persist fan-out; instead, the
PersistHandle now carries the shared state internally (cheaply cloned
via ref counts).

This also resolves the persist deadlock when under load.
2022-12-13 16:47:45 +01:00
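The "cheaply cloned via ref counts" pattern mentioned in the commit above can be sketched as follows. This is only an illustration of the general idea, not the ingester2 code: `SharedPersistState`, `Handle`, `submit` and `submitted` are hypothetical names, and the sketch does not reproduce the deadlock or its fix.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical state shared by every clone of a handle.
#[derive(Debug, Default)]
struct SharedPersistState {
    /// Number of persist jobs submitted through any clone.
    submitted: AtomicU64,
}

/// Hypothetical handle: cloning it only bumps a reference count,
/// the shared state itself is never copied.
#[derive(Debug, Clone, Default)]
struct Handle {
    shared: Arc<SharedPersistState>,
}

impl Handle {
    /// Record one submission and return the running total.
    fn submit(&self) -> u64 {
        self.shared.submitted.fetch_add(1, Ordering::Relaxed) + 1
    }

    /// Total submissions observed across all clones.
    fn submitted(&self) -> u64 {
        self.shared.submitted.load(Ordering::Relaxed)
    }
}
```

Every clone observes the same counter, so no separate queue is needed to hand state from one task to another in this sketch.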
kodiakhq[bot] e81d078f3c
Merge pull request #6377 from influxdata/dom/wal-bench
test(ingester2): WAL replay benchmark
2022-12-13 15:27:53 +00:00
kodiakhq[bot] 9e8ae1485f
Merge branch 'main' into dom/wal-bench
2022-12-13 15:19:32 +00:00
kodiakhq[bot] d9c9865297
Merge pull request #6386 from influxdata/dom/persist-logging
feat(ingester2): log persist active & queue timings
2022-12-13 15:19:05 +00:00
kodiakhq[bot] cff3d3528d
Merge branch 'main' into dom/persist-logging
2022-12-13 15:11:10 +00:00
kodiakhq[bot] e5b813c84f
Merge pull request #6387 from influxdata/dom/editor-config
chore: editor config spacing for shell scripts
2022-12-13 10:41:15 +00:00
Dom Dwyer 65d45fbe91
chore: editor config spacing for shell scripts
Set .bash and .sh script indent size to 4.
2022-12-13 11:12:11 +01:00
Dom Dwyer 5fa4e49098
feat(ingester2): persist active & queue timings
Adds more debug logging to the persist code paths, as well as capturing
& logging (at INFO) timing information tracking the time a persist task
spends in the queue, the active time spent actually persisting the data,
and the total duration of time since the request was created (sum of
both durations).
2022-12-13 11:06:09 +01:00
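The three durations described in the commit above (time in the queue, active time, and their sum) can be sketched like this. The type and field names are hypothetical and are not taken from the ingester2 code; the sketch only shows how the numbers relate to one another.

```rust
use std::time::{Duration, Instant};

/// Hypothetical timing record for a single persist task.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PersistTimings {
    /// Time between creating the request and starting work on it.
    queue: Duration,
    /// Time spent actually persisting the data.
    active: Duration,
}

impl PersistTimings {
    /// Build the record from three timestamps taken along the way.
    fn new(enqueued_at: Instant, started_at: Instant, finished_at: Instant) -> Self {
        Self {
            queue: started_at.duration_since(enqueued_at),
            active: finished_at.duration_since(started_at),
        }
    }

    /// Total time since the request was created (sum of both durations).
    fn total(&self) -> Duration {
        self.queue + self.active
    }
}
```

A caller would capture `Instant::now()` when the request is created, when a worker picks it up, and when the persist completes, then log the resulting record.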
dependabot[bot] e108a8b6c9
chore(deps): Bump paste from 1.0.9 to 1.0.10 (#6384)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.9 to 1.0.10.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.9...1.0.10)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-13 06:03:05 +00:00
Stuart Carnie f56b834438
chore: Implemented ZeroOrMore item container (#6373)
* chore: Implemented ZeroOrMore item container

Closes #6372

* chore: Use canonical names based on feedback
2022-12-12 22:01:30 +00:00
Carol (Nichols || Goulding) fdbf9e112e
fix: Actually switch into rpc_write mode in querier
Only when the feature flag is set *and* --ingester-addresses is set. I
had documented that intention, but didn't actually implement it
correctly.
2022-12-12 16:37:11 -05:00
Carol (Nichols || Goulding) 44c3486db0
feat: Expire the querier's cache using info from ingester2
Fixes #6335.

For each table, keep track of the ingester UUIDs and associated
persisted Parquet file counts that we've seen from previous requests to
ingesters. When doing a query, determine if we should expire the Parquet
file catalog cache by looking at the new information from the ingesters.

If we see a new ingester UUID or if the number of persisted files for a
known ingester UUID is different than what we've stored, then we should
expire this table's Parquet file cache.

Either way, incorporate the new information into the saved values for
comparing with the next request.
2022-12-12 15:53:39 -05:00
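The expiry rule described in the commit above can be sketched as follows. This is a minimal illustration under assumptions: `TableCacheState` and `observe` are hypothetical names, and a plain `String` stands in for the ingester UUID so the example needs no external crates.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an ingester run's UUID.
type IngesterUuid = String;

/// Hypothetical per-table record of what previous ingester responses reported.
#[derive(Debug, Default)]
struct TableCacheState {
    /// Last seen persisted Parquet file count per ingester UUID.
    persisted_counts: HashMap<IngesterUuid, u64>,
}

impl TableCacheState {
    /// Returns `true` if the table's Parquet file cache should be expired.
    ///
    /// The cache is considered stale when a UUID is new or when the count for
    /// a known UUID differs from the stored value. Either way, the new
    /// information is saved for comparison with the next request.
    fn observe(&mut self, seen: &[(IngesterUuid, u64)]) -> bool {
        let mut expire = false;
        for (uuid, count) in seen {
            match self.persisted_counts.insert(uuid.clone(), *count) {
                // Known ingester run, unchanged count: nothing new was persisted.
                Some(previous) if previous == *count => {}
                // New UUID or a changed count: cached file list may be stale.
                _ => expire = true,
            }
        }
        expire
    }
}
```

Because each ingester restart yields a new UUID, a count that starts over after a restart still shows up as new information in this sketch.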
Carol (Nichols || Goulding) b4b50d7dc1
feat: Collect the ingester UUIDs and persistence counts in the table
And pass them to the parquet file cache, which doesn't use them yet.
2022-12-12 15:52:56 -05:00
Carol (Nichols || Goulding) b0ba171742
feat: Keep track of ingester UUIDs and counts in IngesterPartition
2022-12-12 15:52:08 -05:00
Carol (Nichols || Goulding) 9c8b55c5be
docs: Fix some wrapping/typos in comments
2022-12-12 14:30:52 -05:00
Carol (Nichols || Goulding) 1c7f322a4e
feat: Keep track of and report number of Parquet files persisted
Per partition and starting over each time the ingester restarts.

Fixes #6334.
2022-12-12 11:45:00 -05:00
Carol (Nichols || Goulding) 33886970ef
refactor: Extract a helper fn for test messages
Reduces duplication, makes it easier to see what's different between the
tests, will make it easier to add another field in the next commit
2022-12-12 11:45:00 -05:00
kodiakhq[bot] e91d8998a8
Merge pull request #6357 from influxdata/cn/ingester2-uuid
feat: Identify each run of an ingester with a Uuid
2022-12-12 16:29:04 +00:00
kodiakhq[bot] 727efcbdee
Merge branch 'main' into cn/ingester2-uuid
2022-12-12 16:21:15 +00:00
Marco Neumann e49ffc02f8
refactor: faster sort key calculation (#6375)
Avoid nasty string lookups to determine which columns make a parquet's
sort key.

For #6358.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-12 15:32:04 +00:00
Andrew Lamb 336ca761a3
chore: Update datafusion pin (to get sqlparser update) (#6378)
* chore: Update datafusion pin

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-12-12 14:39:42 +00:00
Marco Neumann 6b1c43f01e
refactor: use column IDs for partition cache invalidation (#6374)
This should avoid a bunch of string hashing during query planning.

For #6358.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-12 14:22:28 +00:00
Dom Dwyer 7c28a30d1b
test(ingester2): WAL replay benchmark
This adds a simple WAL replay benchmark to ingester2 that executes a
replay of a single line of LP.

Unfortunately each file in the benches directory is compiled as its own
binary/crate, and as such is restricted to importing only "pub" types.
This sucks, as it requires you to either benchmark at a high level
(macro, not microbenchmarks - i.e. benchmarking the ingester startup,
not just the WAL replay) or you are forced to mark the reliant types &
functions as "pub", as well as all the other types/traits they reference
in their signatures. Because the performance sensitive code is usually
towards the lower end of the call stack, this can quickly lead to an
explosion of "pub" types causing a large amount of internal code to be
exported.

Instead this commit uses a middle-ground; benchmarked types & fns are
conditionally marked as "pub" iff the "benches" feature is enabled. This
prevents them from being visible by default, but allows the benchmark
function to call them.

The benchmark itself is also restricted to only run when this feature is
enabled.
2022-12-12 15:02:36 +01:00
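The middle ground described in the commit above, where items are `pub` only when a `benches` feature is enabled, can be sketched with a small macro. This is an assumption-laden illustration: the `maybe_pub!` macro shown here and the `count_lp_lines` function are hypothetical and are not claimed to match the code in the commit.

```rust
/// Hypothetical helper: emit a function as `pub` when the `benches`
/// feature is enabled, and as `pub(crate)` otherwise.
macro_rules! maybe_pub {
    ($(#[$meta:meta])* fn $($rest:tt)+) => {
        #[cfg(feature = "benches")]
        $(#[$meta])*
        pub fn $($rest)+

        #[cfg(not(feature = "benches"))]
        $(#[$meta])*
        pub(crate) fn $($rest)+
    };
}

maybe_pub! {
    /// Hypothetical stand-in for an internal function worth benchmarking:
    /// counts the non-empty lines in a line protocol payload.
    fn count_lp_lines(lp: &str) -> usize {
        lp.lines().filter(|line| !line.trim().is_empty()).count()
    }
}
```

With the feature disabled, `count_lp_lines` stays crate-private; a benchmark built with the feature enabled could call it from its separate binary. In Cargo, `required-features` on a `[[bench]]` target is one way to restrict a benchmark to builds with the feature, though the commit message does not say which mechanism it uses.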
Andrew Lamb e5322b24b9
feat: Add --token CLI argument, improve update docs about writing (#6356)
* feat: Add --token CLI argument, improve update docs about writing

* fix: support environment tokens too

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-12 13:43:15 +00:00
Andrew Lamb e0ecacf6cc
chore: Update DataFusion (get median fix and automatic string to timestamp coercion) (#6363)
* chore: Update DataFusion pin to get median fix

* chore: Update for new Expr node

* test: add test for median

* test: add test for coercion of strings to timestamps

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-12 12:14:00 +00:00
dependabot[bot] 95969ad24f
chore(deps): Bump base64 from 0.13.1 to 0.20.0 (#6371)
* chore(deps): Bump base64 from 0.13.1 to 0.20.0

Bumps [base64](https://github.com/marshallpierce/rust-base64) from 0.13.1 to 0.20.0.
- [Release notes](https://github.com/marshallpierce/rust-base64/releases)
- [Changelog](https://github.com/marshallpierce/rust-base64/blob/master/RELEASE-NOTES.md)
- [Commits](https://github.com/marshallpierce/rust-base64/compare/v0.13.1...v0.20.0)

---
updated-dependencies:
- dependency-name: base64
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-12 07:07:19 +00:00
dependabot[bot] 9305e4c566
chore(deps): Bump insta from 1.22.0 to 1.23.0 (#6370)
Bumps [insta](https://github.com/mitsuhiko/insta) from 1.22.0 to 1.23.0.
- [Release notes](https://github.com/mitsuhiko/insta/releases)
- [Changelog](https://github.com/mitsuhiko/insta/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mitsuhiko/insta/compare/1.22.0...1.23.0)

---
updated-dependencies:
- dependency-name: insta
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-12 06:48:35 +00:00
dependabot[bot] a66895ecdc
chore(deps): Bump serde from 1.0.149 to 1.0.150 (#6369)
Bumps [serde](https://github.com/serde-rs/serde) from 1.0.149 to 1.0.150.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.149...v1.0.150)

---
updated-dependencies:
- dependency-name: serde
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-12 06:35:16 +00:00
Raphael Taylor-Davies 061d582a9b
chore: patch object_store to get apache#3274 (#6362)
* chore: patch object_store to get apache#3274

* chore: Run cargo hakari tasks

* fix: add issue breadcrumb

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-09 14:52:25 +00:00
Marco Neumann db933c44b6
refactor: store reverse column ID map for cached tables (#6360)
2022-12-09 11:58:24 +00:00
Marco Neumann 450b452148
refactor: avoid string-hashing of parquet file column names (#6359)
2022-12-09 11:51:18 +00:00
Marco Neumann 0221820123
feat: rate-limit Jaeger UDP messages (#6354)
* feat: rate-limit Jaeger UDP messages

The Jaeger UDP protocol provides no way to signal backpressure /
overload. In certain situations, we emit so many tracing spans in a
short period of time that the OS, the network, or Jaeger drop them.
While a rate limit is not a perfect solution, it for sure helps a lot
(tested locally).

Note that the limiter does NOT lead to unlimited buffering because we
already have a limited outbox queue in place (see
`trace_exporters::export::CHANNEL_SIZE`).

Fixes #5446.

* fix: only warn once when the tracing channel is full

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-09 09:46:07 +00:00
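The commit above does not say which limiting algorithm it uses. One common way to rate-limit outgoing messages is a token bucket, sketched below under that assumption; `TokenBucket`, its fields and `try_acquire` are hypothetical names, and time is passed in explicitly so the behaviour is deterministic.

```rust
use std::time::{Duration, Instant};

/// Hypothetical token bucket: allows short bursts up to `capacity`
/// messages and refills at `refill_per_sec` tokens per second.
#[derive(Debug)]
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Create a full bucket.
    fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        Self {
            capacity: f64::from(capacity),
            tokens: f64::from(capacity),
            refill_per_sec,
            last_refill: now,
        }
    }

    /// Returns `true` if one message may be sent at `now`.
    fn try_acquire(&mut self, now: Instant) -> bool {
        // Add the tokens earned since the last call, capped at capacity.
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;

        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A sender that is refused a token would wait before sending; as the commit notes, a bounded outbox queue in front of the limiter is what keeps that waiting from turning into unlimited buffering.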
Carol (Nichols || Goulding) c3a7575d46
feat: Enable rpc_write on the inner command if it's enabled for tests
And only run rpc_write specific tests if the feature is enabled when
running the tests.
2022-12-08 17:45:30 -05:00
Carol (Nichols || Goulding) 0a4df1f3fb
chore: Run tests in CI in both RPC write mode and not
2022-12-08 17:40:04 -05:00
Carol (Nichols || Goulding) 5141cba1db
fix: Only switch into querier RPC write path if ingester addresses specified
This enables testing of the querier using the old path with the
rpc_write feature turned on.
2022-12-08 17:40:04 -05:00
Carol (Nichols || Goulding) b85130cb7c
fix: Make --ingester-addresses optional for the querier in RPC write mode
2022-12-08 17:22:52 -05:00
Carol (Nichols || Goulding) 2fd2d05ef6
feat: Identify each run of an ingester with a Uuid
And send that UUID in the Flight response for queries to that ingester
run.

Fixes #6333.
2022-12-08 17:22:52 -05:00
Carol (Nichols || Goulding) 6014c10866
test: Enable running ingester2/router RPC write servers in e2e tests
Add configuration and server types to be able to create server fixtures
for them.
2022-12-08 17:22:52 -05:00
Carol (Nichols || Goulding) 62db312a8f
feat: Switch to escargot to get more control over running Cargo bins
2022-12-08 15:29:44 -05:00
Carol (Nichols || Goulding) 619a2d0856
fix: Remove conflicting arguments from the RouterRpcWriteConfig (#6355)
These were added in
https://github.com/influxdata/influxdb_iox/pull/6346.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-08 20:21:37 +00:00
Marco Neumann 4ded68de62
test: "not found" end2end tests for querier (#6352)
I couldn't find any end2end tests for these cases and I was kinda
worried that our error codes were wrong. Turns out they are correct, but
let's have some nice tests for this behavior.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-08 18:17:53 +00:00
kodiakhq[bot] 64aae97ce7
Merge pull request #6337 from influxdata/cn/ingester2-querier
feat: Make a mode for the querier to use ingester2 instead, behind the rpc_write feature flag
2022-12-08 14:07:36 +00:00
kodiakhq[bot] 6f7cb5ccf0
Merge branch 'main' into cn/ingester2-querier
2022-12-08 14:00:49 +00:00
Marco Neumann d4e321a2bd
refactor: add additional span around chunk spans (#6353)
* refactor: add additional span around chunk spans

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-12-08 13:57:32 +00:00
Andrew Lamb 9175f4a0b5
chore: Upgrade datafusion to get correct support for multi-part identifiers (#6349)
* test: add tests for periods in measurement names

* chore: Update Datafusion

* chore: Update for changed APIs

* chore: Update expected plan output

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-08 11:27:13 +00:00
Marco Neumann c25afda6cc
fix: `GroupGenerator`/`Converter` panic (#6351)
Do not poll a ready future.
2022-12-08 11:08:21 +00:00