Updates the data generator to handle failed requests. Adds some println output to show progress along the way.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Removes the submission queue from the persist fan-out; instead, the
PersistHandle now carries the shared state internally (cheaply cloned
via ref counts).
This also resolves the persist deadlock when under load.
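As a rough sketch of the shape this takes (type and field names here are illustrative, not the actual IOx definitions), the handle wraps its shared state in an `Arc`, so cloning it only bumps a reference count:

```rust
use std::sync::Arc;

// Illustrative only: the real PersistHandle carries queues, semaphores,
// etc. behind the Arc; cloning the handle never copies that state.
#[derive(Clone)]
struct PersistHandle {
    inner: Arc<Inner>,
}

// The shared state (contents elided).
struct Inner {}

fn fan_out(handle: &PersistHandle, workers: usize) {
    for _ in 0..workers {
        // Cheap: each clone is a ref-count increment.
        let _h = handle.clone();
        // ... hand `_h` to a worker task ...
    }
}
```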
Adds more debug logging to the persist code paths, and captures & logs
(at INFO) timing information: the time a persist task spends in the
queue, the active time spent actually persisting the data, and the
total time since the request was created (the sum of both durations).
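A minimal sketch of how those three durations can be derived (hypothetical names; the real code logs via the tracing macros rather than `println!`):

```rust
use std::time::{Duration, Instant};

struct PersistRequest {
    created_at: Instant, // stamped when the request is created & enqueued
}

fn run(req: PersistRequest) {
    // Time the task spent waiting in the queue before a worker picked it up.
    let queued: Duration = req.created_at.elapsed();

    let start = Instant::now();
    // ... actually persist the data here ...
    let active: Duration = start.elapsed();

    // Total time since the request was created: the sum of both durations.
    let total = queued + active;
    println!("persist done: queued={queued:?} active={active:?} total={total:?}");
}
```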
Fixes #6335.
For each table, keep track of the ingester UUIDs and associated
persisted Parquet file counts that we've seen from previous requests to
ingesters. When doing a query, determine if we should expire the Parquet
file catalog cache by looking at the new information from the ingesters.
If we see a new ingester UUID or if the number of persisted files for a
known ingester UUID is different than what we've stored, then we should
expire this table's Parquet file cache.
Either way, incorporate the new information into the saved values for
comparing with the next request.
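A sketch of that bookkeeping, with hypothetical names (the real IOx types differ, and `Uuid` here is just a stand-in alias):

```rust
use std::collections::HashMap;

type Uuid = u128; // stand-in for the real ingester UUID type

#[derive(Default)]
struct IngesterCounts {
    // Ingester UUID -> persisted Parquet file count seen last time.
    seen: HashMap<Uuid, u64>,
}

impl IngesterCounts {
    /// Fold in the latest ingester responses; returns `true` if the
    /// table's Parquet file cache should be expired.
    fn update(&mut self, latest: impl IntoIterator<Item = (Uuid, u64)>) -> bool {
        let mut expire = false;
        for (uuid, count) in latest {
            match self.seen.insert(uuid, count) {
                None => expire = true,                        // new ingester UUID
                Some(prev) if prev != count => expire = true, // file count changed
                _ => {}                                       // unchanged: keep cache
            }
        }
        expire
    }
}
```

Note that `insert` both performs the comparison and saves the new value, so the stored state is updated either way.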
Avoid nasty string lookups to determine which columns make up a Parquet
file's sort key.
For #6358.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This avoids a bunch of string hashing during query planning.
For #6358.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
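A minimal sketch of the idea behind these two changes (names are hypothetical): refer to sort-key columns by a small integer ID, so membership checks compare integers instead of hashing strings:

```rust
// Hypothetical: a compact, Copy column identifier.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct ColumnId(u32);

/// Which of a file's columns participate in the sort key, by ID.
fn sort_key_columns(file_columns: &[ColumnId], sort_key: &[ColumnId]) -> Vec<ColumnId> {
    file_columns
        .iter()
        .copied()
        .filter(|id| sort_key.contains(id)) // integer comparison, no string hashing
        .collect()
}
```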
This adds a simple WAL replay benchmark to ingester2 that executes a
replay of a single line of LP.
Unfortunately each file in the benches directory is compiled as its own
binary/crate, and as such is restricted to importing only "pub" types.
This sucks, as it requires you to either benchmark at a high level
(macro, not microbenchmarks - i.e. benchmarking the ingester startup,
not just the WAL replay) or mark the types & functions under test as
"pub", as well as all the other types/traits they reference in their
signatures. Because the performance-sensitive code is usually towards
the lower end of the call stack, this can quickly lead to an explosion
of "pub" types, causing a large amount of internal code to be exported.
Instead this commit uses a middle ground: benchmarked types & fns are
conditionally marked as "pub" iff the "benches" feature is enabled. This
prevents them from being visible by default, but allows the benchmark
function to call them.
The benchmark itself is also restricted to only run when this feature is
enabled.
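The usual Rust idiom for this (sketched here with an illustrative module name) is to declare the item twice behind opposite `cfg` gates:

```rust
// lib.rs: the module is private by default and only exported when the
// "benches" feature is enabled, keeping the public API clean.
#[cfg(feature = "benches")]
pub mod wal_replay;

#[cfg(not(feature = "benches"))]
mod wal_replay;
```

The benchmark target itself can then declare `required-features = ["benches"]` in its `[[bench]]` section of Cargo.toml, which is what restricts it to only building & running when the feature is enabled.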
* chore: Update DataFusion pin to get median fix
* chore: Update for new Expr node
* test: add test for median
* test: add test for coercion of strings to timestamps
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: rate-limit Jaeger UDP messages
The Jaeger UDP protocol provides no way to signal backpressure /
overload. In certain situations, we emit so many tracing spans in a
short period of time that the OS, the network, or Jaeger drops them.
While a rate limit is not a perfect solution, it certainly helps a lot
(tested locally).
Note that the limiter does NOT lead to unlimited buffering because we
already have a limited outbox queue in place (see
`trace_exporters::export::CHANNEL_SIZE`).
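A token-bucket limiter along these lines would do the job; this is only an illustrative sketch, not the actual `trace_exporters` implementation:

```rust
use std::time::{Duration, Instant};

// Callers ask how long to wait before emitting the next UDP message.
struct RateLimiter {
    tokens: f64,         // currently available send "slots"
    max_tokens: f64,     // burst capacity
    refill_per_sec: f64, // steady-state messages per second
    last_refill: Instant,
}

impl RateLimiter {
    fn new(msgs_per_sec: f64) -> Self {
        Self {
            tokens: msgs_per_sec,
            max_tokens: msgs_per_sec,
            refill_per_sec: msgs_per_sec,
            last_refill: Instant::now(),
        }
    }

    /// `Duration::ZERO` means "send now"; otherwise sleep this long first.
    fn acquire(&mut self) -> Duration {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.max_tokens);
        self.last_refill = now;

        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            Duration::ZERO
        } else {
            // Time until one whole token has accumulated.
            Duration::from_secs_f64((1.0 - self.tokens) / self.refill_per_sec)
        }
    }
}
```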
Fixes #5446.
* fix: only warn once when the tracing channel is full
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
I couldn't find any end2end tests for these cases and I was kinda
worried that our error codes were wrong. Turns out they are correct, but
let's have some nice tests for this behavior.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>