I meant to skip partitions w/ timeouts when I designed the
functionality but forgot to adjust the error filter accordingly. To avoid
running into this problem again (i.e. forgetting to adjust the filter),
make the code a bit more explicit.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: introduce a new way of handling max_sequence_number for ingester, compactor and querier
* chore: cleanup
* feat: new column max_l0_created_at to order files for deduplication
* chore: cleanup
* chore: debug info for changing cpu.parquet
* fix: update test parquet file
Co-authored-by: Marco Neumann <marco@crepererum.net>
* feat: introduce scratchpad store for compactor
Use an intermediate in-memory store (can be a disk later if we want) to
stage all inputs and outputs of the compaction (a rough interface sketch
follows the list below). The reasons are:
- **fewer IO ops:** DataFusion's streaming IO requires slightly more
IO requests (at least 2 per file) due to the way it is optimized to
read as little as possible. It first reads the metadata and then
decides which content to fetch. In the compaction case this is (esp.
w/o delete predicates) EVERYTHING. So in contrast to the querier,
there is no advantage to this approach. On the contrary, it easily adds
100ms latency to every single input file.
- **less traffic:** For divide&conquer partitions (i.e. when we need to
run multiple compaction steps to deal with them) it is kinda pointless
to upload an intermediate result just to download it again. The
scratchpad avoids that.
- **higher throughput:** We want to limit the number of concurrent
DataFusion jobs because we don't want to blow up the whole process by
having too much in-flight Arrow data at the same time. However, while
we performed the actual computation, we were also waiting for object
store IO, which limited our throughput substantially.
- **shadow mode:** De-coupling the stores in this way makes it easier to
implement #6645.
Note that we assume here that the input parquet files are WAY SMALLER
than the uncompressed Arrow data during compaction itself.
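A rough sketch of the scratchpad interface implied above, written under
assumptions: the trait and method names (`Scratchpad`, `load_to_scratchpad`,
`make_public`, `clean`) and the path types are placeholders, not the actual
compactor API.

```rust
use std::path::PathBuf;

use async_trait::async_trait;

/// Placeholder for whatever path type the object store uses.
type ObjectStorePath = String;

/// Hypothetical scratchpad interface: stage inputs locally, publish only
/// the final outputs.
#[async_trait]
trait Scratchpad: Send + Sync {
    /// Bulk-fetch the input parquet files from the object store into the
    /// scratchpad (memory, or disk later), one request per file instead of
    /// DataFusion's metadata-then-content pattern.
    async fn load_to_scratchpad(&self, files: &[ObjectStorePath]) -> Vec<PathBuf>;

    /// Upload finished outputs back to the object store. Intermediate
    /// results of divide & conquer compactions never leave the scratchpad.
    async fn make_public(&self, outputs: &[PathBuf]) -> Vec<ObjectStorePath>;

    /// Drop everything staged for this partition once compaction is done.
    async fn clean(&self);
}
```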
Closes #6650.
* fix: panic on shutdown
* refactor: remove shadow scratchpad (for now)
* refactor: make scratchpad safe to use
* test: Port a test that's not actually supported through the full gRPC API
* test: Port remaining field column/measurement fields tests
* test: Remove unsupported measurement predicate and clarify purposes of tests
Andrew confirmed that the only way to invoke a Measurement Fields
request is with a measurement/table name specified: <0249b5018e/generated_types/protos/influxdata/platform/storage/service.proto (L43)>
so testing with a `_measurement` predicate is not valid.
I thought this test would become redundant with some other tests, but
they're actually still different enough; I took this opportunity to
better highlight the differences in the test names.
* refactor: Move all measurement fields tests to their own file
* test: Remove field columns tests that are now covered in end-to-end measurement fields tests
This is only needed until we switch over to ingester2 completely.
Old ingester tests need to be run on non-shared servers because I'm
unable to implement persistence per-namespace. Rather than spending time
figuring that out, cap the test parallelization to reduce the number of
Postgres connections that CI uses at any one time.
Make a new trait, `InfluxRpcTest`, that types can implement to define
how to run a test on a specific Storage gRPC API. `InfluxRpcTest` takes
care of iterating through the two architectures, running the setups, and
creating the custom test step.
Implementers of the trait can define aspects of the tests that differ
per run, to make the parameters of the test clearer and highlight what
different tests are testing.
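A sketch of what such a trait could look like. Everything here, including
the cluster handle, the architecture names, and the method names, is assumed
for illustration and does not mirror the real test harness.

```rust
use async_trait::async_trait;

/// Stand-in for the end-to-end cluster handle used by the tests.
struct TestCluster;

impl TestCluster {
    async fn start(_arch: Architecture) -> Self {
        TestCluster
    }

    async fn write_lp(&self, _line_protocol: &str) {
        // setup data would be written to the cluster here
    }
}

/// The two server architectures the shared driver iterates through
/// (names assumed).
#[derive(Clone, Copy)]
enum Architecture {
    WithWriteBuffer,
    Kafkaless,
}

/// Illustrative shape of `InfluxRpcTest`: implementers only describe what
/// differs per test.
#[async_trait]
trait InfluxRpcTest: Send + Sync {
    /// Line protocol each test wants loaded before the request is made.
    fn setup_line_protocol(&self) -> &'static str;

    /// The Storage gRPC request under test, plus its assertions.
    async fn request_and_assert(&self, cluster: &TestCluster);
}

/// Shared driver: iterate through both architectures, run the setup, then
/// run the implementer's request/assertions as a custom test step.
async fn run_influx_rpc_test(test: impl InfluxRpcTest) {
    for arch in [Architecture::WithWriteBuffer, Architecture::Kafkaless] {
        let cluster = TestCluster::start(arch).await;
        cluster.write_lp(test.setup_line_protocol()).await;
        test.request_and_assert(&cluster).await;
    }
}
```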
Allows compactor2 to run a fixed-point loop (until all work is done),
and in every iteration it can run multiple jobs.
The jobs are currently organized into "branches". This is because our
upcoming OOM handling may split a branch further if it doesn't complete.
Also note that the current config resembles the state prior to this PR.
So the FP-loop will only iterate ONCE and then run out of L0 files. A
more advanced setup can be built using the framework though.
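A compressed sketch of the fixed-point loop, with placeholder types and
functions (`Partition`, `Branch`, `identify_branches`, `compact_branch`)
standing in for the real compactor2 components.

```rust
/// Placeholder structures; the real compactor2 types carry much more state.
struct Partition {
    l0_files: Vec<String>,
}

struct Branch {
    files: Vec<String>,
}

/// Group the partition's remaining L0 files into independent "branches".
/// (Placeholder logic: one branch containing everything, if anything is left.)
fn identify_branches(partition: &Partition) -> Vec<Branch> {
    if partition.l0_files.is_empty() {
        vec![]
    } else {
        vec![Branch {
            files: partition.l0_files.clone(),
        }]
    }
}

/// Run one compaction job for a branch (placeholder: just consume its files).
async fn compact_branch(partition: &mut Partition, branch: Branch) {
    partition.l0_files.retain(|f| !branch.files.contains(f));
}

/// Fixed-point loop: keep identifying branches and running jobs until no
/// work remains. With a config resembling the pre-PR state this iterates
/// once and then runs out of L0 files.
async fn compact_partition(partition: &mut Partition) {
    loop {
        let branches = identify_branches(partition);
        if branches.is_empty() {
            break;
        }
        for branch in branches {
            compact_branch(partition, branch).await;
        }
    }
}
```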
Ensure a "probe" node is always returned as the first candidate, driving
it to recovery faster.
This also includes a fix for the balancer metrics that would report
probe candidate nodes as healthy nodes.
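A hypothetical sketch of the ordering change: candidates currently being
probed are placed at the front of the candidate list, and they are
deliberately excluded from the healthy count that feeds the balancer
metrics. Names are illustrative, not the real balancer types.

```rust
/// Illustrative node health states.
#[derive(Clone, Copy, PartialEq)]
enum NodeState {
    Healthy,
    Probe,
    Unhealthy,
}

struct Node {
    state: NodeState,
}

/// Return candidates with any probe node first, so it receives traffic
/// immediately and is driven back to a healthy state faster. Probe nodes
/// are not counted as healthy for the metrics (the bug fixed here).
fn candidates(nodes: &[Node]) -> (Vec<&Node>, usize) {
    let mut ordered: Vec<&Node> = nodes
        .iter()
        .filter(|n| n.state == NodeState::Probe)
        .collect();
    let healthy: Vec<&Node> = nodes
        .iter()
        .filter(|n| n.state == NodeState::Healthy)
        .collect();
    let healthy_count = healthy.len();
    ordered.extend(healthy);
    (ordered, healthy_count)
}
```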