* feat: Updating to new services for all-in-one
* fix: Use correct shard id for ingester2
* fix: clippy
* fix: use wal directory
* fix: end to end tests
* fix: Update tracing cases for new ingest reality
* fix: update metrics test
* fix: Use rpc mode
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* fix: Reading file error reported the wrong path
When the `.expected` SQL file couldn't be found, this error reported
the input file path instead.
* test: Port SQL query_tests to end-to-end tests
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
These two tests weren't actually ensuring that the combination of these
predicates worked, because the tests would still pass if some of the
predicate parts were removed.
* refactor: Start a new file for tag keys tests; move the one existing test
* test: Port tag_keys query_tests to end-to-end tests
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: Start a new file for measurement names tests; move the one existing test
* fix: Pass on predicate when sending a measurement names request with GrpcRequestBuilder
* feat: Support literal integer queries too
* test: Port measurement_names/table_names query_tests to end-to-end tests
* fix: merge conflict error
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: Delete the read filter *file*; last PR only deleted the *contents*
* test: Port read_group query_tests to end-to-end tests
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: Move sql script files from query_tests and into end to end query tests
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* feat: introduce a new way of max_sequence_number for ingester, compactor and querier
* chore: cleanup
* feat: new column max_l0_created_at to order files for deduplication
* chore: cleanup
* chore: debug info for chnaging cpu.parquet
* fix: update test parquet file
Co-authored-by: Marco Neumann <marco@crepererum.net>
* feat: introduce scratchpad store for compactor
Use an intermediate in-memory store (can be a disk later if we want) to
stage all inputs and outputs of the compaction. The reasons are:
- **fewer IO ops:** DataFusion's streaming IO requires slightly more
IO requests (at least 2 per file) due to the way it is optimized to
read as little as possible. It first reads the metadata and then
decides which content to fetch. In the compaction case this is (esp.
w/o delete predicates) EVERYTHING. So in contrast to the querier,
there is no advantage of this approach. In contrary this easily adds
100ms latency to every single input file.
- **less traffic:** For divide&conquer partitions (i.e. when we need to
run multiple compaction steps to deal with them) it is kinda pointless
to upload an intermediate result just to download it again. The
scratchpad avoids that.
- **higher throughput:** We want to limit the number of concurrent
DataFusion jobs because we don't wanna blow up the whole process by
having too much in-flight arrow data at the same time. However while
we perform the actual computation, we were waiting for object store
IO. This was limiting our throughput substantially.
- **shadow mode:** De-coupling the stores in this way makes it easier to
implement #6645.
Note that we assume here that the input parquet files are WAY SMALLER
than the uncompressed Arrow data during compaction itself.
Closes#6650.
* fix: panic on shutdown
* refactor: remove shadow scratchpad (for now)
* refactor: make scratchpad safe to use
* test: Port a test that's not actually supported through the full gRPC API
* test: Port remaining field column/measurement fields tests
* test: Remove unsupported measurement predicate and clarify purposes of tests
Andrew confirmed that the only way to invoke a Measurement Fields
request is with a measurement/table name specified: <0249b5018e/generated_types/protos/influxdata/platform/storage/service.proto (L43)>
so testing with a `_measurement` predicate is not valid.
I thought this test would become redundant with some other tests, but
they're actually still different enough; I took this opportunity to
better highlight the differences in the test names.
* refactor: Move all measurement fields tests to their own file
* test: Remove field columns tests that are now covered in end-to-end measurement fields tests
Make a new trait, `InfluxRpcTest`, that types can implement to define
how to run a test on a specific Storage gRPC API. `InfluxRpcTest` takes
care of iterating through the two architectures, running the setups, and
creating the custom test step.
Implementers of the trait can define aspects of the tests that differ
per run, to make the parameters of the test clearer and highlight what
different tests are testing.
- use a single data structure for CLI args (not two)
- set mem limit default to 8GB (same as querier). We can always tune
this later, but we should not run with "unlimited" to begin with.
Sets up crate and wires up the main binary. No tests yet, no algorithm
framework, just the bare minimum.
Also I decided to not offer a gRPC server in `compactor2` at the moment
and hence did not implement any handle/delegate infrastructure. We add
this later if we need it. This also means compactor2 does NOT provide a
catalog service for now.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>