Commit Graph

10616 Commits (105e3542991aef5f1654fe260f9fa53a32f622f2)

Author SHA1 Message Date
Dom Dwyer 105e354299
refactor: clean up namespace errors
The namespace error was poorly refactored and duplicated the prefix
string. The "rejected" case is now also tested.
2023-01-26 17:32:11 +01:00
Dom Dwyer 3a9b5a4d29
fix: bind NamespaceService to gRPC server
I forgot to bind the service!
2023-01-26 17:32:11 +01:00
Dom Dwyer 1a7679bcee
refactor: expose underlying gRPC implementations
Changes the gRPC delegate to return the underlying service (type erased)
implementations instead of the RPC service wrappers.
2023-01-26 17:32:11 +01:00
Dom Dwyer ac8fa293cb
refactor(test): TestContext::write_lp() helper
Adds a helper method to construct the HTTP write request.
2023-01-26 17:32:10 +01:00
Dom Dwyer 6f1869f9dc
test(router): initialise gRPC delegate in e2e
Initialise the "rpc mode" gRPC handlers in the router e2e TestContext.
2023-01-26 17:32:10 +01:00
Dom Dwyer 3efc42baac
refactor(test): dedicated e2e TestContext module
Moves the router's TestContext to its own file/module.
2023-01-26 17:32:10 +01:00
Marco Neumann 4391e30d2d
feat: improve compactor2 debugging (#6718)
* feat: add planning logging wrapper

* refactor: split partitionS source and partition source into two components
2023-01-26 16:10:20 +00:00
Marco Neumann 68380a32e5
fix: "timeout" as a reason to skip a partition (#6716)
I meant to skip partitions w/ timeouts when I designed the
functionality but forgot to adjust the error filter accordingly. To not
run into this problem again (i.e. forget to adjust the filter), make the
code a bit more explicit.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-26 15:00:13 +00:00
Dom 8686b559ae
Merge pull request #6715 from influxdata/dom/rpc-namespace
fix(router): restore NamespaceService
2023-01-26 14:38:47 +00:00
Dom a489d03e1b
Merge branch 'main' into dom/rpc-namespace 2023-01-26 14:23:36 +00:00
Marco Neumann 30d411dc95
feat: shadow mode (#6712)
* refactor: remove untyped durations from `compactor2`

* feat: shadow mode

Closes #6645.

* refactor: split input and output store
2023-01-26 14:20:55 +00:00
Dom Dwyer c66f4a3d92
fix(router): restore NamespaceService
This was removed in the RPC variant of the router - no idea why, we
definitely should have it!
2023-01-26 15:10:22 +01:00
Dom 9b538f2c20
Merge pull request #6711 from influxdata/alamb/fix_encoding_again
fix: Do not send dictionary encoded data to clients
2023-01-26 12:29:36 +00:00
Andrew Lamb 6a0429584a fix: update doc example 2023-01-26 06:59:34 -05:00
Andrew Lamb c100737a81 chore: Do not send dictionary encoded data to clients 2023-01-26 06:35:15 -05:00
Nga Tran b8a80869d4
feat: introduce a new way of max_sequence_number for ingester, compactor and querier (#6692)
* feat: introduce a new way of max_sequence_number for ingester, compactor and querier

* chore: cleanup

* feat: new column max_l0_created_at to order files for deduplication

* chore: cleanup

* chore: debug info for changing cpu.parquet

* fix: update test parquet file

Co-authored-by: Marco Neumann <marco@crepererum.net>
2023-01-26 10:52:47 +00:00
Marco Neumann ed694d3be4
feat: introduce scratchpad store for compactor (#6706)
* feat: introduce scratchpad store for compactor

Use an intermediate in-memory store (can be a disk later if we want) to
stage all inputs and outputs of the compaction. The reasons are:

- **fewer IO ops:** DataFusion's streaming IO requires slightly more
  IO requests (at least 2 per file) due to the way it is optimized to
  read as little as possible. It first reads the metadata and then
  decides which content to fetch. In the compaction case this is (esp.
  w/o delete predicates) EVERYTHING. So in contrast to the querier,
  there is no advantage to this approach. On the contrary, this easily adds
  100ms latency to every single input file.
- **less traffic:** For divide&conquer partitions (i.e. when we need to
  run multiple compaction steps to deal with them) it is kinda pointless
  to upload an intermediate result just to download it again. The
  scratchpad avoids that.
- **higher throughput:** We want to limit the number of concurrent
  DataFusion jobs because we don't wanna blow up the whole process by
  having too much in-flight arrow data at the same time. However while
  we perform the actual computation, we were waiting for object store
  IO. This was limiting our throughput substantially.
- **shadow mode:** De-coupling the stores in this way makes it easier to
  implement #6645.

Note that we assume here that the input parquet files are WAY SMALLER
than the uncompressed Arrow data during compaction itself.
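
A minimal sketch of the scratchpad idea described above, written in Python for brevity rather than in the project's own language. Every name here (`Scratchpad`, `load`, `stage`, `publish`, `compact`) is hypothetical and not taken from the compactor code, and the object stores are modeled as plain dictionaries. It only illustrates the "less traffic" point: inputs are fetched once, intermediates stay in memory, and only final results are uploaded.

```python
class Scratchpad:
    """In-memory staging area between an input store and an output store.

    Hypothetical sketch: stores are plain dicts mapping path -> bytes.
    """

    def __init__(self, input_store, output_store):
        self.input_store = input_store
        self.output_store = output_store
        self.staged = {}      # path -> bytes held in memory
        self.downloads = 0    # number of reads from the input store
        self.uploads = 0      # number of writes to the output store

    def load(self, paths):
        """Fetch each input file once, in full, before compaction starts."""
        for path in paths:
            if path not in self.staged:
                self.staged[path] = self.input_store[path]
                self.downloads += 1

    def read(self, path):
        """Serve reads from memory; never touches the object store."""
        return self.staged[path]

    def stage(self, path, data):
        """Keep an intermediate or final result in memory."""
        self.staged[path] = data

    def publish(self, paths):
        """Upload only the final results to the output store."""
        for path in paths:
            self.output_store[path] = self.staged[path]
            self.uploads += 1


def compact(chunks):
    """Stand-in for the real compaction: concatenate the inputs."""
    return b"".join(chunks)


inputs = {"a.parquet": b"A", "b.parquet": b"B", "c.parquet": b"C"}
outputs = {}
pad = Scratchpad(inputs, outputs)

pad.load(["a.parquet", "b.parquet", "c.parquet"])
# Step 1: the intermediate result stays in the scratchpad, no upload.
pad.stage("tmp.parquet", compact([pad.read("a.parquet"), pad.read("b.parquet")]))
# Step 2: the intermediate is re-read from memory, not downloaded again.
pad.stage("final.parquet", compact([pad.read("tmp.parquet"), pad.read("c.parquet")]))
pad.publish(["final.parquet"])
```

In this toy run the two-step compaction costs three downloads and one upload; without the staging area the intermediate file would be uploaded and downloaded once more.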

Closes #6650.

* fix: panic on shutdown

* refactor: remove shadow scratchpad (for now)

* refactor: make scratchpad safe to use
2023-01-26 10:03:08 +00:00
Andrew Lamb 7853a19953
feat: JDBC integration tests with FlightSQL (#6693)
* feat: basic JDBC integration test

* fix: do not run test without env set

* docs: add maven link

* refactor: clean up java with switch statement
2023-01-25 22:21:18 +00:00
Andrew Lamb 2db8443a64
refactor: split flightsql crate into smaller modules (#6703)
* refactor: split flightsql crate into smaller modules

* refactor: automatically derive from Impl
2023-01-25 21:12:48 +00:00
Carol (Nichols || Goulding) 57b5b639d6
test: Port all field columns query_tests to end-to-end tests (#6707)
* test: Port a test that's not actually supported through the full gRPC API

* test: Port remaining field column/measurement fields tests

* test: Remove unsupported measurement predicate and clarify purposes of tests

Andrew confirmed that the only way to invoke a Measurement Fields
request is with a measurement/table name specified: <0249b5018e/generated_types/protos/influxdata/platform/storage/service.proto (L43)>

so testing with a `_measurement` predicate is not valid.

I thought this test would become redundant with some other tests, but
they're actually still different enough; I took this opportunity to
better highlight the differences in the test names.

* refactor: Move all measurement fields tests to their own file

* test: Remove field columns tests that are now covered in end-to-end measurement fields tests
2023-01-25 19:49:29 +00:00
kodiakhq[bot] 0249b5018e
Merge pull request #6655 from influxdata/cn/one-test
test: Start of porting InfluxRpc query_tests
2023-01-25 15:56:44 +00:00
kodiakhq[bot] 98c60f9dc5
Merge branch 'main' into cn/one-test 2023-01-25 15:49:51 +00:00
Dom 7c7d737d0e
Merge pull request #6702 from influxdata/dom/persist-enqueue-durations
refactor: appropriate queue wait histogram buckets
2023-01-25 15:49:14 +00:00
Carol (Nichols || Goulding) f803c31e84
fix: Limit tests in CI to 8 threads to not use up Postgres connections
This is only needed until we switch over to ingester2 completely.

Old ingester tests need to be run on non-shared servers because I'm
unable to implement persistence per-namespace. Rather than spending time
figuring that out, limit the parallelization to limit the Postgres
connections that CI uses at one time.
2023-01-25 10:37:05 -05:00
Carol (Nichols || Goulding) 4658510102
fix: For Ingester2, persist a particular namespace on demand and share MiniClusters
This should hopefully help keep CI from running out of Postgres
connections 😬

The old architecture will still need to be non-shared and persist
everything.
2023-01-25 10:36:56 -05:00
Dom Dwyer df87ca3f17
refactor: appropriate queue wait histogram buckets
Changes the bucket values for the queue wait duration metric to be more
appropriately scaled.
2023-01-25 16:31:49 +01:00
Carol (Nichols || Goulding) f310e01b1a
test: Start of porting InfluxRpc query_tests
Make a new trait, `InfluxRpcTest`, that types can implement to define
how to run a test on a specific Storage gRPC API. `InfluxRpcTest` takes
care of iterating through the two architectures, running the setups, and
creating the custom test step.

Implementers of the trait can define aspects of the tests that differ
per run, to make the parameters of the test clearer and highlight what
different tests are testing.
2023-01-25 10:27:42 -05:00
Dom 8ee6c1ec68
Merge pull request #6701 from influxdata/dom/persist-config
feat: export persist config metrics
2023-01-25 15:25:33 +00:00
Dom dd445de275
Merge branch 'main' into dom/persist-config 2023-01-25 14:56:48 +00:00
Marco Neumann 7306ea9424
feat: divide&conquer framework (#6697)
Allows compactor2 to run a fixed-point loop (until all work is done) and
in every loop it can run multiple jobs.

The jobs are currently organized by "branches". This is because our
upcoming OOM handling may split a branch further if it doesn't complete.

Also note that the current config resembles the state prior to this PR.
So the FP-loop will only iterate ONCE and then run out of L0 files. A
more advanced setup can be built using the framework though.
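
The loop shape described above can be sketched roughly as follows, in Python for brevity. The function names (`run_fixed_point`, `plan_l0`, `compact_to_l1`) and the `(level, name)` file tuples are assumptions made for illustration, not the compactor2 API, and the branches run sequentially here rather than as concurrent jobs.

```python
def run_fixed_point(files, plan_branches, run_job):
    """Repeat planning and execution until a round plans no work.

    `plan_branches(files)` returns a list of branches (each a list of files);
    `run_job(branch)` returns the files that replace the branch's inputs.
    Both callables are hypothetical stand-ins.
    """
    rounds = 0
    while True:
        branches = plan_branches(files)
        if not branches:
            return files, rounds
        rounds += 1
        for branch in branches:
            outputs = run_job(branch)
            # Replace the branch's inputs with the job's outputs.
            files = [f for f in files if f not in branch] + outputs


def plan_l0(files):
    """One branch holding every L0 file, or no branches if none remain."""
    l0 = [f for f in files if f[0] == 0]
    return [l0] if l0 else []


def compact_to_l1(branch):
    """Merge a branch of L0 files into a single L1 file."""
    return [(1, "+".join(name for _, name in branch))]


files = [(0, "a"), (0, "b"), (1, "c")]
result, rounds = run_fixed_point(files, plan_l0, compact_to_l1)
```

With this single-branch planner the loop does one round of work and then finds no L0 files left, which mirrors the "iterate ONCE" behaviour of the current config.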
2023-01-25 14:45:20 +00:00
Dom Dwyer 7b69c84ceb
feat: export persist config metrics
Export the configured maximum persist parallelism, and the maximum queue
depth, so they can be used to compute % saturation in alerts /
dashboards.
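
As a rough illustration of the saturation computation these metrics enable (the function and parameter names are hypothetical, and a real alert would be written in the dashboard's query language rather than Python):

```python
def saturation_percent(current, configured_max):
    """Return current usage as a percentage of the configured maximum."""
    if configured_max <= 0:
        raise ValueError("configured_max must be positive")
    return 100.0 * current / configured_max


# Example: 3 persist jobs in flight with a configured parallelism of 4,
# and 250 queued items with a configured maximum queue depth of 1000.
parallelism_saturation = saturation_percent(3, 4)
queue_saturation = saturation_percent(250, 1000)
```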
2023-01-25 14:57:09 +01:00
Dom c928eddaab
Merge pull request #6698 from influxdata/dom/circuit-fuzz
test: CircuitBreaker recovery property fuzz test
2023-01-25 12:49:38 +00:00
Dom f0d7ee59c3
Merge branch 'main' into dom/circuit-fuzz 2023-01-25 12:42:43 +00:00
Dom e6876db431
Merge pull request #6700 from influxdata/dom/probe-at-most-one
perf(router): faster balancer node recovery
2023-01-25 12:42:34 +00:00
Dom b34bb46833
Merge branch 'main' into dom/circuit-fuzz 2023-01-25 12:29:46 +00:00
Dom eb67a1fa3f
Merge branch 'main' into dom/probe-at-most-one 2023-01-25 12:23:26 +00:00
Dom Dwyer 6eb1773ec0
perf(router): faster balancer node recovery
Ensure a "probe" node is always returned as the first candidate, driving
it to recovery faster.

This also includes a fix for the balancer metrics that would report
probe candidate nodes as healthy nodes.
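
A small sketch of the two behaviours described above, in Python for brevity. The `State` enum, `candidates` and `healthy_count` are hypothetical names, not the router's types, and limiting the result to at most one probe node is an assumption.

```python
from enum import Enum


class State(Enum):
    HEALTHY = "healthy"
    PROBE = "probe"          # recovering node that should receive a trial request
    UNHEALTHY = "unhealthy"


def candidates(nodes):
    """Order candidates: at most one probe node first, then healthy nodes."""
    probes = [name for name, state in nodes if state is State.PROBE]
    healthy = [name for name, state in nodes if state is State.HEALTHY]
    return probes[:1] + healthy


def healthy_count(nodes):
    """Metric value: probe candidates are not counted as healthy."""
    return sum(1 for _, state in nodes if state is State.HEALTHY)


nodes = [
    ("a", State.HEALTHY),
    ("b", State.PROBE),
    ("c", State.UNHEALTHY),
    ("d", State.PROBE),
]
order = candidates(nodes)
```

Putting the probe node first means the next request exercises it immediately, so a recovered node is detected without waiting for it to come up in the normal rotation.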
2023-01-25 13:18:24 +01:00
Andrew Lamb 0c55a0f257
feat: Implement basic prepared statement support in IOx (#6667)
* feat: allow override of flightsql namespace

* feat: Implement DoAction endpoint

* refactor: Remove try_unpack

* fix: remove unused code / more clone
2023-01-25 12:00:43 +00:00
Andrew Lamb 6caf31acf3
chore: Move garbage collection configuration into clap_blocks (#6678)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-25 11:31:48 +00:00
dependabot[bot] f72a999fb3
chore(deps): Bump clap from 4.1.3 to 4.1.4 (#6694)
Bumps [clap](https://github.com/clap-rs/clap) from 4.1.3 to 4.1.4.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v4.1.3...v4.1.4)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-25 11:03:41 +00:00
Dom 40c7c8b2e2
Merge branch 'main' into dom/circuit-fuzz 2023-01-25 10:57:19 +00:00
Andrew Lamb 509c80bc55
docs: document how the garbage collector works (#6682)
* docs: document how the garbage collector works

* fix: Updates

* docs: Update docs/garbage_collector.md

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-25 10:54:43 +00:00
Dom Dwyer f5d4171be0
test: CircuitBreaker recovery property fuzz test
Adds a multi-threaded fuzz test that ensures a circuit breaker can
always transition to the healthy state, regardless of the sequence of
events prior.
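
The property being tested can be sketched in a simplified, single-threaded form, in Python for brevity. The toy `CircuitBreaker` below and its parameters are assumptions for illustration and do not reflect the router's implementation, and the actual test is multi-threaded.

```python
import random


class CircuitBreaker:
    """Toy breaker: opens after consecutive errors, closes after probes succeed."""

    def __init__(self, max_errors=3, probes_to_recover=2):
        self.max_errors = max_errors
        self.probes_to_recover = probes_to_recover
        self.errors = 0       # consecutive errors while healthy
        self.successes = 0    # consecutive successes while open
        self.open = False

    def is_healthy(self):
        return not self.open

    def record_ok(self):
        if self.open:
            self.successes += 1
            if self.successes >= self.probes_to_recover:
                self.open = False
                self.errors = 0
        else:
            self.errors = 0

    def record_error(self):
        if self.open:
            self.successes = 0
        else:
            self.errors += 1
            if self.errors >= self.max_errors:
                self.open = True
                self.successes = 0


def recovers_after_random_events(seed, n_events=100):
    """Apply a random event sequence, then check that recovery is possible."""
    rng = random.Random(seed)
    breaker = CircuitBreaker()
    for _ in range(n_events):
        if rng.random() < 0.5:
            breaker.record_ok()
        else:
            breaker.record_error()
    # Property: enough consecutive successes always restore the healthy state.
    for _ in range(breaker.probes_to_recover):
        breaker.record_ok()
    return breaker.is_healthy()


all_recovered = all(recovers_after_random_events(seed) for seed in range(200))
```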
2023-01-25 11:53:57 +01:00
Marco Neumann 40e6a1a437
feat: job semaphore (#6696)
* refactor: avoid too-many-arguments

* refactor: extract `fetch_partition_info`

* feat: job semaphore
2023-01-25 10:35:07 +00:00
Dom 75fc4ba17f
Merge pull request #6695 from influxdata/dependabot/cargo/ahash-0.8.3
chore(deps): Bump ahash from 0.8.2 to 0.8.3
2023-01-25 09:28:04 +00:00
dependabot[bot] cae3071776
chore(deps): Bump ahash from 0.8.2 to 0.8.3
Bumps [ahash](https://github.com/tkaitchuck/ahash) from 0.8.2 to 0.8.3.
- [Release notes](https://github.com/tkaitchuck/ahash/releases)
- [Commits](https://github.com/tkaitchuck/ahash/compare/v0.8.2...v0.8.3)

---
updated-dependencies:
- dependency-name: ahash
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-25 01:08:55 +00:00
kodiakhq[bot] 33e29e890a
Merge pull request #6688 from influxdata/dom/rpc-endpoint-metrics
feat(metrics): router upstream RPC endpoint metrics
2023-01-24 23:51:38 +00:00
Luke Bond caea42665b
Merge branch 'main' into dom/rpc-endpoint-metrics 2023-01-25 10:44:18 +11:00
Christopher M. Wolff 9a942ceff5
refactor: propagate gapfill stride to exec (#6690)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-24 20:49:29 +00:00
Dom 39dd455297
Merge pull request #6689 from influxdata/dom/ingester-rediscovery
fix(router): force rediscovery of nodes
2023-01-24 19:21:17 +00:00