Commit Graph

181 Commits (bd2a72a4b65e900b9f8d2277744facec7418f97c)

Author SHA1 Message Date
dependabot[bot] 0ecde75af5
chore(deps): Bump object_store from 0.5.3 to 0.5.4 (#6900)
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.3 to 0.5.4.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.3...object_store_0.5.4)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-08 09:40:11 +00:00
Marco Neumann dcba47ab58
feat: allow the compactor to process all known partitions (#6887)
* feat: `PartitionRepo::list_ids`

* refactor: `CatalogPartitionsSource` => `CatalogToCompactPartitionsSource`

* feat: allow the compactor to process all known partitions

Closes #6648.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2023-02-08 09:32:21 +00:00
Stuart Carnie eb245d6774
feat: Initial SQLite catalog schema (#6851)
* feat: Initial SQLite catalog schema

* chore: Run cargo hakari tasks

* feat: impls, many TODOs

* feat: completed `todo!()`'s

* chore: add remaining tests from postgres module

* feat: add SQLite to get_catalog API

* chore: Add docs

* chore: Placate clippy

* chore: Placate clippy

* chore: PR feedback from @domodwyer

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-02-06 22:55:14 +00:00
Marco Neumann 52b43c40bc
refactor: use "endless" stream for compactor work (#6803)
Instead of looping and polling a fresh set of partitions and
constructing a stream from that, use an endless stream instead. This
helps w/ efficiency during roll-overs since we can already start to
process the next set of partitions while the last ones from the previous
round are still in-progress.

Closes #6750.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-06 11:11:39 +00:00
dependabot[bot] 6f4e287a3a
chore(deps): Bump serde_json from 1.0.91 to 1.0.92 (#6860)
Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.91 to 1.0.92.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](https://github.com/serde-rs/json/compare/v1.0.91...v1.0.92)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-02-06 08:27:41 +00:00
Nga Tran e85de74a5d
feat: partition filters for TargetLevel version and a complete test (#6858)
* feat: partition filters for TargetLevel version and a complete test

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* chore: run fmt after applying review suggestions in git

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-04 19:54:41 +00:00
Carol (Nichols || Goulding) 30fea67701
fix: Move variables within format strings. Thanks clippy!
Changes made automatically using `cargo clippy --fix`.
2023-02-03 13:06:17 -05:00
Nga Tran 1535366666
refactor: rename compact algo versions to reflect their actual work (#6841)
* refactor: rename compact algo versions to reflect thier actual work

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-03 15:06:38 +00:00
Marco Neumann 0a20bd404e
feat: allow selection of different compactor algo versions (#6800)
Required to lift&shift to hot-cold compaction w/ keeping the codebase
maintainable.
2023-02-01 15:33:41 +00:00
Marco Neumann 62697265c1
feat: compactor sharding (#6729)
I'm not saying we have to use this, but this is a demonstration how easy
it would be to add sharding to the compaction tier and also acts as a
"backup / insurance" if we ever need it.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-31 14:37:06 +00:00
Marco Neumann 515f0eef64
feat: simple compactor2 self-protection (#6728)
Add some rough "partition is too big" filter for now until we can deal
with them (the framework allows that but we need to set up the proper
divide-and-conquer components).

This will hopefully prevent our prod compactor from dying that often.

Note that this is also duct-tape around two issues:

- DataFusion not accounting in-flight data all the time
- Our wide fan-out query plans (see https://github.com/influxdata/idpe/issues/16768#issuecomment-1387056833 )

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-30 10:57:47 +00:00
Dom a7770f0f7a
Merge branch 'main' into dom/reduce-write-timeout 2023-01-30 09:59:37 +00:00
Christopher M. Wolff 55257b46c9
chore: validate ingester URIs on querier CLI (#6740)
* chore: add validate for ingesters on querier CLI

* chore: fix typo and tests

* chore: clippy

* chore: review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-27 21:13:52 +00:00
Dom Dwyer 353b1ad575
feat: configurable RPC write request timeout
Allows the user to configure the timeout used for a single RPC write
request, and changes the default to a more sensible value (30 -> 3
seconds).
2023-01-27 14:53:48 +01:00
Dom Dwyer 6797eab5fc
feat(router): configurable partition key
Allows the partition key to be set at runtime, though it's probably best
no one does so for now.
2023-01-27 14:26:18 +01:00
Christopher M. Wolff c78088b043
fix: update clap parser for --ingester-addresses (#6723)
* fix: update clap parser for --ingester-addresses

* fix: make querier2 specify ingester addrs same as router2

* fix: update clap parser args but do not prepend http://

* chore: cargo fmt
2023-01-27 02:54:57 +00:00
Marco Neumann 30d411dc95
feat: shadow mode (#6712)
* refactor: remove untyped durations from `compactor2`

* feat: shadow mode

Closes #6645.

* refactor: split input and output store
2023-01-26 14:20:55 +00:00
Marco Neumann ed694d3be4
feat: introduce scratchpad store for compactor (#6706)
* feat: introduce scratchpad store for compactor

Use an intermediate in-memory store (can be a disk later if we want) to
stage all inputs and outputs of the compaction. The reasons are:

- **fewer IO ops:** DataFusion's streaming IO requires slightly more
  IO requests (at least 2 per file) due to the way it is optimized to
  read as little as possible. It first reads the metadata and then
  decides which content to fetch. In the compaction case this is (esp.
  w/o delete predicates) EVERYTHING. So in contrast to the querier,
  there is no advantage of this approach. In contrary this easily adds
  100ms latency to every single input file.
- **less traffic:** For divide&conquer partitions (i.e. when we need to
  run multiple compaction steps to deal with them) it is kinda pointless
  to upload an intermediate result just to download it again. The
  scratchpad avoids that.
- **higher throughput:** We want to limit the number of concurrent
  DataFusion jobs because we don't wanna blow up the whole process by
  having too much in-flight arrow data at the same time. However while
  we perform the actual computation, we were waiting for object store
  IO. This was limiting our throughput substantially.
- **shadow mode:** De-coupling the stores in this way makes it easier to
  implement #6645.

Note that we assume here that the input parquet files are WAY SMALLER
than the uncompressed Arrow data during compaction itself.

Closes #6650.

* fix: panic on shutdown

* refactor: remove shadow scratchpad (for now)

* refactor: make scratchpad safe to use
2023-01-26 10:03:08 +00:00
Andrew Lamb 6caf31acf3
chore: Move garbage collection configuration into clap_blocks (#6678)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-25 11:31:48 +00:00
Marco Neumann 40e6a1a437
feat: job semaphore (#6696)
* refactor: avoid too-many-arguments

* refactor: extract `fetch_partition_info`

* feat: job semaphore
2023-01-25 10:35:07 +00:00
Marco Neumann 4521516147
feat: add per-partition timeout (#6686)
It seems that prod was hanging last night. This is pretty hard to debug
and in general we should protect the compactor against hanging /
malformed partitions that take forever. This is similar to the fact that
the querier also has a timeout for every query. Let's see if this shows
anything in prod (and if not it's still a desired safety net).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-24 16:53:47 +00:00
Nga Tran 840923abab
refactor: execute compaction plan (#6654)
* chore: address review comment of previous PR

* refactor: execute compact plan

* refactor: we will now compact all L0 and L1 files of a partition and split them as needed

* chore: comnents

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-20 22:34:50 +00:00
Marco Neumann 5e297b4667
refactor: lift up compactor2 CLI args, set mem limit to 8GB (#6631)
- use a single data structure for CLI args (not two)
- set mem limit default to 8GB (same as querier). We can always tune
  this later, but we should not run with "unlimited" to begin with.
2023-01-19 12:21:51 +00:00
Nga Tran 9ae03b16d6
feat: invokes catalog functions for compactor2 (#6619)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-19 10:33:57 +00:00
Marco Neumann 380a855aab
feat: basic compactor2 algo layout (#6616) 2023-01-18 18:51:59 +00:00
Marco Neumann e72173d58d
feat: very basic compactor2 skeleton (#6614)
Sets up crate and wires up the main binary. No tests yet, no algorithm
framework, just the bare minimum.

Also I decided to not offer a gRPC server in `compactor2` at the moment
and hence did not implement any handle/delegate infrastructure. We add
this later if we need it. This also means compactor2 does NOT provide a
catalog service for now.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-18 16:36:40 +00:00
Nga Tran fa0893819c
fix: have warm compaction work with compactor2 (#6571)
* refactor: same function to select partition candidates

* fix: have warm compaction work with compactor2

* fix: format

* chore: cleanup
2023-01-12 02:32:39 +00:00
Nga Tran 62c0f3dbdd
feat: have cold compaction work with Compactor2 (#6542)
* feat: cold

* chore: debug info

* feat: only compact qualified cold partition candidates

* fix: catalog test

* chore: cleanup

* chore: add new config flag for cold partition candidates

* chore: implement display for CompactionType and add tests for max num partitions

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-10 16:42:57 +00:00
Paul Dix 828992c9c5
feat: Ingest replica skeleton (#6529)
* feat: Update replication.proto

* Remove the PartitionId in the replicate request as a single replicate request can have the data for many partitions.
* Add namespace_id and table_id to persist complete request to make data easier to lookup in buffer.

* feat: Initial ingest_replica skeleton

A bunch of copy pasta here from ingester2, but this takes out a ton of stuff that isn't used in replicas.
Also lays the groundwork for the simpler buffer structure to keep the data and a basic cache for catalog information that will be required.

* feat: update replication.proto GetPartitionBufferResponse

* chore: PR cleanup

* chore: PR cleanup
2023-01-09 16:53:49 +00:00
kodiakhq[bot] c0f2ba09ee
Merge branch 'main' into cn/compactor2 2022-12-19 14:22:56 +00:00
dependabot[bot] 7f2aa8b10c
chore(deps): Bump serde_json from 1.0.89 to 1.0.91
Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.89 to 1.0.91.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](https://github.com/serde-rs/json/compare/v1.0.89...v1.0.91)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-12-19 01:44:18 +00:00
Carol (Nichols || Goulding) d7e75d43ea
fix: Make shard ID optional for compactor queries in RPC write mode 2022-12-16 17:28:53 -05:00
Carol (Nichols || Goulding) 2406cdb24b
feat: Create a compactor2 cli 2022-12-16 17:22:06 -05:00
Dom Dwyer c830a83105
feat(ingester2): hot partition persistence
This PR uses the MutableBatch persist cost estimation added in #6425 to
selectively mark "hot" partitions for persistence.

This uses a (composable!) "post-write" observer that is invoked after
each buffer call - this allows the HotPartitionPersister in this commit
to inspect the cost of the partition after applying the write, and if it
exceeds the configurable cost threshold, enqueue it for persistence
(rotating the buffer within the partition in the process).

Unlike ingester(1), this implementation prevents overrun - the
application of the write that exceeds the cost limit, and enqueueing the
partition for persistence is atomic.
2022-12-16 19:33:34 +01:00
Luke Bond f419e2c378
feat: warm compaction (#6192)
* feat: warm compaction

chore: add missing warm compaction config

chore: tests for warm compaction

chore: modify count usage in warm compaction sql

chore: catalog test for warm compaction; sql fixes

feat: settable target level for compact w/ budget

chore: tests for warm compaction

chore: clarifying comments in warm compaction test

chore: fixed erroneous comment in catalog test

chore: improve warm compactor test by checking file exists

chore: tests for warm compaction

chore: warm compactor test tidy-ups

* chore: improve test for warm compaction

* chore: fix erroneous comment in warm compaction code
2022-12-16 15:59:45 +00:00
kodiakhq[bot] cfb7c16bb1
Merge branch 'main' into dom/optimal-persist-parallelism 2022-12-16 09:12:22 +00:00
Carol (Nichols || Goulding) 22d6b78899
docs: Fix outdated comment on querier mode switching behavior 2022-12-15 14:16:14 -05:00
Carol (Nichols || Goulding) 2a1e540ee3
fix: Rename INFLUXDB_IOX_MODE to INFLUXDB_IOX_RPC_MODE 2022-12-15 14:13:01 -05:00
Carol (Nichols || Goulding) 7d216ba1fd
feat: Error if you run the wrong command with the wrong env var set
Connects to #6402.
2022-12-15 14:06:59 -05:00
Carol (Nichols || Goulding) aec98015d7
fix: Remove the rpc_write feature flag and use INFLUXDB_IOX_MODE env var instead
And standardize on ingester2 and router2 for consistency.

Connects to #6402.
2022-12-15 14:06:59 -05:00
Dom Dwyer 933ab1f8c7
feat(ingester2): optimal persist parallelism
This commit changes the behaviour of the persist system to enable
optimal parallelism of persist operations, and improve the accuracy of
the outstanding job bound / back-pressure.

Previously all persist operations for a given partition were
consistently hashed to a single worker task. This serialised persistence
per partition, ensuring all updates to the partition sort key were
serialised. However, this also unnecessarily serialises persist
operations that do not need to update the sort key, reducing the
potential throughput of the system; in the worst case of a single
partition receiving all the writes, only one worker would be persisting,
and the other N-1 workers would be idle.

After this change, the sort key is inspected when enqueuing the persist
operation and if it can be determined that no sort key update is
necessary (the typical case), then the persist task is placed into a
global work queue from which all workers consume. This allows for
maximal parallelisation of these jobs, and the removes the per-worker
head-of-line blocking.

In the case that the sort key does need updating, these jobs continue to
be consistently hashed to a single worker, ensuring serialised sort key
updates only where necessary.

To support these changes, the back-pressure system has been changed to
account for all outstanding persist jobs in the system, regardless of
type or assigned worker - a logical, bounded queue is composed together
of a semaphore limiting the number of persist tasks overall, and a
series of physical, unbounded queues - one to each worker & the global
queue. The overall system remains bounded by the
INFLUXDB_IOX_PERSIST_QUEUE_DEPTH value, and is now simpler to reason
about (it is independent of the number of workers, etc).
2022-12-15 18:30:51 +01:00
Dom Dwyer c7e4bf3dd1
refactor(config): default persist queue depth=250
Allow up to 250 persist jobs to be enqueued for any one worker before
pausing.

With 5 workers, this gives a maximum outstanding persist jobs of 2,500.
2022-12-14 17:19:19 +01:00
Dom Dwyer 1da9b63cce
fix(ingester2): persist deadlock
Removes the submission queue from the persist fan-out, instead the
PersistHandle now carries the shared state internally (cheaply cloned
via ref counts).

This also resolves the persist deadlock when under load.
2022-12-13 16:47:45 +01:00
Carol (Nichols || Goulding) 5141cba1db
fix: Only switch into querier RPC write path if ingester addresses specified
This enables testing of the querier using the old path with the
rpc_write feature turned on.
2022-12-08 17:40:04 -05:00
Carol (Nichols || Goulding) b85130cb7c
fix: Make --ingester-addresses optional for the querier in RPC write mode 2022-12-08 17:22:52 -05:00
Carol (Nichols || Goulding) 619a2d0856
fix: Remove conflicting arguments from the RouterRpcWriteConfig (#6355)
These were added in
https://github.com/influxdata/influxdb_iox/pull/6346.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-08 20:21:37 +00:00
kodiakhq[bot] 6f7cb5ccf0
Merge branch 'main' into cn/ingester2-querier 2022-12-08 14:00:49 +00:00
Carol (Nichols || Goulding) e13e668d26
refactor: Share more code in the querier in the RPC write path mode 2022-12-07 13:54:08 -05:00
Luke Bond 551bb0ef6a
feat: allow enabling/disabling ns autocreation in router (#6346)
* feat: allow enabling/disabling ns autocreation in router

* fix: missed an import for something behind router2 compile flag
2022-12-07 16:12:00 +00:00
Carol (Nichols || Goulding) 9166ace796
feat: Make a mode for the querier to use ingester2 instead, behind the rpc_write feature flag 2022-12-07 09:56:50 -05:00