Commit Graph

10315 Commits (c2f479d3709a6a54642fda31779918c312eb6d8a)

Author SHA1 Message Date
Carol (Nichols || Goulding) 2a1e540ee3
fix: Rename INFLUXDB_IOX_MODE to INFLUXDB_IOX_RPC_MODE 2022-12-15 14:13:01 -05:00
Carol (Nichols || Goulding) 7d216ba1fd
feat: Error if you run the wrong command with the wrong env var set
Connects to #6402.
2022-12-15 14:06:59 -05:00
Carol (Nichols || Goulding) aec98015d7
fix: Remove the rpc_write feature flag and use INFLUXDB_IOX_MODE env var instead
And standardize on ingester2 and router2 for consistency.

Connects to #6402.
2022-12-15 14:06:59 -05:00
Dom Dwyer 933ab1f8c7
feat(ingester2): optimal persist parallelism
This commit changes the behaviour of the persist system to enable
optimal parallelism of persist operations, and improve the accuracy of
the outstanding job bound / back-pressure.

Previously all persist operations for a given partition were
consistently hashed to a single worker task. This serialised persistence
per partition, ensuring all updates to the partition sort key were
serialised. However, this also unnecessarily serialises persist
operations that do not need to update the sort key, reducing the
potential throughput of the system; in the worst case of a single
partition receiving all the writes, only one worker would be persisting,
and the other N-1 workers would be idle.

After this change, the sort key is inspected when enqueuing the persist
operation and if it can be determined that no sort key update is
necessary (the typical case), then the persist task is placed into a
global work queue from which all workers consume. This allows for
maximal parallelisation of these jobs, and removes the per-worker
head-of-line blocking.

In the case that the sort key does need updating, these jobs continue to
be consistently hashed to a single worker, ensuring serialised sort key
updates only where necessary.

To support these changes, the back-pressure system has been changed to
account for all outstanding persist jobs in the system, regardless of
type or assigned worker - a logical, bounded queue is composed together
of a semaphore limiting the number of persist tasks overall, and a
series of physical, unbounded queues - one to each worker & the global
queue. The overall system remains bounded by the
INFLUXDB_IOX_PERSIST_QUEUE_DEPTH value, and is now simpler to reason
about (it is independent of the number of workers, etc).
2022-12-15 18:30:51 +01:00
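The routing rule described in 933ab1f8c7 can be sketched roughly as follows. This is a minimal illustration and not the ingester2 code: `Route`, `route`, and the plain hash-modulo worker selection are hypothetical names and simplifications, and the semaphore that bounds the total number of outstanding jobs is not modelled.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Where a persist job is enqueued (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// The shared queue that every worker consumes from.
    Global,
    /// The queue of one specific worker, identified by index.
    Worker(usize),
}

/// Pick a queue for a persist job.
///
/// Jobs that do not need a sort key update go to the global queue. Jobs that
/// do are pinned to one worker per partition, so that sort key updates for a
/// partition stay serialised. Hash-modulo is an assumption made here; the
/// commit only says the jobs are "consistently hashed".
pub fn route(partition_id: u64, needs_sort_key_update: bool, n_workers: usize) -> Route {
    assert!(n_workers > 0, "at least one worker is required");
    if !needs_sort_key_update {
        return Route::Global;
    }
    let mut hasher = DefaultHasher::new();
    partition_id.hash(&mut hasher);
    Route::Worker((hasher.finish() % n_workers as u64) as usize)
}
```

In this sketch, repeated calls for the same partition with `needs_sort_key_update = true` always select the same worker index, while the typical case (no update) is free to run on any idle worker.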
Dom Dwyer e24d21255b
refactor: inject persist started timestamp
Instead of recording the "enqueued_at" when initialising the
PersistRequest, inject the value in.

This lets us re-order the request construction while retaining accurate
timing.
2022-12-15 18:28:08 +01:00
Dom a47a566cac
Merge pull request #6412 from influxdata/dom/deferred-load-peek
feat(deferred_load): peek() immediate values
2022-12-15 17:26:57 +00:00
Dom ede2627dcf
Merge branch 'main' into dom/deferred-load-peek 2022-12-15 17:16:02 +00:00
Dom e0886c3cdf
Merge pull request #6413 from influxdata/dom/ooo-parittion-persist
feat(ingester2): out-of-order partition persist
2022-12-15 17:15:56 +00:00
Dom 261eeacf3c
Merge branch 'main' into dom/ooo-parittion-persist 2022-12-15 17:07:53 +00:00
Dom 9928a51142
Merge pull request #6403 from influxdata/dom/persist-back-pressure
feat(ingester2): persist back pressure
2022-12-15 17:04:53 +00:00
Dom f2751aef77
Merge branch 'main' into dom/persist-back-pressure 2022-12-15 16:56:38 +00:00
Dom d02aae7ba0
Merge branch 'main' into dom/ooo-parittion-persist 2022-12-15 16:17:53 +00:00
Andrew Lamb 78aba66ca2
refactor: Improve Flight API server side code and comments (#6395)
* refactor: Improve Flight API server side code and comments

* refactor: revert &str signature in FlightService::run_query
2022-12-15 14:10:58 +00:00
Dom Dwyer 7d7c8db334
feat(ingester2): out-of-order partition persist
Previously data within a partition had to be persisted in the order in
which the data was received. This was necessary for the correctness of
the query API, as it utilised the lower-bound sequence number to
determine what data was available in the object store.

With the changes to the parquet discovery protocol / query API made in
https://github.com/influxdata/influxdb_iox/pull/6365 this restriction
can be lifted, allowing out-of-order persistence within a partition for
increased parallelism / performance.

This commit changes the PartitionData to accept out-of-order persist
completion notifications, removing the ordering invariant from ingester2
(note that the persist ops currently remain ordered however).
2022-12-15 14:38:13 +01:00
Dom Dwyer b15aebbddc
feat(deferred_load): peek() immediate values
Adds a peek() method to the DeferredLoad construct, allowing a caller to
immediately read the resolved value, or "None" if the value is
unresolved or concurrently resolving.

This allows a caller to optimistically read the value without having to
block and wait for it to become available.
2022-12-15 14:33:44 +01:00
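The peek() semantics described in b15aebbddc might look something like the sketch below. It is a synchronous stand-in, not the real DeferredLoad: the `Deferred` type and its `resolve` method are hypothetical, and a held lock is used here to approximate "concurrently resolving".

```rust
use std::sync::Mutex;

/// A value that becomes available at some later point (hypothetical type).
pub struct Deferred<T> {
    value: Mutex<Option<T>>,
}

impl<T: Clone> Deferred<T> {
    /// Create an unresolved value.
    pub fn new() -> Self {
        Self { value: Mutex::new(None) }
    }

    /// Store the resolved value.
    pub fn resolve(&self, v: T) {
        *self.value.lock().unwrap() = Some(v);
    }

    /// Return the value if it is already resolved, without blocking.
    ///
    /// Returns `None` when the value is unresolved, or when another thread
    /// currently holds the lock (treated here as "concurrently resolving").
    pub fn peek(&self) -> Option<T> {
        match self.value.try_lock() {
            Ok(guard) => (*guard).clone(),
            Err(_) => None,
        }
    }
}
```

A caller can try `peek()` first and fall back to a blocking wait only when it returns `None`.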
Marco Neumann a5d693eba2
feat: lower Influx regex expressions to DF regex expressions (#6394)
* feat: lower Influx regex expressions to DF regex expressions

For #6388.

* refactor: address review comments
2022-12-15 09:33:28 +00:00
dependabot[bot] 6324707110
chore(deps): Bump backtrace from 0.3.66 to 0.3.67 (#6410)
Bumps [backtrace](https://github.com/rust-lang/backtrace-rs) from 0.3.66 to 0.3.67.
- [Release notes](https://github.com/rust-lang/backtrace-rs/releases)
- [Commits](https://github.com/rust-lang/backtrace-rs/compare/0.3.66...0.3.67)

---
updated-dependencies:
- dependency-name: backtrace
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-15 09:23:12 +00:00
dependabot[bot] f82f516c0d
chore(deps): Bump toml from 0.5.9 to 0.5.10 (#6409)
Bumps [toml](https://github.com/toml-rs/toml) from 0.5.9 to 0.5.10.
- [Release notes](https://github.com/toml-rs/toml/releases)
- [Commits](https://github.com/toml-rs/toml/commits/toml-v0.5.10)

---
updated-dependencies:
- dependency-name: toml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-15 08:48:53 +00:00
Marco Neumann ffe8b98f47
refactor: clean up querier code base (#6404)
* refactor: `s/QuerierChunk/QuerierParquetChunk/g`

* refactor: isolate parquet chunk creation code

* refactor: fuse `chunk` and `chunk_parts`

* refactor: pass catalog cache instead of chunk adapter to state reconciler

* refactor: move parquet chunks creation into its own method

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-15 07:01:11 +00:00
Andrew Lamb be45889be1
chore: Upgrade datafusion (#6407)
* chore: Update datafusion

* chore: Update for API change
2022-12-15 06:51:35 +00:00
Andrew Lamb 8e1bf9cdf7
feat: Increase default all-in-one compactor memory to 300MB and expose it as a setting (#6405) 2022-12-14 20:35:48 +00:00
Dom Dwyer c7e4bf3dd1
refactor(config): default persist queue depth=250
Allow up to 250 persist jobs to be enqueued for any one worker before
pausing.

With 5 workers, this gives a maximum of 1,250 outstanding persist jobs.
2022-12-14 17:19:19 +01:00
Dom Dwyer e76b107332
feat(ingester2): persist back-pressure
This commit causes an ingester2 instance to stop accepting new writes
when at least one persist queue is full. Writes continue to be rejected
until the persist workers have processed enough outstanding persist
tasks to drain the queues to half of their capacity, at which point
writes are accepted again.

When a write is rejected, the ingester returns a "resource exhausted"
RPC code to the caller.

Checking if the system is in a healthy state for writes is extremely
cheap, as it is on the hot path for all writes.
2022-12-14 17:17:17 +01:00
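The stop-at-full, resume-at-half behaviour described in e76b107332 is a form of hysteresis, and could be sketched as below. This is an assumption-laden illustration: `PersistGate` and its methods are hypothetical, and a single queue is modelled, whereas the commit talks about several per-worker queues.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Tracks outstanding persist jobs for one queue (hypothetical type).
pub struct PersistGate {
    capacity: usize,
    outstanding: AtomicUsize,
    saturated: AtomicBool,
}

impl PersistGate {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            outstanding: AtomicUsize::new(0),
            saturated: AtomicBool::new(false),
        }
    }

    /// Hot-path check: a single atomic load.
    pub fn is_accepting_writes(&self) -> bool {
        !self.saturated.load(Ordering::Relaxed)
    }

    /// Record a newly enqueued job; reject writes once the queue is full.
    pub fn job_enqueued(&self) {
        let now = self.outstanding.fetch_add(1, Ordering::SeqCst) + 1;
        if now >= self.capacity {
            self.saturated.store(true, Ordering::SeqCst);
        }
    }

    /// Record a completed job; accept writes again once the queue has
    /// drained to half of its capacity. Assumes it is only called for jobs
    /// that were previously enqueued.
    pub fn job_completed(&self) {
        let now = self
            .outstanding
            .fetch_sub(1, Ordering::SeqCst)
            .saturating_sub(1);
        if now <= self.capacity / 2 {
            self.saturated.store(false, Ordering::SeqCst);
        }
    }
}
```

Keeping the "saturated" flag separate from the counter is what makes the write-path check cheap: writers only read one boolean, and the thresholds are evaluated on the persist side. A caller that sees `is_accepting_writes()` return false would reject the write with the "resource exhausted" code mentioned in the commit.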
Dom Dwyer 6c555600e0
test: show caller in timeout panics
Changes the stack trace of the timeout panics to show the line that
timed out, rather than the timeout implementation itself.
2022-12-14 17:13:48 +01:00
Paul Dix d9c72bb93f
feat: optimize wal with batching (#6399)
* feat: optimize wal with batching

Simplified the WAL writer so that it batches up write operations. Currently it waits 10ms between fsync calls. We can pull this out to a config variable later if we want, but I think this is good enough for now.

Also updated the reader to be a simpler blocking reader without the extra tasks and channels, as those weren't providing any benefit that I know of.

* chore: cleanup wal code for PR feedback
2022-12-14 16:07:20 +00:00
kodiakhq[bot] 940e76dab2
Merge pull request #6365 from influxdata/cn/ingester-persisted-file-count
feat: Keep track of & report # of Parquet files persisted, invalidate querier cache
2022-12-14 15:56:42 +00:00
kodiakhq[bot] d6afc9eee1
Merge branch 'main' into cn/ingester-persisted-file-count 2022-12-14 15:48:59 +00:00
Marco Neumann 4e36c590af
refactor: speed up partition sort key syncing (#6400)
* refactor: speed up partition sort key syncing

Prior to syncing, all chunks have a "locally correct" partition sort key,
i.e. one that at least covers all chunk columns (this is ensured during
chunk creation, both for parquet chunks as well as ingester chunks).
However, due to timing, some chunks may have a newer (= longer)
partition sort key. All we need to do to fix this is to pick the longest
partition sort key; there is no need to go through the whole cache
system again.

For #6358.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-12-14 15:48:08 +00:00
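The "pick the longest partition sort key" step from 4e36c590af can be illustrated with a small sketch. The function name and the representation of a sort key as a `Vec<String>` of column names are assumptions made for this example, not the querier's actual types.

```rust
/// Return the longest sort key among those carried by the chunks.
///
/// Assumes, as the commit message states, that a newer sort key is always a
/// longer one, so the longest key is the most up-to-date.
pub fn longest_sort_key(keys: &[Vec<String>]) -> Option<&Vec<String>> {
    keys.iter().max_by_key(|key| key.len())
}
```

For example, given the keys `[host, time]` and `[host, region, time]`, the second one is selected and can then be applied to every chunk of the partition.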
kodiakhq[bot] 66c610f7b1
Merge branch 'main' into cn/ingester-persisted-file-count 2022-12-14 14:58:31 +00:00
Carol (Nichols || Goulding) f29bed86c0
fix: Improve log messages and docs as suggested in code review
Co-authored-by: Dom <dom@itsallbroken.com>
2022-12-14 09:52:09 -05:00
Paul Dix 82e57ac76a
feat: make data generator handle failed requests (#6397)
Updates the data generator to handle failed requests. Adds some println output to show progress along the way.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-13 21:35:23 +00:00
Andrew Lamb 8729977851
chore: Upgrade datafusion / arrow to 29.0.0 to get flightsql client (#6396)
* chore: Update datafusion pin

* chore: Update for API change

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-12-13 20:16:09 +00:00
Marco Neumann 65687bf0fa
test: regex baseline test (#6389)
For #6388.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-13 17:42:31 +00:00
Andrew Lamb 47cd6821e1
feat: Document IOx Flight API and add convenience methods (#6392)
* feat: Document IOx Flight API and add convenience methods

* fix: InfluxQL handling

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-13 17:32:37 +00:00
Marco Neumann c51548f28b
refactor: improve concurrency during parquet chunk creation (#6376)
* refactor: de-correlate parquet file processing

* refactor: increase concurrent chunk creation jobs to 100 (from 10)

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* refactor: use deterministic RNG

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-13 16:16:09 +00:00
kodiakhq[bot] 34edbae6d7
Merge pull request #6393 from influxdata/dom/remove-queue
fix(ingester2): persist deadlock
2022-12-13 16:08:01 +00:00
Dom Dwyer 8f0da90d76
docs: remove ref to PersistActor
Fix bad reflink to something that no longer exists.
2022-12-13 16:59:15 +01:00
Dom Dwyer 309386b828
chore: silence spurious lint
This is by design! Clippy just doesn't see the plan.
2022-12-13 16:59:14 +01:00
Dom Dwyer 1da9b63cce
fix(ingester2): persist deadlock
Removes the submission queue from the persist fan-out; instead, the
PersistHandle now carries the shared state internally (cheaply cloned
via ref counts).

This also resolves the persist deadlock when under load.
2022-12-13 16:47:45 +01:00
kodiakhq[bot] e81d078f3c
Merge pull request #6377 from influxdata/dom/wal-bench
test(ingester2): WAL replay benchmark
2022-12-13 15:27:53 +00:00
kodiakhq[bot] 9e8ae1485f
Merge branch 'main' into dom/wal-bench 2022-12-13 15:19:32 +00:00
kodiakhq[bot] d9c9865297
Merge pull request #6386 from influxdata/dom/persist-logging
feat(ingester2): log persist active & queue timings
2022-12-13 15:19:05 +00:00
kodiakhq[bot] cff3d3528d
Merge branch 'main' into dom/persist-logging 2022-12-13 15:11:10 +00:00
kodiakhq[bot] e5b813c84f
Merge pull request #6387 from influxdata/dom/editor-config
chore: editor config spacing for shell scripts
2022-12-13 10:41:15 +00:00
Dom Dwyer 65d45fbe91
chore: editor config spacing for shell scripts
Set .bash and .sh script indent size to 4.
2022-12-13 11:12:11 +01:00
Dom Dwyer 5fa4e49098
feat(ingester2): persist active & queue timings
Adds more debug logging to the persist code paths, as well as capturing
& logging (at INFO) timing information tracking the time a persist task
spends in the queue, the active time spent actually persisting the data,
and the total duration of time since the request was created (sum of
both durations).
2022-12-13 11:06:09 +01:00
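The three durations described in 5fa4e49098 relate to each other as in the sketch below. The `PersistTimings` struct and `persist_timings` function are hypothetical; timestamps are passed in as arguments so the calculation can be exercised with fixed values.

```rust
use std::time::{Duration, Instant};

/// Timing breakdown for one persist task (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct PersistTimings {
    /// Time spent waiting in the queue.
    pub queued: Duration,
    /// Time spent actually persisting the data.
    pub active: Duration,
    /// Time since the request was created (sum of the two above).
    pub total: Duration,
}

/// Derive the timings from three points in time.
pub fn persist_timings(
    enqueued_at: Instant,
    started_at: Instant,
    completed_at: Instant,
) -> PersistTimings {
    let queued = started_at.duration_since(enqueued_at);
    let active = completed_at.duration_since(started_at);
    PersistTimings {
        queued,
        active,
        total: queued + active,
    }
}
```

A task that waits 30ms in the queue and then takes 70ms to persist would report a total of 100ms.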
dependabot[bot] e108a8b6c9
chore(deps): Bump paste from 1.0.9 to 1.0.10 (#6384)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.9 to 1.0.10.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.9...1.0.10)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-13 06:03:05 +00:00
Stuart Carnie f56b834438
chore: Implemented ZeroOrMore item container (#6373)
* chore: Implemented ZeroOrMore item container

Closes #6372

* chore: Use canonical names based on feedback
2022-12-12 22:01:30 +00:00
Carol (Nichols || Goulding) fdbf9e112e
fix: Actually switch into rpc_write mode in querier
Only when the feature flag is set *and* --ingester-addresses is set. I
had documented that intention, but didn't actually implement it
correctly.
2022-12-12 16:37:11 -05:00
Carol (Nichols || Goulding) 44c3486db0
feat: Expire the querier's cache using info from ingester2
Fixes #6335.

For each table, keep track of the ingester UUIDs and associated
persisted Parquet file counts that we've seen from previous requests to
ingesters. When doing a query, determine if we should expire the Parquet
file catalog cache by looking at the new information from the ingesters.

If we see a new ingester UUID or if the number of persisted files for a
known ingester UUID is different than what we've stored, then we should
expire this table's Parquet file cache.

Either way, incorporate the new information into the saved values for
comparing with the next request.
2022-12-12 15:53:39 -05:00
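The expiry rule described in 44c3486db0 could be sketched per table as follows. This is an illustration under assumptions: `TableIngesterState` and `observe` are hypothetical names, and the ingester UUID is represented as a plain string rather than a UUID type.

```rust
use std::collections::HashMap;

/// Per-table record of what each ingester last reported (hypothetical type).
#[derive(Default)]
pub struct TableIngesterState {
    /// Ingester UUID -> number of persisted Parquet files last seen.
    seen: HashMap<String, u64>,
}

impl TableIngesterState {
    /// Incorporate a response from an ingester and report whether the
    /// table's Parquet file cache should be expired.
    ///
    /// Returns `true` for a previously unseen UUID, or for a known UUID
    /// whose persisted file count differs from the stored one. The new
    /// value is stored either way.
    pub fn observe(&mut self, ingester_uuid: &str, persisted_files: u64) -> bool {
        let previous = self.seen.insert(ingester_uuid.to_string(), persisted_files);
        previous != Some(persisted_files)
    }
}
```

A new UUID typically indicates a restarted ingester, whose file count restarts as well, which is why a matching count alone is only trusted for a UUID that has been seen before.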