Commit Graph

10181 Commits (07772e8d2254fb734e7f826298559658a4964015)

Author SHA1 Message Date
Carol (Nichols || Goulding) 07772e8d22
fix: Always return a PartitionRecord which maybe streams record batches
Connects to #6421.

Even if the ingester doesn't have data in memory for a query, we need to
send back metadata about the ingester UUID and the number of files
persisted so that the querier can decide whether it needs to refresh the
cache.
2022-12-16 17:02:41 -05:00
Carol (Nichols || Goulding) 473ce7a268
fix: Don't hardcode the transition shard id 2022-12-16 17:01:35 -05:00
Dom 97d90f5615
Merge pull request #6426 from influxdata/dom/hot-partitions
feat(ingester2): hot partition persistence
2022-12-16 19:20:27 +00:00
Dom Dwyer c830a83105
feat(ingester2): hot partition persistence
This PR uses the MutableBatch persist cost estimation added in #6425 to
selectively mark "hot" partitions for persistence.

This uses a (composable!) "post-write" observer that is invoked after
each buffer call - this allows the HotPartitionPersister in this commit
to inspect the cost of the partition after applying the write, and if it
exceeds the configurable cost threshold, enqueue it for persistence
(rotating the buffer within the partition in the process).

Unlike ingester(1), this implementation prevents overrun - applying the
write that exceeds the cost limit and enqueueing the partition for
persistence happen atomically.
2022-12-16 19:33:34 +01:00
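
The post-write observer described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the ingester2 code: the `Partition` type, the `PostWriteObserver` trait, the `write` helper and the byte-length cost model are all assumptions made for the example; only the name HotPartitionPersister comes from the commit message. Holding the partition lock across the write, the cost check and the rotation is what makes the two steps atomic here.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a partition's in-memory buffer.
#[derive(Default)]
struct Partition {
    rows: Vec<String>,
    cost: usize, // assumed cost model: total bytes buffered
}

/// Hypothetical observer invoked after every buffered write.
trait PostWriteObserver {
    fn observe(&self, partition: &mut Partition);
}

/// Enqueues a partition for persistence once its cost exceeds `max_cost`.
struct HotPartitionPersister {
    max_cost: usize,
    persist_queue: Mutex<Vec<Vec<String>>>,
}

impl PostWriteObserver for HotPartitionPersister {
    fn observe(&self, partition: &mut Partition) {
        if partition.cost > self.max_cost {
            // Rotate the buffer: take the hot rows and leave an empty buffer.
            let hot = std::mem::take(&mut partition.rows);
            partition.cost = 0;
            self.persist_queue.lock().unwrap().push(hot);
        }
    }
}

/// Applies a write and runs the observer while still holding the partition
/// lock, so no other write can slip in between the two steps.
fn write(partition: &Mutex<Partition>, observer: &dyn PostWriteObserver, row: &str) {
    let mut guard = partition.lock().unwrap();
    guard.cost += row.len();
    guard.rows.push(row.to_string());
    observer.observe(&mut guard);
}

fn main() {
    let partition = Mutex::new(Partition::default());
    let persister = HotPartitionPersister {
        max_cost: 10,
        persist_queue: Mutex::new(Vec::new()),
    };
    for row in ["aaaa", "bbbb", "cccc", "dddd"] {
        write(&partition, &persister, row);
    }
    println!(
        "batches queued for persistence: {}",
        persister.persist_queue.lock().unwrap().len()
    );
}
```

Because the observer is a trait object, several observers could be chained behind the same `write` call, which is one way to read the "composable" remark in the commit.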
Paul Dix 84698b3532
feat: add size_data to mutable batch (#6425)
This method will be used in the new ingestion pipeline to approximate how much memory a mutable batch will take to convert to arrow and persist. It is meant only as a very rough estimate to trigger persistence for hot partitions.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-16 17:20:16 +00:00
Luke Bond d96ba835bc
Merge pull request #6424 from influxdata/chore/redpanda-integration-tests-message
chore: update error message when running int tests w/out kafka
2022-12-16 16:23:32 +00:00
Luke Bond 656e9ee689 chore: update error message when running int tests w/out kafka 2022-12-16 16:10:32 +00:00
Luke Bond f419e2c378
feat: warm compaction (#6192)
* feat: warm compaction

chore: add missing warm compaction config

chore: tests for warm compaction

chore: modify count usage in warm compaction sql

chore: catalog test for warm compaction; sql fixes

feat: settable target level for compact w/ budget

chore: tests for warm compaction

chore: clarifying comments in warm compaction test

chore: fixed erroneous comment in catalog test

chore: improve warm compactor test by checking file exists

chore: tests for warm compaction

chore: warm compactor test tidy-ups

* chore: improve test for warm compaction

* chore: fix erroneous comment in warm compaction code
2022-12-16 15:59:45 +00:00
Marko Mikulicic 541f956f51
Merge pull request #6423 from influxdata/doc_ingester_run
fix(ingester2): Make ingester2 work with empty or existing catalogs
2022-12-16 15:52:50 +01:00
Marko Mikulicic 69d5148729 fix(ingester2): Make ingester2 work with existing catalogs and document quickstart 2022-12-16 13:16:31 +01:00
kodiakhq[bot] 3cd1f1ce4b
Merge pull request #6416 from influxdata/dom/optimal-persist-parallelism
feat(ingester2): optimal persist parallelism
2022-12-16 09:19:08 +00:00
kodiakhq[bot] cfb7c16bb1
Merge branch 'main' into dom/optimal-persist-parallelism 2022-12-16 09:12:22 +00:00
Dom 7c8917aff6
Merge pull request #6419 from influxdata/dependabot/cargo/cc-1.0.78
chore(deps): Bump cc from 1.0.77 to 1.0.78
2022-12-16 09:11:06 +00:00
dependabot[bot] 6a841bdf5a
chore(deps): Bump cc from 1.0.77 to 1.0.78
Bumps [cc](https://github.com/rust-lang/cc-rs) from 1.0.77 to 1.0.78.
- [Release notes](https://github.com/rust-lang/cc-rs/releases)
- [Commits](https://github.com/rust-lang/cc-rs/compare/1.0.77...1.0.78)

---
updated-dependencies:
- dependency-name: cc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-12-16 01:08:24 +00:00
kodiakhq[bot] a75e9aadf3
Merge pull request #6406 from influxdata/cn/bye-feature-flag
fix: Remove the rpc_write feature flag and use INFLUXDB_IOX_MODE env var instead
2022-12-15 19:23:22 +00:00
Carol (Nichols || Goulding) 22d6b78899
docs: Fix outdated comment on querier mode switching behavior 2022-12-15 14:16:14 -05:00
Carol (Nichols || Goulding) 2a1e540ee3
fix: Rename INFLUXDB_IOX_MODE to INFLUXDB_IOX_RPC_MODE 2022-12-15 14:13:01 -05:00
Carol (Nichols || Goulding) 7d216ba1fd
feat: Error if you run the wrong command with the wrong env var set
Connects to #6402.
2022-12-15 14:06:59 -05:00
Carol (Nichols || Goulding) aec98015d7
fix: Remove the rpc_write feature flag and use INFLUXDB_IOX_MODE env var instead
And standardize on ingester2 and router2 for consistency.

Connects to #6402.
2022-12-15 14:06:59 -05:00
Dom Dwyer 933ab1f8c7
feat(ingester2): optimal persist parallelism
This commit changes the behaviour of the persist system to enable
optimal parallelism of persist operations, and improve the accuracy of
the outstanding job bound / back-pressure.

Previously all persist operations for a given partition were
consistently hashed to a single worker task. This serialised persistence
per partition, ensuring all updates to the partition sort key were
serialised. However, this also unnecessarily serialises persist
operations that do not need to update the sort key, reducing the
potential throughput of the system; in the worst case of a single
partition receiving all the writes, only one worker would be persisting,
and the other N-1 workers would be idle.

After this change, the sort key is inspected when enqueuing the persist
operation and if it can be determined that no sort key update is
necessary (the typical case), then the persist task is placed into a
global work queue from which all workers consume. This allows for
maximal parallelisation of these jobs, and removes the per-worker
head-of-line blocking.

In the case that the sort key does need updating, these jobs continue to
be consistently hashed to a single worker, ensuring serialised sort key
updates only where necessary.

To support these changes, the back-pressure system has been changed to
account for all outstanding persist jobs in the system, regardless of
type or assigned worker - a logical, bounded queue is composed together
of a semaphore limiting the number of persist tasks overall, and a
series of physical, unbounded queues - one to each worker & the global
queue. The overall system remains bounded by the
INFLUXDB_IOX_PERSIST_QUEUE_DEPTH value, and is now simpler to reason
about (it is independent of the number of workers, etc).
2022-12-15 18:30:51 +01:00
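
The routing and bounding rules in the commit above might look something like the sketch below. It is single-threaded and uses a plain counter where the commit describes a semaphore; `PersistQueues`, `PersistJob`, `Route` and the `needs_sort_key_update` flag are hypothetical names introduced for illustration, not the ingester2 types.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::VecDeque;
use std::hash::{Hash, Hasher};

/// Hypothetical persist job; the flag is assumed to be computed at enqueue time.
#[derive(Debug, Clone, PartialEq)]
struct PersistJob {
    partition_id: u64,
    needs_sort_key_update: bool,
}

#[derive(Debug, PartialEq)]
enum Route {
    Global,
    Worker(usize),
}

struct PersistQueues {
    global: VecDeque<PersistJob>,          // consumed by all workers
    per_worker: Vec<VecDeque<PersistJob>>, // one unbounded queue per worker
    outstanding: usize,                    // stands in for the semaphore
    queue_depth: usize,                    // overall bound on outstanding jobs
}

impl PersistQueues {
    fn new(workers: usize, queue_depth: usize) -> Self {
        Self {
            global: VecDeque::new(),
            per_worker: (0..workers).map(|_| VecDeque::new()).collect(),
            outstanding: 0,
            queue_depth,
        }
    }

    /// Returns the job back to the caller if the overall bound is reached.
    fn enqueue(&mut self, job: PersistJob) -> Result<Route, PersistJob> {
        if self.outstanding >= self.queue_depth {
            return Err(job);
        }
        self.outstanding += 1;
        if job.needs_sort_key_update {
            // Hash the partition to one worker so its sort key updates
            // are applied serially.
            let mut hasher = DefaultHasher::new();
            job.partition_id.hash(&mut hasher);
            let idx = (hasher.finish() % self.per_worker.len() as u64) as usize;
            self.per_worker[idx].push_back(job);
            Ok(Route::Worker(idx))
        } else {
            // Typical case: any idle worker may pick the job up.
            self.global.push_back(job);
            Ok(Route::Global)
        }
    }

    /// Called when a worker finishes a job, releasing one slot of the bound.
    fn complete(&mut self) {
        self.outstanding = self.outstanding.saturating_sub(1);
    }
}

fn main() {
    let mut queues = PersistQueues::new(4, 3);
    let job = PersistJob { partition_id: 7, needs_sort_key_update: false };
    println!("{:?}", queues.enqueue(job));
    queues.complete();
}
```

Note that the bound applies to the sum of all queues, so it does not change with the number of workers, which matches the commit's claim that the limit is independent of worker count.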
Dom Dwyer e24d21255b
refactor: inject persist started timestamp
Instead of recording the "enqueued_at" when initialising the
PersistRequest, inject the value in.

This lets us re-order the request construction while retaining accurate
timing.
2022-12-15 18:28:08 +01:00
Dom a47a566cac
Merge pull request #6412 from influxdata/dom/deferred-load-peek
feat(deferred_load): peek() immediate values
2022-12-15 17:26:57 +00:00
Dom ede2627dcf
Merge branch 'main' into dom/deferred-load-peek 2022-12-15 17:16:02 +00:00
Dom e0886c3cdf
Merge pull request #6413 from influxdata/dom/ooo-parittion-persist
feat(ingester2): out-of-order partition persist
2022-12-15 17:15:56 +00:00
Dom 261eeacf3c
Merge branch 'main' into dom/ooo-parittion-persist 2022-12-15 17:07:53 +00:00
Dom 9928a51142
Merge pull request #6403 from influxdata/dom/persist-back-pressure
feat(ingester2): persist back pressure
2022-12-15 17:04:53 +00:00
Dom f2751aef77
Merge branch 'main' into dom/persist-back-pressure 2022-12-15 16:56:38 +00:00
Dom d02aae7ba0
Merge branch 'main' into dom/ooo-parittion-persist 2022-12-15 16:17:53 +00:00
Andrew Lamb 78aba66ca2
refactor: Improve Flight API server side code and comments (#6395)
* refactor: Improve Flight API server side code and comments

* refactor: revert &str signature in FlightService::run_query
2022-12-15 14:10:58 +00:00
Dom Dwyer 7d7c8db334
feat(ingester2): out-of-order partition persist
Previously data within a partition had to be persisted in the order in
which the data was received. This was necessary for the correctness of
the query API, as it utilised the lower-bound sequence number to
determine what data was available in the object store.

With the changes to the parquet discovery protocol / query API made in
https://github.com/influxdata/influxdb_iox/pull/6365 this restriction
can be lifted, allowing out-of-order persistence within a partition for
increased parallelism / performance.

This commit changes the PartitionData to accept out-of-order persist
completion notifications, removing the ordering invariant from ingester2
(note that the persist ops currently remain ordered however).
2022-12-15 14:38:13 +01:00
Dom Dwyer b15aebbddc
feat(deferred_load): peek() immediate values
Adds a peek() method to the DeferredLoad construct, allowing a caller to
immediately read the resolved value, or "None" if the value is
unresolved or concurrently resolving.

This allows a caller to optimistically read the value without having to
block and wait for it to become available.
2022-12-15 14:33:44 +01:00
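
As a rough illustration of the peek() semantics described above, the sketch below models a deferred value with three states. The real DeferredLoad is not shown in this log, so the `State` enum, the synchronous `begin_resolve`/`resolve` helpers and the clone-on-peek behaviour are all assumptions chosen to keep the example self-contained.

```rust
use std::sync::Mutex;

/// Hypothetical resolution states for a lazily loaded value.
enum State<T> {
    Unresolved,
    Resolving,
    Resolved(T),
}

/// Simplified, synchronous stand-in for a deferred-load container.
struct DeferredLoad<T> {
    state: Mutex<State<T>>,
}

impl<T: Clone> DeferredLoad<T> {
    fn new() -> Self {
        Self { state: Mutex::new(State::Unresolved) }
    }

    /// Returns the value only if it is already resolved; never waits for
    /// an in-progress resolution to finish.
    fn peek(&self) -> Option<T> {
        let guard = self.state.lock().unwrap();
        match &*guard {
            State::Resolved(v) => Some(v.clone()),
            State::Unresolved | State::Resolving => None,
        }
    }

    /// Marks the value as being loaded (assumed helper for the example).
    fn begin_resolve(&self) {
        *self.state.lock().unwrap() = State::Resolving;
    }

    /// Stores the loaded value (assumed helper for the example).
    fn resolve(&self, value: T) {
        *self.state.lock().unwrap() = State::Resolved(value);
    }
}

fn main() {
    let sort_key: DeferredLoad<String> = DeferredLoad::new();
    println!("before: {:?}", sort_key.peek());
    sort_key.begin_resolve();
    println!("during: {:?}", sort_key.peek());
    sort_key.resolve("host,time".to_string());
    println!("after:  {:?}", sort_key.peek());
}
```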
Marco Neumann a5d693eba2
feat: lower Influx regex expressions to DF regex expressions (#6394)
* feat: lower Influx regex expressions to DF regex expressions

For #6388.

* refactor: address review comments
2022-12-15 09:33:28 +00:00
dependabot[bot] 6324707110
chore(deps): Bump backtrace from 0.3.66 to 0.3.67 (#6410)
Bumps [backtrace](https://github.com/rust-lang/backtrace-rs) from 0.3.66 to 0.3.67.
- [Release notes](https://github.com/rust-lang/backtrace-rs/releases)
- [Commits](https://github.com/rust-lang/backtrace-rs/compare/0.3.66...0.3.67)

---
updated-dependencies:
- dependency-name: backtrace
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-15 09:23:12 +00:00
dependabot[bot] f82f516c0d
chore(deps): Bump toml from 0.5.9 to 0.5.10 (#6409)
Bumps [toml](https://github.com/toml-rs/toml) from 0.5.9 to 0.5.10.
- [Release notes](https://github.com/toml-rs/toml/releases)
- [Commits](https://github.com/toml-rs/toml/commits/toml-v0.5.10)

---
updated-dependencies:
- dependency-name: toml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-15 08:48:53 +00:00
Marco Neumann ffe8b98f47
refactor: clean up querier code base (#6404)
* refactor: `s/QuerierChunk/QuerierParquetChunk/g`

* refactor: isolate parquet chunk creation code

* refactor: fuse `chunk` and `chunk_parts`

* refactor: pass catalog cache instead of chunk adapter to state reconciler

* refactor: move parquet chunks creation into its own method

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-15 07:01:11 +00:00
Andrew Lamb be45889be1
chore: Upgrade datafusion (#6407)
* chore: Update datafusion

* chore: Update for API change
2022-12-15 06:51:35 +00:00
Andrew Lamb 8e1bf9cdf7
feat: Increase default all-in-one compactor memory to 300MB and expose it as a setting (#6405) 2022-12-14 20:35:48 +00:00
Dom Dwyer c7e4bf3dd1
refactor(config): default persist queue depth=250
Allow up to 250 persist jobs to be enqueued for any one worker before
pausing.

With 5 workers, this gives a maximum of 1,250 outstanding persist jobs (5 × 250).
2022-12-14 17:19:19 +01:00
Dom Dwyer e76b107332
feat(ingester2): persist back-pressure
This commit causes an ingester2 instance to stop accepting new writes
when at least one persist queue is full. Writes continue to be rejected
until the persist workers have processed enough outstanding persist
tasks to drain the queues to half of their capacity, at which point
writes are accepted again.

When a write is rejected, the ingester returns a "resource exhausted"
RPC code to the caller.

Checking whether the system is in a healthy state for writes is extremely
cheap, which matters because the check sits on the hot path for all writes.
2022-12-14 17:17:17 +01:00
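
The stop-at-full, resume-at-half behaviour described above is a form of hysteresis. A minimal sketch, assuming a single queue and hypothetical names (`BackPressure`, `WriteError`, `job_enqueued`, `job_completed`), could keep the health flag in an atomic so the write path only pays for one relaxed load:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

#[derive(Debug, PartialEq)]
enum WriteError {
    /// Mirrors the "resource exhausted" RPC code mentioned in the commit.
    ResourceExhausted,
}

/// Hypothetical back-pressure state for one persist queue.
struct BackPressure {
    capacity: usize,
    depth: AtomicUsize,
    saturated: AtomicBool,
}

impl BackPressure {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            depth: AtomicUsize::new(0),
            saturated: AtomicBool::new(false),
        }
    }

    /// Hot-path check: a single atomic load.
    fn try_write(&self) -> Result<(), WriteError> {
        if self.saturated.load(Ordering::Relaxed) {
            Err(WriteError::ResourceExhausted)
        } else {
            Ok(())
        }
    }

    /// Called when a persist job is added; trips the flag once the queue is full.
    fn job_enqueued(&self) {
        let depth = self.depth.fetch_add(1, Ordering::Relaxed) + 1;
        if depth >= self.capacity {
            self.saturated.store(true, Ordering::Relaxed);
        }
    }

    /// Called when a persist job finishes; clears the flag only after the
    /// queue has drained to half capacity. Assumes it is called once per
    /// previously enqueued job, so the depth never underflows.
    fn job_completed(&self) {
        let depth = self.depth.fetch_sub(1, Ordering::Relaxed) - 1;
        if depth <= self.capacity / 2 {
            self.saturated.store(false, Ordering::Relaxed);
        }
    }
}

fn main() {
    let bp = BackPressure::new(4);
    for _ in 0..4 {
        bp.job_enqueued();
    }
    println!("queue full:      {:?}", bp.try_write());
    bp.job_completed();
    println!("one job drained: {:?}", bp.try_write());
    bp.job_completed();
    println!("half drained:    {:?}", bp.try_write());
}
```

Resuming at half capacity rather than at "one slot free" avoids flapping between accepting and rejecting writes when the queue hovers near its limit.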
Dom Dwyer 6c555600e0
test: show caller in timeout panics
Changes the stack trace of the timeout panics to show the line that
timed out, rather than the timeout implementation itself.
2022-12-14 17:13:48 +01:00
Paul Dix d9c72bb93f
feat: optimize wal with batching (#6399)
* feat: optimize wal with batching

Simplified the wal writer so that it batches up write operations. Currently it waits 10ms between fsync calls. We can pull this out to a config variable later if we want, but I think this is good enough for now.

Also updated the reader to be a simpler blocking reader without the extra tasks and channels as that wasn't really getting us anything that I know of.

* chore: cleanup wal code for PR feedback
2022-12-14 16:07:20 +00:00
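
The batching idea in the commit above, many buffered write operations sharing one sync, can be sketched as below. This is not the IOx WAL: `BatchingWal`, the length-prefixed record layout and the `syncs` counter are assumptions for illustration, and `flush()` stands in for the fsync that a real file-backed writer would issue (for example via `File::sync_all`). A caller would invoke `flush_batch` on a timer, such as the 10ms interval the commit mentions.

```rust
use std::io::{self, Write};

/// Hypothetical WAL writer that buffers operations between syncs.
struct BatchingWal<W: Write> {
    sink: W,
    pending: Vec<Vec<u8>>,
    syncs: usize, // number of sync calls issued, for illustration
}

impl<W: Write> BatchingWal<W> {
    fn new(sink: W) -> Self {
        Self { sink, pending: Vec::new(), syncs: 0 }
    }

    /// Buffers an operation; nothing touches the sink yet.
    fn append(&mut self, op: &[u8]) {
        self.pending.push(op.to_vec());
    }

    /// Writes every pending operation and syncs once for the whole batch.
    /// Returns how many operations were made durable.
    fn flush_batch(&mut self) -> io::Result<usize> {
        if self.pending.is_empty() {
            return Ok(0);
        }
        let count = self.pending.len();
        for op in self.pending.drain(..) {
            // Assumed record layout: 4-byte little-endian length, then payload.
            self.sink.write_all(&(op.len() as u32).to_le_bytes())?;
            self.sink.write_all(&op)?;
        }
        self.sink.flush()?; // stand-in for fsync on a real file
        self.syncs += 1;
        Ok(count)
    }
}

fn main() -> io::Result<()> {
    let mut wal = BatchingWal::new(Vec::new());
    wal.append(b"a");
    wal.append(b"bb");
    wal.append(b"ccc");
    let flushed = wal.flush_batch()?;
    println!("flushed {} ops with {} sync(s)", flushed, wal.syncs);
    Ok(())
}
```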
kodiakhq[bot] 940e76dab2
Merge pull request #6365 from influxdata/cn/ingester-persisted-file-count
feat: Keep track of & report # of Parquet files persisted, invalidate querier cache
2022-12-14 15:56:42 +00:00
kodiakhq[bot] d6afc9eee1
Merge branch 'main' into cn/ingester-persisted-file-count 2022-12-14 15:48:59 +00:00
Marco Neumann 4e36c590af
refactor: speed up partition sort key syncing (#6400)
* refactor: speed up partition sort key syncing

Prior to syncing, all chunks have a "locally correct" partition sort key,
i.e. one that at least covers all chunk columns (this is ensured during
chunk creation, both for parquet chunks as well as ingester chunks).
However due to the timing, some chunks may have a newer (= longer)
partition sort key. All we need to do to fix this is to pick the longest
partition sort key, there is no need to go through the whole cache
system again.

For #6358.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-12-14 15:48:08 +00:00
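
The "pick the longest partition sort key" step from the commit above reduces to a small in-memory pass. The sketch below assumes a hypothetical `Chunk` type holding the sort key as a list of column names, and relies on the commit's statement that a newer key is a longer one; it is not the querier's actual code.

```rust
/// Hypothetical chunk carrying the partition sort key it was created with.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    id: u32,
    partition_sort_key: Vec<String>,
}

/// Gives every chunk the longest partition sort key seen among them,
/// without consulting any cache. Assumes a newer key is always longer.
fn sync_partition_sort_key(chunks: &mut [Chunk]) {
    let longest = match chunks
        .iter()
        .map(|c| &c.partition_sort_key)
        .max_by_key(|key| key.len())
    {
        Some(key) => key.clone(),
        None => return, // no chunks, nothing to sync
    };
    for chunk in chunks.iter_mut() {
        chunk.partition_sort_key = longest.clone();
    }
}

fn key(cols: &[&str]) -> Vec<String> {
    cols.iter().map(|c| c.to_string()).collect()
}

fn main() {
    let mut chunks = vec![
        Chunk { id: 1, partition_sort_key: key(&["host", "time"]) },
        Chunk { id: 2, partition_sort_key: key(&["host", "region", "time"]) },
    ];
    sync_partition_sort_key(&mut chunks);
    for chunk in &chunks {
        println!("chunk {} -> {:?}", chunk.id, chunk.partition_sort_key);
    }
}
```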
kodiakhq[bot] 66c610f7b1
Merge branch 'main' into cn/ingester-persisted-file-count 2022-12-14 14:58:31 +00:00
Carol (Nichols || Goulding) f29bed86c0
fix: Improve log messages and docs as suggested in code review
Co-authored-by: Dom <dom@itsallbroken.com>
2022-12-14 09:52:09 -05:00
Paul Dix 82e57ac76a
feat: make data generator handle failed requests (#6397)
Updates the data generator to handle failed requests. Adds some println output to show progress along the way.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-13 21:35:23 +00:00
Andrew Lamb 8729977851
chore: Upgrade datafusion / arrow to 29.0.0 to get flightsql client (#6396)
* chore: Update datafusion pin

* chore: Update for API change

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-12-13 20:16:09 +00:00
Marco Neumann 65687bf0fa
test: regex baseline test (#6389)
For #6388.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-13 17:42:31 +00:00
Andrew Lamb 47cd6821e1
feat: Document IOx Flight API and add convenience methods (#6392)
* feat: Document IOx Flight API and add convenience methods

* fix: InfluxQL handling

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-13 17:32:37 +00:00