109 Commits (d1fb98fdd76b966c9a6f7fcae26fc77125df7a9e)

Author SHA1 Message Date
Dom d666bf6d22
Merge branch 'main' into dom/track-seq-num 2023-01-09 13:47:40 +00:00
Dom Dwyer aab4f6c651
refactor: remove unused QueryExec impl
This is completely unused and left over from the initial skeleton.
2023-01-09 14:28:16 +01:00
Dom Dwyer 1f509f47b1
refactor: log number of writes in persist batch
Include the number of DML operations applied to the persisted buffer
in the "persisted partition" message.

Partly because I'm intrigued / it's useful information, and partly to
ensure LLVM doesn't get snazzy and dead-code the sequence number
tracking because it was never read.
2023-01-09 13:31:42 +01:00
Dom Dwyer ca2b8afbb1
refactor(ingester2): track buffer sequence numbers
Changes the ingester2 buffer FSM to track the sequence numbers that have
been applied to it.

This is a pre-requisite for replication & correct WAL segment dropping.
2023-01-09 13:27:18 +01:00
Dom Dwyer 0916529cfb
docs: remove outdated monotonicity comment
Previously the ingester(1) required ordered writes to be applied. This
requirement has since been relaxed, and the asserts were already removed
in ingester2.
2023-01-09 13:24:51 +01:00
dependabot[bot] e31c84a794
chore(deps): Bump async-trait from 0.1.60 to 0.1.61 (#6533)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.60 to 0.1.61.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.60...0.1.61)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-09 07:44:35 +00:00
Nga Tran b856edf826
feat: function to get partition candidates from partition table (#6519)
* feat: function to get partition candidates from partition table

* chore: cleanup

* fix: make new_file_at the same value as created_at

* chore: cleanup

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-06 16:20:45 +00:00
Raphael Taylor-Davies e1036a0c63
refactor: cleanup schema boxing (#6511)
* refactor: cleanup Schema boxing

* chore: clippy
2023-01-06 10:57:39 +00:00
Andrew Lamb 6843eee1d2
feat: Extract encoding from `RecordBatch` --> `FlightData` from flight implementations (#6460)
* feat: Extract encoding from `RecordBatch` --> `FlightData` from flight implementations

Refactor existing flight server impl

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: fixup code review comments

* fix: update for more details

* fix: Update names / types

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-04 13:36:16 +00:00
dependabot[bot] 0aacef3c59
chore(deps): Bump once_cell from 1.16.0 to 1.17.0 (#6473)
* chore(deps): Bump once_cell from 1.16.0 to 1.17.0

Bumps [once_cell](https://github.com/matklad/once_cell) from 1.16.0 to 1.17.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.16.0...v1.17.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Change once_cell version specifier to major.minor for less churn

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
2023-01-02 17:07:15 +00:00
Dom Dwyer 2636eefe03
fix: only replay non-empty segments
If the WAL file was dropped because it is empty, do not retry dropping
it.
2022-12-22 17:18:19 +01:00
Dom Dwyer 66f1628238
fix: drop WAL segments after replay
Changes the WAL replay logic to:

    * Replay a segment file
    * Persist all replayed data
    * Drop segment file
    * ...repeat...

This ensures old WAL segments are removed once their contents have been
made durable, fixing #6461.
2022-12-22 16:56:47 +01:00
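
The replay loop described in commit 66f1628238 above can be sketched as follows. This is a minimal, in-memory illustration and not the ingester's code: `Segment`, `Sink`, and `replay` are hypothetical names, and "persisting" is modelled as moving ops between two vectors.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a WAL segment file: an ID plus the ops it holds.
#[derive(Debug)]
struct Segment {
    id: u64,
    ops: Vec<String>,
}

/// Hypothetical sink that buffers replayed ops and makes them durable.
#[derive(Debug, Default)]
struct Sink {
    buffered: Vec<String>,
    persisted: Vec<String>,
}

impl Sink {
    /// Buffer a single replayed op.
    fn apply(&mut self, op: String) {
        self.buffered.push(op);
    }

    /// Move everything buffered so far into "durable" storage.
    fn persist_all(&mut self) {
        self.persisted.append(&mut self.buffered);
    }
}

/// Replay segments oldest-first, returning the IDs of the dropped segments.
fn replay(mut segments: VecDeque<Segment>, sink: &mut Sink) -> Vec<u64> {
    let mut dropped = Vec::new();
    while let Some(segment) = segments.pop_front() {
        // 1. Replay the segment file.
        for op in segment.ops {
            sink.apply(op);
        }
        // 2. Persist all replayed data.
        sink.persist_all();
        // 3. Drop the segment only once its contents are durable.
        dropped.push(segment.id);
    }
    dropped
}
```

Because a segment is dropped only after `persist_all()` returns, a crash part-way through the loop leaves the unreplayed (or not yet persisted) segments in place for the next replay.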
Dom Dwyer 456368f71d
refactor(persist): no PersistQueue Clone bound
Removes the Clone bound from PersistQueue, also removing the Clone impl
from the PersistHandle.

Instead of wrapping all internal PersistHandle state in Arcs, this
commit changes the system to use a single Arc wrapping the PersistHandle
which is shared.
2022-12-22 15:04:52 +01:00
Dom Dwyer 26eea6078d
refactor(ingester2): decouple persist subsystem
Multiple components of the ingester depend on being able to enqueue a
partition's data for persistence. This commit decouples those components
from the concrete PersistHandle by introducing a PersistQueue trait that
defines the desired behaviour, on which the components depend.

This is a much needed clean-up of something I knowingly punted on for
the MVP, and I feel much better about the situation now!
2022-12-22 15:04:51 +01:00
Dom Dwyer dee9743e52
refactor(persist): decoupled PartitionIter
The persist_buffer() fn iterates over all the partitions in a BufferTree
and persists them - however it only depends on one behaviour; getting an
iterator of partitions.

This commit introduces the PartitionIter, an abstraction over anything
that can produce an iterator of PartitionData, decoupling the
persist_buffer() helper (and the callers!) from the concrete BufferTree
type.
2022-12-22 14:53:30 +01:00
Dom Dwyer e54896e5f8
refactor: extract BufferTree persist helper
Extract an existing function for re-use (from the WAL rotation task)
that marks & enqueues all non-empty partitions in a BufferTree for
persistence.
2022-12-22 11:58:41 +01:00
Dom 6df3c1d4ca
Merge branch 'main' into dom/persist-saturation-metric 2022-12-21 17:07:42 +00:00
Dom Dwyer 23dc2c4e06
refactor: consistent metric naming
Removes _ns (and incorrect _ms) suffix.
2022-12-21 18:04:20 +01:00
Dom Dwyer 23b781f274
fix(persist): invalidate cached sort key
The sort-key conflict path invalidated the cached sort key in the
PartitionData, but not the cached sort key in the persist's Context. Now
both are invalidated.
2022-12-21 17:45:48 +01:00
Dom Dwyer 679c6a7896
feat(ingester2): persist saturation metric
Expose a metric ("ingester_persist_saturated_duration_ns") that records
the cumulative duration of time the persist system has spent in the
"saturated" state.
2022-12-21 17:01:22 +01:00
Dom Dwyer 15cff11b08
refactor(persist): explicit worker module
Separate out persist worker types & routines into a separate worker
module rather than commingling them with the persist handle, and rename
the unimaginative "inner" to reflect the actual usage.
2022-12-21 14:28:30 +01:00
Dom Dwyer 7b133f85a1
docs: rust doclink failure
Rustdoc is so picky about indented text.
2022-12-20 17:13:24 +01:00
Dom Dwyer b3363639f5
chore: nudge CI 2022-12-20 17:05:03 +01:00
Dom Dwyer 5f4acf186d
docs: fix bad doc link
Rust hates indented URLs.
2022-12-20 15:25:34 +01:00
Dom Dwyer e083f3276c
feat(persist): accept concurrent matching updates
As an optimisation, allow a persist task to progress if it observes a
concurrent catalog sort key update that exactly matches the sort key it
was committing.
2022-12-20 15:15:39 +01:00
Dom Dwyer f64ffbe035
fix(ingester2): handle concurrent sort key updates
Allow an ingester2 instance to tolerate concurrent partition sort key
updates in the catalog.

A persist job is optimistically executed with the locally cached sort
key. If an ingester2 instance observes a concurrent update, it aborts
both the sort key update, and the overall persist operation (before
making the parquet file visible) and retries the operation with the
newly observed sort key. Concurrent sort key updates are theorised to be
relatively rare overall.

Any orphaned parquet files uploaded as part of a persist job that aborts
due to a concurrent sort key update are eventually removed by the
(external) object store GC task.

See https://github.com/influxdata/influxdb_iox/issues/6439
2022-12-20 15:15:39 +01:00
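
The optimistic retry described in commit f64ffbe035 above might look roughly like this. It is a sketch under stated assumptions: `Catalog`, `cas_sort_key`, `extend_key`, and `persist` are hypothetical names, the sort key is modelled as a plain list of column names, and the parquet upload is omitted.

```rust
/// Sort key modelled as an ordered list of column names (an assumption).
type SortKey = Vec<String>;

/// Hypothetical catalog holding a single partition's sort key.
struct Catalog {
    sort_key: SortKey,
}

impl Catalog {
    /// Commit `new` only if the stored key equals `expected`; otherwise
    /// return the key that was observed instead.
    fn cas_sort_key(&mut self, expected: &SortKey, new: SortKey) -> Result<(), SortKey> {
        if &self.sort_key != expected {
            return Err(self.sort_key.clone());
        }
        self.sort_key = new;
        Ok(())
    }
}

/// Extend `base` with any of `columns` it does not already contain.
fn extend_key(base: &SortKey, columns: &[&str]) -> SortKey {
    let mut key = base.clone();
    for c in columns {
        if !key.iter().any(|k| k.as_str() == *c) {
            key.push(c.to_string());
        }
    }
    key
}

/// Run a persist job, retrying whenever the cached sort key turns out to be
/// stale. Returns the committed key and the number of attempts made.
fn persist(catalog: &mut Catalog, mut cached: SortKey, columns: &[&str]) -> (SortKey, usize) {
    let mut attempts = 0;
    loop {
        attempts += 1;
        // Optimistically plan (and, in a real system, upload) with the cached key.
        let new = extend_key(&cached, columns);
        match catalog.cas_sort_key(&cached, new.clone()) {
            Ok(()) => return (new, attempts),
            // Concurrent update observed: abort this attempt and retry with
            // the newly observed key.
            Err(observed) => cached = observed,
        }
    }
}
```

In this model an aborted attempt simply discards its planned key; the orphaned file that a real attempt would have uploaded is left for garbage collection, as the commit message notes.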
Dom Dwyer adc6fcfb04
feat(catalog): linearise sort key updates
Updating the sort key is not commutative and MUST be serialised. The
correctness of the current catalog interface relies on the caller
serialising updates globally, something it cannot reasonably assert in a
distributed system.

This change of the catalog interface pushes this responsibility to the
catalog itself where it can be effectively enforced, and allows a caller
to detect parallel updates to the sort key.
2022-12-20 12:31:00 +01:00
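
One way to read the interface change in commit adc6fcfb04 above is as a compare-and-swap on the stored sort key. The sketch below assumes exactly that; `PartitionRecord`, `cas_sort_key`, and `Conflict` are hypothetical names, and an in-memory `Mutex` stands in for the catalog's own serialisation.

```rust
use std::sync::Mutex;

/// Sort key modelled as an ordered list of column names (an assumption).
type SortKey = Vec<String>;

/// Error returned when the stored key differs from what the caller expected.
#[derive(Debug, PartialEq)]
struct Conflict {
    /// The key currently stored, so the caller can retry against it.
    observed: Option<SortKey>,
}

/// Hypothetical in-memory stand-in for a partition's catalog record.
#[derive(Default)]
struct PartitionRecord {
    sort_key: Mutex<Option<SortKey>>,
}

impl PartitionRecord {
    /// Replace the sort key only if the stored key still equals `expected`.
    fn cas_sort_key(&self, expected: Option<&SortKey>, new: SortKey) -> Result<(), Conflict> {
        // The lock serialises all updates in one place (the "catalog").
        let mut guard = self.sort_key.lock().unwrap();
        if guard.as_ref() != expected {
            return Err(Conflict { observed: guard.clone() });
        }
        *guard = Some(new);
        Ok(())
    }
}
```

A caller that receives `Conflict` has detected a parallel update and can decide whether to retry with `observed` or give up.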
Dom dbbe43f241
Merge branch 'main' into dom/query-types-refactor 2022-12-19 15:03:16 +00:00
Dom Dwyer 84e29791e5
docs: fix incomplete comment
Finishes the incomplete sentence that
2022-12-19 12:46:21 +01:00
dependabot[bot] 299f0e99f9
chore(deps): Bump thiserror from 1.0.37 to 1.0.38
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.37 to 1.0.38.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.37...1.0.38)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-12-19 10:33:32 +00:00
Dom Dwyer 371857399c
refactor: avoid double flattening stream
Changes the into_record_batches() method to avoid creating an extra
stream out of the Option that must be flattened (iterating over the
option vs. filtering out all None first).
2022-12-19 11:31:41 +01:00
dependabot[bot] 8478d41bcb
chore(deps): Bump paste from 1.0.10 to 1.0.11 (#6430)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.10 to 1.0.11.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.10...1.0.11)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-19 10:31:05 +00:00
Dom Dwyer b87d572e42
refactor: single PartitionResponse constructor
Removes the PartitionResponse::new_no_batches() constructor, instead
wrapping the data in an Option. Previously that would have been confusing
(many Options in the constructor signature), but now there's only one!
2022-12-19 11:25:20 +01:00
Dom Dwyer c1db76bf9e
refactor: remove max seqnum in PartitionResponse
Removes the redundant max_persisted_sequence_number in
PartitionResponse, which was functionally replaced with
completed_persistence_count for the Querier's parquet file discovery
instead.
2022-12-19 11:21:53 +01:00
Dom Dwyer 13ed3f9acb
fix: show lack of partition data in query output
Show that a PartitionResponse does not contain data in the Debug output.
2022-12-19 11:18:21 +01:00
dependabot[bot] c72734473c
chore(deps): Bump async-trait from 0.1.59 to 0.1.60 (#6433)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.59 to 0.1.60.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.59...0.1.60)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-19 10:09:23 +00:00
Carol (Nichols || Goulding) 07772e8d22
fix: Always return a PartitionRecord which maybe streams record batches
Connects to #6421.

Even if the ingester doesn't have data in memory for a query, we need to
send back metadata about the ingester UUID and the number of files
persisted so that the querier can decide whether it needs to refresh the
cache.
2022-12-16 17:02:41 -05:00
Carol (Nichols || Goulding) 473ce7a268
fix: Don't hardcode the transition shard id 2022-12-16 17:01:35 -05:00
Dom Dwyer c830a83105
feat(ingester2): hot partition persistence
This PR uses the MutableBatch persist cost estimation added in #6425 to
selectively mark "hot" partitions for persistence.

This uses a (composable!) "post-write" observer that is invoked after
each buffer call - this allows the HotPartitionPersister in this commit
to inspect the cost of the partition after applying the write, and if it
exceeds the configurable cost threshold, enqueue it for persistence
(rotating the buffer within the partition in the process).

Unlike ingester(1), this implementation prevents overrun - the
application of the write that exceeds the cost limit, and enqueueing the
partition for persistence is atomic.
2022-12-16 19:33:34 +01:00
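
The "post-write" observer idea in commit c830a83105 above can be illustrated with a small sketch. Only the `HotPartitionPersister` name comes from the commit message; the `Partition` struct, the `PostWriteObserver` trait, the `write` helper, and the integer cost model are assumptions made for illustration.

```rust
/// Hypothetical partition buffer tracking an estimated persist cost.
#[derive(Debug, Default)]
struct Partition {
    buffered_cost: usize,
    /// Costs of buffers that were rotated out and enqueued for persistence.
    enqueued: Vec<usize>,
}

/// Observer invoked after every write is applied to a partition.
trait PostWriteObserver {
    fn observe(&self, partition: &mut Partition);
}

/// Marks a partition for persistence once its cost exceeds `max_cost`.
struct HotPartitionPersister {
    max_cost: usize,
}

impl PostWriteObserver for HotPartitionPersister {
    fn observe(&self, partition: &mut Partition) {
        if partition.buffered_cost > self.max_cost {
            // Rotate the buffer and enqueue the old one for persistence.
            let cost = std::mem::take(&mut partition.buffered_cost);
            partition.enqueued.push(cost);
        }
    }
}

/// Apply a write and run the observer while still holding `&mut Partition`,
/// so no other write can land between the two steps.
fn write(partition: &mut Partition, write_cost: usize, observer: &dyn PostWriteObserver) {
    partition.buffered_cost += write_cost;
    observer.observe(partition);
}
```

Here the exclusive borrow plays the role of the atomicity the commit describes: the write that crosses the threshold and the enqueue happen as one step, so the buffer cannot overrun in between.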
Marko Mikulicic 69d5148729
fix(ingester2): Make ingester2 work with existing catalogs and document quickstart 2022-12-16 13:16:31 +01:00
Dom Dwyer 933ab1f8c7
feat(ingester2): optimal persist parallelism
This commit changes the behaviour of the persist system to enable
optimal parallelism of persist operations, and improve the accuracy of
the outstanding job bound / back-pressure.

Previously all persist operations for a given partition were
consistently hashed to a single worker task. This serialised persistence
per partition, ensuring all updates to the partition sort key were
serialised. However, this also unnecessarily serialises persist
operations that do not need to update the sort key, reducing the
potential throughput of the system; in the worst case of a single
partition receiving all the writes, only one worker would be persisting,
and the other N-1 workers would be idle.

After this change, the sort key is inspected when enqueuing the persist
operation and if it can be determined that no sort key update is
necessary (the typical case), then the persist task is placed into a
global work queue from which all workers consume. This allows for
maximal parallelisation of these jobs, and removes the per-worker
head-of-line blocking.

In the case that the sort key does need updating, these jobs continue to
be consistently hashed to a single worker, ensuring serialised sort key
updates only where necessary.

To support these changes, the back-pressure system has been changed to
account for all outstanding persist jobs in the system, regardless of
type or assigned worker - a logical, bounded queue is composed together
of a semaphore limiting the number of persist tasks overall, and a
series of physical, unbounded queues - one to each worker & the global
queue. The overall system remains bounded by the
INFLUXDB_IOX_PERSIST_QUEUE_DEPTH value, and is now simpler to reason
about (it is independent of the number of workers, etc).
2022-12-15 18:30:51 +01:00
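
The routing rule and the overall job bound described in commit 933ab1f8c7 above can be sketched as below. All names (`Route`, `route`, `JobLimiter`) are hypothetical, and a plain counter stands in for the semaphore mentioned in the commit message, since the standard library does not provide one.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Where a persist job is sent.
#[derive(Debug, PartialEq)]
enum Route {
    /// Shared queue consumed by every worker.
    Global,
    /// Queue of one specific worker, chosen by hashing the partition ID.
    Worker(usize),
}

/// Jobs that must update the sort key are hashed to a single worker so the
/// updates stay serialised; all other jobs go to the global queue.
fn route(partition_id: u64, needs_sort_key_update: bool, n_workers: usize) -> Route {
    assert!(n_workers > 0, "at least one worker is required");
    if !needs_sort_key_update {
        return Route::Global;
    }
    let mut hasher = DefaultHasher::new();
    partition_id.hash(&mut hasher);
    Route::Worker((hasher.finish() % n_workers as u64) as usize)
}

/// Logical bound over all queues: a counter standing in for a semaphore.
struct JobLimiter {
    outstanding: usize,
    max: usize,
}

impl JobLimiter {
    fn new(max: usize) -> Self {
        Self { outstanding: 0, max }
    }

    /// Reserve a slot for a new job; returns false when the bound is reached.
    fn try_acquire(&mut self) -> bool {
        if self.outstanding >= self.max {
            return false;
        }
        self.outstanding += 1;
        true
    }

    /// Release a slot once a job completes.
    fn release(&mut self) {
        self.outstanding = self.outstanding.saturating_sub(1);
    }
}
```

Because the limiter counts jobs regardless of which queue they land in, the bound is independent of the number of workers, which matches the reasoning given in the commit message.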
Dom Dwyer e24d21255b
refactor: inject persist started timestamp
Instead of recording the "enqueued_at" when initialising the
PersistRequest, inject the value in.

This lets us re-order the request construction while retaining accurate
timing.
2022-12-15 18:28:08 +01:00
Dom ede2627dcf
Merge branch 'main' into dom/deferred-load-peek 2022-12-15 17:16:02 +00:00
Dom 261eeacf3c
Merge branch 'main' into dom/ooo-parittion-persist 2022-12-15 17:07:53 +00:00
Dom Dwyer 7d7c8db334
feat(ingester2): out-of-order partition persist
Previously data within a partition had to be persisted in the order in
which the data was received. This was necessary for the correctness of
the query API, as it utilised the lower-bound sequence number to
determine what data was available in the object store.

With the changes to the parquet discovery protocol / query API made in
https://github.com/influxdata/influxdb_iox/pull/6365 this restriction
can be lifted, allowing out-of-order persistence within a partition for
increased parallelism / performance.

This commit changes the PartitionData to accept out-of-order persist
completion notifications, removing the ordering invariant from ingester2
(note that the persist ops currently remain ordered however).
2022-12-15 14:38:13 +01:00
Dom Dwyer b15aebbddc
feat(deferred_load): peek() immediate values
Adds a peek() method to the DeferredLoad construct, allowing a caller to
immediately read the resolved value, or "None" if the value is
unresolved or concurrently resolving.

This allows a caller to optimistically read the value without having to
block and wait for it to become available.
2022-12-15 14:33:44 +01:00
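
The `peek()` semantics in commit b15aebbddc above can be approximated with a synchronous sketch. This is not the DeferredLoad implementation: `Deferred` and `resolve` are hypothetical names, and a `Mutex<Option<T>>` stands in for whatever state the real construct keeps.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified deferred value: `None` until resolved.
struct Deferred<T> {
    state: Mutex<Option<T>>,
}

impl<T: Clone> Deferred<T> {
    fn new() -> Self {
        Self { state: Mutex::new(None) }
    }

    /// Store the resolved value.
    fn resolve(&self, value: T) {
        *self.state.lock().unwrap() = Some(value);
    }

    /// Return the value if it is already resolved, without waiting.
    fn peek(&self) -> Option<T> {
        // try_lock() fails if the lock is held elsewhere (for example while
        // the value is being resolved); report None instead of blocking.
        let guard = self.state.try_lock().ok()?;
        guard.clone()
    }
}
```

A caller can try `peek()` first and fall back to the blocking path only when it returns `None`.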
Dom Dwyer e76b107332
feat(ingester2): persist back-pressure
This commit causes an ingester2 instance to stop accepting new writes
when at least one persist queue is full. Writes continue to be rejected
until the persist workers have processed enough outstanding persist
tasks to drain the queues to half of their capacity, at which point
writes are accepted again.

When a write is rejected, the ingester returns a "resource exhausted"
RPC code to the caller.

Checking if the system is in a healthy state for writes is extremely
cheap, as it is on the hot path for all writes.
2022-12-14 17:17:17 +01:00
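
The reject/recover behaviour in commit e76b107332 above is a form of hysteresis, sketched below for a single queue. `BackPressure`, `observe_depth`, and `check_write` are hypothetical names, and the error string merely stands in for the "resource exhausted" RPC code.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical back-pressure state for a single persist queue.
struct BackPressure {
    capacity: usize,
    saturated: AtomicBool,
}

impl BackPressure {
    fn new(capacity: usize) -> Self {
        Self { capacity, saturated: AtomicBool::new(false) }
    }

    /// Called by the persist system whenever the queue depth changes.
    fn observe_depth(&self, depth: usize) {
        if depth >= self.capacity {
            // Queue is full: start rejecting writes.
            self.saturated.store(true, Ordering::Relaxed);
        } else if depth <= self.capacity / 2 {
            // Drained to half capacity: accept writes again.
            self.saturated.store(false, Ordering::Relaxed);
        }
        // Between half and full the previous state is kept (hysteresis).
    }

    /// Hot-path check performed for every write: a single atomic load.
    fn check_write(&self) -> Result<(), &'static str> {
        if self.saturated.load(Ordering::Relaxed) {
            Err("resource exhausted")
        } else {
            Ok(())
        }
    }
}
```

Keeping the state between the two thresholds avoids flapping between accepting and rejecting writes when the queue hovers near full.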
Paul Dix d9c72bb93f
feat: optimize wal with batching (#6399)
* feat: optimize wal with batching

Simplified the wal writer so that it batches up write operations. Currently it waits 10ms between fsync calls. We can pull this out to a config variable later if we want, but I think this is good enough for now.

Also updated the reader to be a simpler blocking reader, without the extra tasks and channels, as those weren't really getting us anything that I know of.

* chore: cleanup wal code for PR feedback
2022-12-14 16:07:20 +00:00
kodiakhq[bot] 66c610f7b1
Merge branch 'main' into cn/ingester-persisted-file-count 2022-12-14 14:58:31 +00:00
Carol (Nichols || Goulding) f29bed86c0
fix: Improve log messages and docs as suggested in code review
Co-authored-by: Dom <dom@itsallbroken.com>
2022-12-14 09:52:09 -05:00