influxdb

Commit Graph

Author	SHA1	Message	Date
Dom Dwyer	b3363639f5	chore: nudge CI	2022-12-20 17:05:03 +01:00
Dom Dwyer	5f4acf186d	docs: fix bad doc link Rust hates indented URLs.	2022-12-20 15:25:34 +01:00
Dom Dwyer	e083f3276c	feat(persist): accept concurrent matching updates As an optimisation, allow a persist task to progress if it observes a concurrent catalog sort key update that exactly matches the sort key it was committing.	2022-12-20 15:15:39 +01:00
Dom Dwyer	f64ffbe035	fix(ingester2): handle concurrent sort key updates Allow an ingester2 instance to tolerate concurrent partition sort key updates in the catalog. A persist job is optimistically executed with the locally cached sort key. If an ingester2 instance observes a concurrent update, it aborts both the sort key update, and the overall persist operation (before making the parquet file visible) and retries the operation with the newly observed sort key. Concurrent sort key updates are theorised to be relatively rare overall. Any orphaned parquet files uploaded as part of a persist job that aborts due to a concurrent sort key update are eventually removed by the (external) object store GC task. See https://github.com/influxdata/influxdb_iox/issues/6439	2022-12-20 15:15:39 +01:00
Dom Dwyer	adc6fcfb04	feat(catalog): linearise sort key updates Updating the sort key is not commutative and MUST be serialised. The correctness of the current catalog interface relies on the caller serialising updates globally, something it cannot reasonably assert in a distributed system. This change of the catalog interface pushes this responsibility to the catalog itself where it can be effectively enforced, and allows a caller to detect parallel updates to the sort key.	2022-12-20 12:31:00 +01:00
Dom	dbbe43f241	Merge branch 'main' into dom/query-types-refactor	2022-12-19 15:03:16 +00:00
Dom Dwyer	84e29791e5	docs: fix incomplete comment Finishes the incomplete sentence that	2022-12-19 12:46:21 +01:00
dependabot[bot]	299f0e99f9	chore(deps): Bump thiserror from 1.0.37 to 1.0.38 Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.37 to 1.0.38. - [Release notes](https://github.com/dtolnay/thiserror/releases) - [Commits](https://github.com/dtolnay/thiserror/compare/1.0.37...1.0.38) --- updated-dependencies: - dependency-name: thiserror dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2022-12-19 10:33:32 +00:00
Dom Dwyer	371857399c	refactor: avoid double flattening stream Changes the into_record_batches() method to avoid creating an extra stream out of the Option that must be flattened (iterating over the option vs. filtering out all None first).	2022-12-19 11:31:41 +01:00
dependabot[bot]	8478d41bcb	chore(deps): Bump paste from 1.0.10 to 1.0.11 (#6430 ) Bumps [paste](https://github.com/dtolnay/paste) from 1.0.10 to 1.0.11. - [Release notes](https://github.com/dtolnay/paste/releases) - [Commits](https://github.com/dtolnay/paste/compare/1.0.10...1.0.11) --- updated-dependencies: - dependency-name: paste dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-12-19 10:31:05 +00:00
Dom Dwyer	b87d572e42	refactor: single PartitionResponse constructor Removes the PartitionResponse::new_no_batches() constructor, instead using an Option-wrapped data. Before that would have been confusing (many Option in the constructor signature) but now there's only one!	2022-12-19 11:25:20 +01:00
Dom Dwyer	c1db76bf9e	refactor: remove max seqnum in PartitionResponse Removes the redundant max_persisted_sequence_number in PartitionResponse, which was functionally replaced with completed_persistence_count for the Querier's parquet file discovery instead.	2022-12-19 11:21:53 +01:00
Dom Dwyer	13ed3f9acb	fix: show lack of partition data in query output Show that a PartitionResponse does not contain data in the Debug output.	2022-12-19 11:18:21 +01:00
dependabot[bot]	c72734473c	chore(deps): Bump async-trait from 0.1.59 to 0.1.60 (#6433 ) Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.59 to 0.1.60. - [Release notes](https://github.com/dtolnay/async-trait/releases) - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.59...0.1.60) --- updated-dependencies: - dependency-name: async-trait dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-12-19 10:09:23 +00:00
Carol (Nichols \|\| Goulding)	07772e8d22	fix: Always return a PartitionRecord which maybe streams record batches Connects to #6421. Even if the ingester doesn't have data in memory for a query, we need to send back metadata about the ingester UUID and the number of files persisted so that the querier can decide whether it needs to refresh the cache.	2022-12-16 17:02:41 -05:00
Carol (Nichols \|\| Goulding)	473ce7a268	fix: Don't hardcode the transition shard id	2022-12-16 17:01:35 -05:00
Dom Dwyer	c830a83105	feat(ingester2): hot partition persistence This PR uses the MutableBatch persist cost estimation added in #6425 to selectively mark "hot" partitions for persistence. This uses a (composable!) "post-write" observer that is invoked after each buffer call - this allows the HotPartitionPersister in this commit to inspect the cost of the partition after applying the write, and if it exceeds the configurable cost threshold, enqueue it for persistence (rotating the buffer within the partition in the process). Unlike ingester(1), this implementation prevents overrun - the application of the write that exceeds the cost limit, and enqueueing the partition for persistence is atomic.	2022-12-16 19:33:34 +01:00
Marko Mikulicic	69d5148729	fix(ingester2): Make ingester2 work with existing catalogs and document quickstart	2022-12-16 13:16:31 +01:00
Dom Dwyer	933ab1f8c7	feat(ingester2): optimal persist parallelism This commit changes the behaviour of the persist system to enable optimal parallelism of persist operations, and improve the accuracy of the outstanding job bound / back-pressure. Previously all persist operations for a given partition were consistently hashed to a single worker task. This serialised persistence per partition, ensuring all updates to the partition sort key were serialised. However, this also unnecessarily serialises persist operations that do not need to update the sort key, reducing the potential throughput of the system; in the worst case of a single partition receiving all the writes, only one worker would be persisting, and the other N-1 workers would be idle. After this change, the sort key is inspected when enqueuing the persist operation and if it can be determined that no sort key update is necessary (the typical case), then the persist task is placed into a global work queue from which all workers consume. This allows for maximal parallelisation of these jobs, and the removes the per-worker head-of-line blocking. In the case that the sort key does need updating, these jobs continue to be consistently hashed to a single worker, ensuring serialised sort key updates only where necessary. To support these changes, the back-pressure system has been changed to account for all outstanding persist jobs in the system, regardless of type or assigned worker - a logical, bounded queue is composed together of a semaphore limiting the number of persist tasks overall, and a series of physical, unbounded queues - one to each worker & the global queue. The overall system remains bounded by the INFLUXDB_IOX_PERSIST_QUEUE_DEPTH value, and is now simpler to reason about (it is independent of the number of workers, etc).	2022-12-15 18:30:51 +01:00
Dom Dwyer	e24d21255b	refactor: inject persist started timestamp Instead of recording the "enqueued_at" when initialising the PersistRequest, inject the value in. This lets us re-order the request construction while retaining accurate timing.	2022-12-15 18:28:08 +01:00
Dom	ede2627dcf	Merge branch 'main' into dom/deferred-load-peek	2022-12-15 17:16:02 +00:00
Dom	261eeacf3c	Merge branch 'main' into dom/ooo-parittion-persist	2022-12-15 17:07:53 +00:00
Dom Dwyer	7d7c8db334	feat(ingester2): out-of-order partition persist Previously data within a partition had to be persisted in the order in which the data was received. This was necessary for the correctness of the query API, as it utilised the lower-bound sequence number to determine what data was available in the object store. With the changes to the parquet discovery protocol / query API made in https://github.com/influxdata/influxdb_iox/pull/6365 this restriction can be lifted, allowing out-of-order persistence within a partition for increased parallelism / performance. This commit changes the PartitionData to accept out-of-order persist completion notifications, removing the ordering invariant from ingester2 (note that the persist ops currently remain ordered however).	2022-12-15 14:38:13 +01:00
Dom Dwyer	b15aebbddc	feat(deferred_load): peek() immediate values Adds a peek() method to the DeferredLoad construct, allowing a caller to immediately read the resolved value, or "None" if the value is unresolved or concurrently resolving. This allows a caller to optimistically read the value without having to block and wait for it to become available.	2022-12-15 14:33:44 +01:00
Dom Dwyer	e76b107332	feat(ingester2): persist back-pressure This commit causes an ingester2 instance to stop accepting new writes when at least one persist queue is full. Writes continue to be rejected until the persist workers have processed enough outstanding persist tasks to drain the queues to half of their capacity, at which point writes are accepted again. When a write is rejected, the ingester returns a "resource exhausted" RPC code to the caller. Checking if the system is in a healthy state for writes is extremely cheap, as it is on the hot path for all writes.	2022-12-14 17:17:17 +01:00
Paul Dix	d9c72bb93f	feat: optimize wal with batching (#6399 ) * feat: optimize wal with batching Simplified the wal writer so that it batches up write operations. Currently it waits 10ms between fsync calls. We can pull this out to a config variable later if we want, but I think this is good enough for now. Also updated the reader to be a more simple blocking reader without the extra tasks and channels as that wasn't really getting us anything that I know of. * chore: cleanup wal code for PR feedback	2022-12-14 16:07:20 +00:00
kodiakhq[bot]	66c610f7b1	Merge branch 'main' into cn/ingester-persisted-file-count	2022-12-14 14:58:31 +00:00
Carol (Nichols \|\| Goulding)	f29bed86c0	fix: Improve log messages and docs as suggested in code review Co-authored-by: Dom <dom@itsallbroken.com>	2022-12-14 09:52:09 -05:00
Dom Dwyer	8f0da90d76	docs: remove ref to PersistActor Fix bad reflink to something that no longer exists.	2022-12-13 16:59:15 +01:00
Dom Dwyer	309386b828	chore: silence spurious lint This is by design! Clippy just doesn't see the plan.	2022-12-13 16:59:14 +01:00
Dom Dwyer	1da9b63cce	fix(ingester2): persist deadlock Removes the submission queue from the persist fan-out, instead the PersistHandle now carries the shared state internally (cheaply cloned via ref counts). This also resolves the persist deadlock when under load.	2022-12-13 16:47:45 +01:00
kodiakhq[bot]	9e8ae1485f	Merge branch 'main' into dom/wal-bench	2022-12-13 15:19:32 +00:00
Dom Dwyer	5fa4e49098	feat(ingester2): persist active & queue timings Adds more debug logging to the persist code paths, as well as capturing & logging (at INFO) timing information tracking the time a persist task spends in the queue, the active time spent actually persisting the data, and the total duration of time since the request was created (sum of both durations).	2022-12-13 11:06:09 +01:00
dependabot[bot]	e108a8b6c9	chore(deps): Bump paste from 1.0.9 to 1.0.10 (#6384 ) Bumps [paste](https://github.com/dtolnay/paste) from 1.0.9 to 1.0.10. - [Release notes](https://github.com/dtolnay/paste/releases) - [Commits](https://github.com/dtolnay/paste/compare/1.0.9...1.0.10) --- updated-dependencies: - dependency-name: paste dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-12-13 06:03:05 +00:00
Carol (Nichols \|\| Goulding)	1c7f322a4e	feat: Keep track of and report number of Parquet files persisted Per partition and starting over each time the ingester restarts. Fixes #6334.	2022-12-12 11:45:00 -05:00
Dom Dwyer	7c28a30d1b	test(ingester2): WAL replay benchmark This adds a simple WAL replay benchmark to ingester2 that executes a replay of a single line of LP. Unfortunately each file in the benches directory is compiled as it's own binary/crate, and as such is restricted to importing only "pub" types. This sucks, as it requires you to either benchmark at a high level (macro, not microbenchmarks - i.e. benchmarking the ingester startup, not just the WAL replay) or you are forced to mark the reliant types & functions as "pub", as well as all the other types/traits they reference in their signatures. Because the performance sensitive code is usually towards the lower end of the call stack, this can quickly lead to an explosion of "pub" types causing a large amount of internal code to be exported. Instead this commit uses a middle-ground; benchmarked types & fns are conditionally marked as "pub" iff the "benches" feature is enabled. This prevents them from being visible by default, but allows the benchmark function to call them. The benchmark itself is also restricted to only run when this feature is enabled.	2022-12-12 15:02:36 +01:00
Carol (Nichols \|\| Goulding)	2fd2d05ef6	feat: Identify each run of an ingester with a Uuid And send that UUID in the Flight response for queries to that ingester run. Fixes #6333.	2022-12-08 17:22:52 -05:00
Carol (Nichols \|\| Goulding)	6014c10866	test: Enable running ingester2/router RPC write servers in e2e tests Add configuration and server types to be able to create server fixtures for them.	2022-12-08 17:22:52 -05:00
dependabot[bot]	1d38d400f0	chore(deps): Bump object_store from 0.5.1 to 0.5.2 (#6339 ) * chore(deps): Bump object_store from 0.5.1 to 0.5.2 Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.1 to 0.5.2. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md) - [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.1...object_store_0.5.2) --- updated-dependencies: - dependency-name: object_store dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * chore: Run cargo hakari tasks Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-12-06 07:53:54 +00:00
Marco Neumann	cd6a8a1a82	refactor: DF-driven on-demand mem limit instead of ahead-of-time heuristics (#6313 ) * refactor: DF-driven on-demand mem limit instead of ahead-of-time heuristics Closes #6310. * refactor: rename and tune default exec mem limits * fix: ingester2 bits after rebase	2022-12-05 12:38:28 +00:00
kodiakhq[bot]	228c81c6fb	Merge branch 'main' into dom/remove-comment	2022-12-02 17:58:33 +00:00
kodiakhq[bot]	fff6cdf951	Merge branch 'main' into dom/drop-empty-wal	2022-12-02 17:49:47 +00:00
Dom Dwyer	de52a24751	chore: remove old comment The ingester will replay the ops, and then immediately trigger a persist! This comment is done :) Outstanding is actually deleting the replayed files, which will come with the ref-counted WAL segment change later.	2022-12-02 18:44:45 +01:00
Dom Dwyer	a8f99d4593	feat(ingester2): drop empty WAL files A shutdown of an ingester that has received no writes leaves an empty WAL file - these can be deleted at startup.	2022-12-02 18:43:32 +01:00
Dom Dwyer	c3a88f105d	test: track_caller for make_write_op Changes the stack trace to point at the call site when a panic occurs within make_write_op, and adds a unique message per panic explaining the issue.	2022-12-02 18:42:28 +01:00
Dom Dwyer	26bf54d041	feat(ingester2): drop WAL segments after persist Changes the WAL rotation task to drop the WAL segment after the partition data has completely persisted. This logic contains a race condition outlined (at length) in the code comments, along with a plan to resolve it once I'm back from holiday. For now, the extremely low likelihood and minor impact of the race occurring is likely acceptable for testing purposes.	2022-12-02 18:29:35 +01:00
Dom Dwyer	f145d6415f	feat(ingester2): persist partitions on WAL rotate Changes the WAL rotation code to cause the current set of partitions to be submitted for persistence. Because rotating the WAL segment and triggering persistence is not atomic (not under an exclusive lock preventing writes) the persisted buffers MAY contain writes that appear in the new segment file. In the happy/non-crash path, this will have no effect, however if the ingester crashes and replays the WAL files, these writes will be duplicated into object storage (where compaction will resolve the issue) - this seems like a good trade-off, allowing us to avoid blocking write requests for as long as it takes to rotate the WAL and mark all partitions as persisting. (though currently WAL files are not dropped and thus everything is replayed all the time.)	2022-12-02 17:18:39 +01:00
Dom Dwyer	66aab55534	feat(ingester2): run persistence task Configures the initialisation of an ingester2 instance to spawn a persistence task (currently unused) and plumbs in various configuration parameters.	2022-12-02 17:18:39 +01:00
kodiakhq[bot]	1137f2fc7e	Merge branch 'main' into dom/buffer-tree-iter	2022-12-02 16:10:38 +00:00
Dom Dwyer	85a41e34df	refactor(ingester2): BufferTree partition iterator Expose an iterator of PartitionData within a BufferTree.	2022-12-02 17:01:34 +01:00
Dom Dwyer	f524687602	feat(ingester2): parallel partition persistence Implements actor-based, parallel persistence in ingester2 with controllable fan-out parallelism and queue depths. This implementation encapsulates the complexity of persistence, queuing and parallelism - the caller simply uses the handle to persist a partition, while the actor handles fan-out to a set of persistence workers, compaction in a separate thread-pool, and optional completion notifications. By consistently hashing persist jobs onto workers, parallelism is achieved across partitions, but serialisation of partition persists is enforced so that the sort key update is correctly serialised.	2022-12-02 16:34:03 +01:00
Dom	e25920c7c0	Merge branch 'main' into dom/partition-queue	2022-12-02 15:18:44 +00:00
kodiakhq[bot]	9e3d0fcefb	Merge branch 'main' into cn/ingester2	2022-12-02 13:39:55 +00:00
Dom Dwyer	4f928fbd0c	perf(ingester2): partition persist queue This commit swaps the existing single "persisting batch" slot (field) in a PartitionData for an ordered queue of outstanding partitions. This decouples marking a partition buffer for persistence from the actual persistence operation, allowing them to proceed at different rates. This reduces the complexity of persistence management, but also allows us to gracefully handle "hot" partitions; for example, this problematic scenario in the ingester(1) implementation during recovery: * Writes come into a partition, reaching a size/row/hotness limit * Partition is enqueued for persistence * More writes come into the new buffer, exceeding the same limit * Cannot persist the hot buffer because of outstanding persist op Without this change the only possibilities in this situation are: * stop ingest for the partition and error all writes that (partially!) write to the partition, or * continue accepting writes, allowing the partition to exceed the limit that marked it for persistence in the first place The latter is what the ingester(1) implementation does today, which results in partitions exceeding their row/size/age limits, which exist to limit the cost of generating the Parquet file from the buffer - this is a significant contributor to instability during recovery. This strategy enforces configured the limits on the partition buffer, but does not block / slow down recovery while persistence is completed.	2022-12-02 14:09:50 +01:00
Dom Dwyer	9c170dd506	refactor(ingester2): namespace name in partition Pushes the (ref-counted/shared) deferred NamespaceName resolver into the child TableData and PartitionData nodes in the BufferTree. This allows the PartitionData to provide the name of the namespace to which it belongs, in addition to the existing table name, and all relevant IDs.	2022-12-01 21:13:05 +01:00
Carol (Nichols \|\| Goulding)	a3bc31aa82	fix: Have ingester2 GrpcDelegate hold onto the catalog and metrics So that they don't have to be passed in again later for the services.	2022-12-01 11:39:27 -05:00
Dom Dwyer	81619fd5b3	refactor(ingester2): clean-up rotation task When an ingester2 instance goes out of scope, automatically stop the WAL rotation task.	2022-12-01 16:03:53 +01:00
Dom Dwyer	c7f88d779a	feat(ingester2): rotate wal periodically Spawn a background task to rotate the WAL file with the given internal.	2022-12-01 16:03:52 +01:00
Dom Dwyer	f40885d4ca	refactor(wal): remove needless async/await Obtaining a rotation handle isn't async.	2022-12-01 16:03:52 +01:00
kodiakhq[bot]	76e500cb31	Merge branch 'main' into dom/wal-replay	2022-12-01 14:34:51 +00:00
Andrew Lamb	14a9bc92e9	Revert "Revert "chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0` (#6279 )" (#6294 )" (#6296 ) This reverts commit `b7e52c0d8d`. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-12-01 14:20:43 +00:00
Dom Dwyer	cb248c75d5	feat(ingester2): WAL entry replay at startup Replay the WAL log, if any, at startup. Op replay is performed synchronously, during initialisation of the ingester2 instance, and passes all ops through the "normal" write path that the system uses once replay is complete, minus the WAL writer layer - this helps to DRY the write path and minimise different behaviours.	2022-12-01 15:01:43 +01:00
Dom Dwyer	1be9ffb409	refactor(ingester2): cache more hot partitions Now partition cache entries are smaller, the number of entries held in memory can be increased - this now uses ~2MiB of memory and drains the cache during execution, amortising to 0.	2022-12-01 13:45:19 +01:00
kodiakhq[bot]	047bcc6e7e	Merge branch 'main' into dom/remove-ordering	2022-12-01 12:37:54 +00:00
Andrew Lamb	b7e52c0d8d	Revert "chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0` (#6279 )" (#6294 ) This reverts commit `039a45ddd1`.	2022-12-01 11:38:42 +00:00
Dom Dwyer	6639568665	refactor: remove unimplemented!() This lets us keep the existing test coverage for the new impl.	2022-12-01 11:35:00 +01:00
Dom Dwyer	464242ebc6	refactor: remove ordering asserts, sequence ranges This commit removes the invariant asserts of monotonicity carried over from the "ingester" crate - ingester2 does not define any ordering of writes within the system. This commit also removes the SequenceNumberRange as it is no longer useful to indirectly check the equality of two sets of ops - non-monotonic writes means overlapping ranges does not guarantee a full set of overlapping operations (gaps may be present). Likewise bounding values (such as "min persisted") provide no useful meaning in an out-of-order system.	2022-12-01 10:38:20 +01:00
Dom Dwyer	50d5e4a6f1	docs: arbitrarily reordering & sequence numbers Document the arbitrary reordering of concurrent writes in an ingester, and the potential divergence of WAL entries / buffered state. Also documents that in ingester2, sequence numbers only identify writes, not their ordering.	2022-12-01 10:38:20 +01:00
Dom Dwyer	f34d4db994	docs: reference WAL cancellation safety ticket Reference https://github.com/influxdata/influxdb_iox/issues/6281	2022-11-30 16:37:06 +01:00
Dom Dwyer	d2a3d0920b	feat(ingester2): commit writes to write-ahead log Adds WalSink, an implementation of the DmlSink trait that commits DML operations to the write-ahead log before passing the DML op into the decorated inner DmlSink chain. By structuring the WAL logic as a decorator, the chain of inner DmlSink handlers within it are only ever invoked after the WAL commit has successfully completed, which keeps all the WAL commit code in a singly-responsible component. This also lets us layer on the WAL commit logic to the DML sink chain after replaying any existing WAL files, avoiding a circular WAL mess. The application-logic level WAL code abstracts over the underlying WAL implementation & codec through the WalAppender trait. This decouples the business logic from the WAL implementation both for testing purposes, and for trying different WAL implementations in the future.	2022-11-30 16:37:05 +01:00
Andrew Lamb	039a45ddd1	chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0` (#6279 ) * chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0` * chore: Update thrift to 0.17 * fix: use workspace arrow-flight in ingester2 * chore: Update for API changes * fix: test * chore: Update hakari * chore: Update hakari again * chore: Update trace_exporters to latest thrift * fix: update test Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-30 14:12:30 +00:00
Dom Dwyer	5f1635bfc5	feat(ingester2): sequence write ops Sequence all gRPC write requests to (internally) order the resulting DML operations. These sequence numbers are assigned from a timestamp oracle and passed through to the downstream DmlSink implementers.	2022-11-30 10:40:26 +01:00
Dom Dwyer	ace4b7f669	feat: operation timestamp sequencer Adds a TimestampOracle to provide an ingester-internal ordering to incoming DmlOperations using a logical clock.	2022-11-30 10:40:22 +01:00
Dom Dwyer	9648207f01	feat(ingester2): initialise an ingester2 instance Adds a public constructor to initialise an ingester2 instance.	2022-11-29 17:05:42 +01:00
Dom	0b1449e908	Merge branch 'main' into dom/buffer-tree-query	2022-11-29 15:06:41 +00:00
Andrew Lamb	fc520e0c0f	refactor: Remove unecessary optimize_record_batch (#6262 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-29 13:35:46 +00:00
Dom Dwyer	95216055d8	perf(ingester2): stream BufferTree partition data This commit implements the QueryExec trait for the BufferTree, allow it to be queried for the partition data it contains. With this change, the BufferTree now provides "read your writes" functionality. Notably the implementation streams the contents of individual partitions to the caller on demand (pull-based execution), deferring acquiring the partition lock until actually necessary and minimising the duration of time a strong reference to a specific RecordBatch is held in order to minimise the memory overhead. During query execution a client sees a consistent snapshot of partitions: once a client begins streaming the query response, incoming writes that create new partitions do not become visible. However incoming writes to an existing partition that forms part of the snapshot set become visible iff they are ordered before the acquisition of the partition lock when streaming that partition data to the client.	2022-11-29 12:01:47 +01:00
Dom Dwyer	de6f0468d8	refactor: associated QueryExec return type Allow the return type of the QueryExec trait's query_exec() method to be parametrised by the implementer. This allows the trait to be reused across different data sources that return differing concrete types.	2022-11-29 12:01:36 +01:00
Dom Dwyer	1a379f5f16	perf(ingester2): streaming query data sourcing Changes the query code (taken from the ingester crate) to stream data for query execution, tidy up unnecessary Result types and removing unnecessary indirection/boxing. Previously the query data sourcing would collect the set of RecordBatch for a query response during execution, prior to sending the data to the caller. Any data that was dropped or modified during this time meant the underlying ref-counted data could not be released from memory until all outstanding queries referencing it completed. When faced with multiple concurrent queries and ongoing ingest, this meant multiple copies of data could be held in memory at any one time. After this commit, data is streamed to the user, minimising the duration of time a reference to specific partition data is held, and therefore eliminating the memory overhead of holding onto all the data necessary for a query for as long as the client takes to read the data. When combined with an upcoming PR to stream RecordBatch out of the BufferTree, this should provide performant query execution with minimal memory overhead, even for a maliciously slow reading client.	2022-11-29 11:08:06 +01:00
Dom Dwyer	2ed9780f6b	refactor(ingester2): explicit PartitionStream type Simplify the streaming types by introducing explicitly named wrappers to improve visibility.	2022-11-29 11:08:02 +01:00
dependabot[bot]	b5aa39db4b	chore(deps): Bump tonic from 0.8.2 to 0.8.3 (#6249 ) Bumps [tonic](https://github.com/hyperium/tonic) from 0.8.2 to 0.8.3. - [Release notes](https://github.com/hyperium/tonic/releases) - [Changelog](https://github.com/hyperium/tonic/blob/master/CHANGELOG.md) - [Commits](https://github.com/hyperium/tonic/compare/v0.8.2...v0.8.3) --- updated-dependencies: - dependency-name: tonic dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-29 09:53:30 +00:00
Dom Dwyer	443ec49f24	feat: query exec tracing spans Implement a QueryExec decorator that emits named tracing spans covering the inner delegate's query_exec() execution. Captures the result, emitting the error string in the span on failure.	2022-11-28 13:29:18 +01:00
Dom Dwyer	a54326d1ae	refactor: rename Response -> QueryResponse Both more descriptive, and less conflict-y! This seems like a more sensible name for a system with many Response's.	2022-11-28 13:29:18 +01:00
Dom Dwyer	a0ab78298f	feat(ingester2): gRPC methods & type-erased init This commit implements the gRPC direct-write RPC interface (largely copied from the ingester crate), and adds a much improved RPC query handler. Compared to the ingester crate, the query API is now split into two defined halves - the API handler side, and types necessary to support it (server/grpc/query.rs) and the Ingester query execution side (a stub in query/exec.rs). These two halves maintain a separation of concerns, and are interfaced by an abstract QueryExec trait (in query/trait.rs). I also added the catalog RPC interface as it is currently exposed on the ingester, though I am unsure if it is used by anything. This commit also introduces the "init" module, and the IngesterRpcInterface trait within it. This trait forms the public ingester2 crate API, defining the complete set of methods external crates can expect to utilise in a stable, unchanging and decoupled way. The IngesterRpcInterface trait also serves as a method of type-erasure on the underlying handler implementations, avoiding the need to expose/pub the types, abstractions, and internal implementation details of the ingester to external crates.	2022-11-25 12:40:01 +01:00
Dom Dwyer	522ae6c2a3	docs: document possible FSM panic Document a possible panic if the data within a partition FSM cannot be converted to an Arrow RecordBatch.	2022-11-25 10:46:44 +01:00
CircleCI[bot]	44eeab7e2b	chore: Run cargo hakari tasks	2022-11-24 14:51:21 +00:00
Dom Dwyer	a66fc0b645	feat(ingester): ingester2 init Adds an ingester2 crate to hold the MVP of the Kafkaless project. This was necessary due to the tight coupling of the ingester internals with tests in external crates, and eases the parallel development of two version of the ingester. This commit contains various changes from the "ingester" crate, mostly removing the concept/references to a "shard" or "ShardId" where possible. This commit does not copy over all of the "ingester" crate - only those components that are definitely needed. I will drag across more as functionality is implemented.	2022-11-24 15:34:02 +01:00

1 2 3 4 5

237 Commits (970371929492d6c1557368cb48dd0bc5c24339ea)