influxdb

Commit Graph

Author	SHA1	Message	Date
Nga Tran	77a2541172	feat: flag partitions for delete (#6075) * feat: flag partition for delete * fix: compare the right date and time * chore: Run cargo hakari tasks * chore: cleanup * fix: typos * chore: rust style tidy ups in catalog Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: Luke Bond <luke.n.bond@gmail.com>	2022-11-09 12:06:23 +00:00
Dom	d9c97795fc	feat: use IDs in ingester query API (#6093) * refactor: NS+table ID (instead of name) in querier<>ingester * feat(ingester): use IDs for query API Changes the ingester to utilise the ID fields (instead of names) sent over the query wire message wrapped within the Flight API. BREAKING: this changes the "query-ingester" CLI command arguments which now expects the namespace & table IDs, rather than their names. * refactor(ingester): add more query logging context Updates the log messages during query execution to include more context fields. * style: remove unused import Co-authored-by: Marco Neumann <marco@crepererum.net>	2022-11-09 11:25:13 +00:00
Dom Dwyer	38b0459994	test: simplify tests / remove catalog Remove the catalog from tests that only initialised an implementation in order to call buffer_operation().	2022-11-08 17:02:01 +01:00
Dom Dwyer	226f14a97f	perf(ingester): remove table lookup query Now DML operations contain the table ID, the ingester has all necessary data to initialise the TableData buffer node without having to query the catalog. This also removes the catalog from the buffer_operation() call path, simplifying testing.	2022-11-08 17:00:44 +01:00
Dom Dwyer	225c3b97c1	perf(ingester): remove namespace lookup query Now DML operations contain the namespace ID, the ingester has all necessary data to initialise the NamespaceData buffer node without having to query the catalog.	2022-11-08 16:57:53 +01:00
Dom Dwyer	8ebea0df37	feat: table/namespace IDs in write protocol Expose the Table and Namespace IDs encoded within the serialised DML write (added in #6036). This makes the IDs available for use in the consumers, ending the transition period. This commit DOES NOT remove the strings sent over the wire.	2022-11-08 16:57:53 +01:00
Dom	b7f7ee6a13	Merge branch 'main' into dom/mutex-pushdown	2022-11-08 14:57:32 +00:00
Dom Dwyer	b73d07c22b	perf(ingester): granular per-partition locking This commit pushes the existing table-level mutex down to the partition. This allows the ingester to gather data from multiple partitions within a single table in parallel, and reduces contention between ingest/query workloads.	2022-11-08 15:45:59 +01:00
Dom Dwyer	b8181119e1	refactor: push down per-partition op skipping This moves the logic that skips operations that do not need to be applied to a partition during shard replay from the table level, to the partition level.	2022-11-08 15:45:52 +01:00
Dom Dwyer	4c8882e33a	docs: ref link to fix PR	2022-11-08 15:17:46 +01:00
Dom Dwyer	d71f023a57	refactor: inline helpers Inline the hash generation & key comparator.	2022-11-08 15:17:46 +01:00
Dom Dwyer	8dd7f2c603	refactor: accept owned key for insert() Changes the bounds on the ArcMap to accept an owned key, avoiding an extra allocation. Cleans up the bounds on other fn to ensure the borrowed key impl Eq and is the ref type of K.	2022-11-08 15:17:46 +01:00
Dom Dwyer	bbc2afe2a1	refactor: extract key equality checking Creates a shared fn for checking key equality to DRY the various chaining checks.	2022-11-08 15:17:46 +01:00
Dom Dwyer	8eaccd518b	fix: cross-thread map entry visibility This commit changes the ArcMap HashBuilder to use the same instance as the underlying HashMap hasher. This prevents divergent hashing across threads that MAY initialise a hasher with a different seed.	2022-11-08 15:17:46 +01:00
Dom Dwyer	66a6e8e929	test: cross-thread hashmap entry visibility At the time of this commit, this test fails. Performing a get() on a key previously inserted by another thread should not fail.	2022-11-08 15:17:46 +01:00
Dom Dwyer	fbd25a06d0	revert: push down per-partition op skipping This reverts commit `425fd46def`.	2022-11-08 10:31:51 +01:00
Dom Dwyer	7ac0857a28	revert: granular per-partition locking This reverts commit `79d24fa350`.	2022-11-08 10:31:37 +01:00
Dom Dwyer	79d24fa350	perf(ingester): granular per-partition locking This commit pushes the existing table-level mutex down to the partition. This allows the ingester to gather data from multiple partitions within a single table in parallel, and reduces contention between ingest/query workloads.	2022-11-07 13:45:03 +01:00
Dom Dwyer	425fd46def	refactor: push down per-partition op skipping This moves the logic that skips operations that do not need to be applied to a partition during shard replay from the table level, to the partition level.	2022-11-07 13:45:03 +01:00
kodiakhq[bot]	5e297e259b	Merge branch 'main' into dom/arcmap-get_or_insert_with	2022-11-07 11:47:00 +00:00
Andrew Lamb	034d9b371d	chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0` (#6061) * chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0` * fix: Update query_functions * fix: update for TimestampNanosecondArray API changes * fix: update for TimestampNanosecondArray API changes * chore: Update flatbuffers and remove rustsec warning * chore: Update text * fix: update more test * fix: Lock ahash to exactly 0.8.0 * fix: Update datafusion pin * chore: Run cargo hakari tasks Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-07 11:01:58 +00:00
Dom Dwyer	2b9e0e173f	refactor: rename ArcMap::get_or_insert_with() Renames ArcMap::get_or_else() to ArcMap::get_or_insert_with() for consistency with the stdlib HashMap Entry.	2022-11-07 11:56:55 +01:00
Marco Neumann	f511db380c	refactor: remove table name from chunks (#6063) It should be always clear from the context to which table a chunk belongs. I think having a table name bound to a chunk goes back to a time where chunks had multiple tables. Helps with #6049. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-07 10:42:57 +00:00
YIXIAO SHI	586035b34d	chore: delete metric duplicate character (#6057) * chore: delete metric duplicate character * fix: failure ci test case * fix: failure ci test case * fix: failure ci test case Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-07 10:04:31 +00:00
Dom Dwyer	6fa48731aa	feat: NamespaceId in DmlDelete Changes the DmlDelete to contain the NamespaceId for which it should be applied, propagating this value over the wire. Like the existing IDs within the DmlWrite, these values are marked unsafe to use, to avoid the consumers utilising them accidentally during deployment. Unlike DmlWrite, the DmlDelete is completely unused, so this is less of an issue.	2022-11-03 13:57:40 +01:00
Dom Dwyer	30f69ce4f6	feat: ArcMap values() snapshot Returns a snapshot of the values within an ArcMap.	2022-11-03 11:49:01 +01:00
Dom Dwyer	17890a9906	feat: add ArcMap map type Implements a map of K -> Arc<V> with exactly-once initialisation semantics. This map can be used to ensure a given key maps to singleton instances of V; exactly what all the nodes in the ingester "buffer tree" of shard -> namespace -> table -> partition require. This impl contains unused funcs (silenced with an allow(dead_code)) due to it being picked from a future branch.	2022-11-03 11:29:09 +01:00
Andrew Lamb	4fb2843d05	refactor: Rename `schema::selection::Selection` to `schema::projection::Projection` (#6037) * chore: Rename `schema::selection::Selection` to `schema::projection::Projection` * fix: docs Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-02 18:15:04 +00:00
Dom Dwyer	ddd6ab0ba4	refactor(write_buffer): pass IDs in wire format This commit is part of a two-part change in order to add the table & namespace IDs to the write buffer wire format. This commit forms the first half; changing the producer to send the IDs. In this commit the new ID values are never read on the consumer side, ensuring there is no consumer dependency on them. This ensures they remain operational during a rollout, where the consumer may be updated to the latest code dependent on the IDs before the producer is updated to send them. This also ensures we have a window of time where the consumers can be rolled back after being updated, and still handle replaying messages in Kafka.	2022-11-02 13:28:56 +01:00
Marco Neumann	45b3984aa3	refactor: simplify `QueryChunk` data access (#6015) * refactor: simplify `QueryChunk` data access We have only two types for chunks (now that the RUB is gone): 1. In-memory RecordBatches 2. Parquet files Loads of logic is duplicated in the different `read_filter` implementations. Also `read_filter` hides a solid amount of logic from DataFusion, which will prevent certain (future) optimizations. To enable #5897 and to simplify the interface, let the chunks return the data (batches or metadata for parquet files) directly and let `iox_query` perform the actual heavy-lifting. * docs: improve Co-authored-by: Andrew Lamb <alamb@influxdata.com> * docs: improve Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-02 08:18:33 +00:00
Marco Neumann	072439e428	refactor: mandatory `QueryChunkMeta::summary` (#5997) With #5963 merged, all chunks now provide a summary (even though it may not contain data for all columns). So let's make it mandatory, which also removes a few 🙈-style `.except(...)` calls. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-31 16:38:02 +00:00
Carol (Nichols || Goulding)	dad1ad1318	feat: Add the catalog service to ingester, querier, and compactor So that `remote get` that uses the catalog service can work no matter what kind of server you contact.	2022-10-28 10:49:26 -04:00
Carol (Nichols || Goulding)	53445af25d	chore: Alphabetize some dependencies I can't handle not knowing where to look for a dependency or knowing where to add a new dependency.	2022-10-28 10:34:25 -04:00
Andrew Lamb	e9d04ffcb5	feat: Log how long each persist plan takes to complete (#5989) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-28 13:52:39 +00:00
kodiakhq[bot]	1567227b49	Merge branch 'main' into dom/require-partition-key	2022-10-28 10:31:22 +00:00
Marco Neumann	8447d46093	refactor: remove `QueryChunkMeta::timestamp_min_max` (#5963) Use the table summary instead. This allows us to have a single mechanism that both IOx and DataFusion understand. This basically lifts the "basic table summary" mechanism that the querier uses to `iox_query` and let the compactor and ingester use the same mechanism. While not strictly necessary, simplifying the `QueryChunk[Meta]` interface helps with #5897. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-28 10:29:16 +00:00
Dom Dwyer	72a358e52f	refactor(dml): PartitionKey required for writes Changes the DmlWrite type to require a PartitionKey be specified, instead of accepting an Option. This requirement was already in place - the write buffer upheld an invariant that all writes contained a partition key value (was not "None") or it panicked at runtime when attempting to enqueue the write. It is now possible to encode this invariant in the type system, which is what this change does.	2022-10-28 10:57:30 +02:00
Dom Dwyer	5d2f4a0ad1	docs: fix issue URL for memory tracking bug	2022-10-27 10:15:15 +02:00
Dom Dwyer	f6416675c2	docs: mark hyperlink in rustdoc comments	2022-10-27 10:15:15 +02:00
Dom Dwyer	678fb81892	refactor(ingester): use partition buffer FSM This commit makes use of the partition buffer state machine introduced in https://github.com/influxdata/influxdb_iox/pull/5943. This commit significantly changes the buffering, and querying, of data from a partition, swapping out the existing "DataBuffer" for the new state machine implementation (itself simplified due to temporary lack of incremental snapshot generation, see #5944). This commit simplifies the query path, removing multiple types that wrapped one-another to pass around various state necessary to perform a query, with various query functions needing different types or combinations of types. The query path now operates using a single type (named "QueryAdaptor") that provides a queryable interface over the set of RecordBatch returned from a partition. There is significantly increased testing of the PartitionData itself, covering data in various states and the ordering of returned RecordBatch (to ensure correct materialisation of updates). There are also invariants upheld by the type system / compiler to minimise the complexities of working with empty batches & states, and many asserts that ensure (mostly existing!) invariants are upheld.	2022-10-27 10:15:15 +02:00
Carol (Nichols || Goulding)	88c3a1f5e7	feat: Use workspace dep inheritance for the arrow-flight crate	2022-10-26 10:34:54 -04:00
Carol (Nichols || Goulding)	3145e2c05b	feat: Use workspace dep inheritance for the arrow crate	2022-10-26 10:34:29 -04:00
Carol (Nichols || Goulding)	44936f661a	feat: Use workspace dep inheritance for datafusion instead of shim crate	2022-10-26 10:33:56 -04:00
Carol (Nichols || Goulding)	2e83e04eab	feat: Use workspace package metadata to reduce differences and repetition	2022-10-24 13:04:09 -04:00
Dom Dwyer	39f826518b	revert: use histogram to record TTBR This reverts commit `c63312ce12`. This change fixed a low-priority alert when there was no traffic flowing through the system. The loss in TTBR value fidelity due to bucketing is a greater concern as it affects live, high-volume clusters and hinders operational insight.	2022-10-24 10:27:22 +02:00
Dom Dwyer	7b3fa43209	refactor: disable incremental snapshot generation This commit removes the on-demand, incremental snapshot generation driven by queries. This functionality is "on hold" due to concerns documented in: https://github.com/influxdata/influxdb_iox/issues/5805 Incremental snapshots will be introduced alongside incremental compactions of those same snapshots.	2022-10-21 17:41:43 +02:00
Dom	db83053be7	Merge branch 'main' into dom/buffer-fsm	2022-10-21 16:32:54 +01:00
Dom Dwyer	8ca72ceff1	docs: fix state mod comments	2022-10-21 17:32:19 +02:00
Dom Dwyer	c8fdd76033	feat(ingester): partition buffer state machine This commit introduces code that is intended to replace the current implicit state machine used by PartitionData. The existing code is still in use, the new code is NOT used in this commit. A follow-up commit will switch over to minimise the diff. This change has two main goals; * encapsulation & simplification for callers * robust implementation so developing correct additions is easier This is a significant refactor of the partition buffering logic to encapsulate the various states of data (buffering, snapshot, persisting and the mixed states between them) within the Partition. This alleviates the rest of the system from having to be concerned with the differences between "buffering" data, and "unpersisted data", "snapshot data", "persisting data", "persisting with snapshots" etc - callers now invoke a method called get_query_data() and they are provided with all the relevant data for a partition. This abstraction change alone significantly reduces code and test complexity in the rest of the ingester. For the second goal, the new implementation leverages an explicit state machine, encoded using typestates. Typestate ensures compile-time correctness of transitions and method calls, and the explicit FSM itself helps ensure the system progresses in the desired manner - this fixes and helps prevent bugs caused by implicit states such as: https://github.com/influxdata/influxdb_iox/issues/5805 This state machine makes the system states explicit and self-descriptive, helping to reduce the cost of developer on-boarding (no prior knowledge of "how this bit works") and reduces ongoing developer burden. This explicit nature also de-risks adding new functionality - it should be relatively easy to add concurrent snapshot generation or incremental compaction without introducing bugs. The state transition logic is abstracted away from callers, minimising the overhead of this strategy.	2022-10-21 14:25:51 +02:00
Carol (Nichols || Goulding)	59e1c1d5b9	feat: Pass trace id through Flight requests from querier to ingester Fixes #5723.	2022-10-20 08:55:30 -04:00
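
Commits 17890a9906, 30f69ce4f6 and 2b9e0e173f above describe ArcMap as a map of K -> Arc<V> with exactly-once initialisation, a `get_or_insert_with()` method and a `values()` snapshot. The following is a minimal sketch of that idea only, not the code from those commits: the `ArcMapSketch` name and the single `Mutex` around a std `HashMap` are assumptions made for illustration, and the real type may use a different locking and hashing strategy.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the ArcMap described in the commit messages.
struct ArcMapSketch<K, V> {
    inner: Mutex<HashMap<K, Arc<V>>>,
}

impl<K: Hash + Eq, V> ArcMapSketch<K, V> {
    fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    /// Return the value for `key`, running `init` only if the key is absent.
    /// The lock is held across the lookup and the insert, so at most one
    /// caller ever initialises a given key.
    fn get_or_insert_with<F: FnOnce() -> V>(&self, key: K, init: F) -> Arc<V> {
        let mut map = self.inner.lock().unwrap();
        let entry = map.entry(key).or_insert_with(|| Arc::new(init()));
        Arc::clone(&*entry)
    }

    /// Return a point-in-time snapshot of the values in the map.
    fn values(&self) -> Vec<Arc<V>> {
        let map = self.inner.lock().unwrap();
        let snapshot: Vec<Arc<V>> = map.values().cloned().collect();
        snapshot
    }
}
```

In this sketch, two calls with the same key return handles to the same allocation and the second initialiser is never run, which is the singleton-per-key property the commit message asks for in the shard -> namespace -> table -> partition buffer tree.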


484 Commits (07505c8f721993c4b38a111458eea2023cfa974d)
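
Commit c8fdd76033 in the list above describes a partition buffer state machine "encoded using typestates", where the compiler rejects invalid transitions and method calls. A minimal sketch of the typestate pattern follows. The state names, the `PartitionBuffer` type and the `u64` rows are hypothetical and chosen for illustration; they are not the types introduced by that commit.

```rust
/// Hypothetical state: rows are being accepted.
struct Buffering {
    rows: Vec<u64>,
}

/// Hypothetical state: rows are frozen while they are persisted.
struct Persisting {
    rows: Vec<u64>,
}

/// A buffer whose current state is part of its type.
struct PartitionBuffer<S> {
    state: S,
}

impl PartitionBuffer<Buffering> {
    fn new() -> Self {
        Self { state: Buffering { rows: Vec::new() } }
    }

    /// Writes exist only in the Buffering state.
    fn write(&mut self, row: u64) {
        self.state.rows.push(row);
    }

    /// Consume the buffering state and move to Persisting.
    /// The old value is moved, so it cannot be written to afterwards.
    fn into_persisting(self) -> PartitionBuffer<Persisting> {
        PartitionBuffer { state: Persisting { rows: self.state.rows } }
    }
}

impl PartitionBuffer<Persisting> {
    /// Read access to the frozen rows.
    fn rows(&self) -> &[u64] {
        &self.state.rows
    }

    /// Persistence finished: drop the rows and start buffering again.
    fn mark_persisted(self) -> PartitionBuffer<Buffering> {
        PartitionBuffer::<Buffering>::new()
    }
}
```

Because `write()` is defined only for `PartitionBuffer<Buffering>`, calling it on a persisting buffer fails to compile rather than failing at runtime, which is the kind of compile-time guarantee the commit message attributes to typestates.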