influxdb

Commit Graph

Author	SHA1	Message	Date
Andrew Lamb	f93baf7693	chore: Update DataFusion and `arrow` / `arrow-flight` / `parquet` to `33.0.0` (#7045 ) * chore: Update DataFusion and arrow/arrow-flight/parquet to 33.0.0 * fix: Update test output * fix: update more test output * fix: Update querier test output * chore: Run cargo hakari tasks * test: fix formatting Fix formatting of batch pretty printing. * test: fix formatting Fix formatting of batch pretty printing. * test: fix formatting for selector tests --------- Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: Dom Dwyer <dom@itsallbroken.com> Co-authored-by: Christopher Wolff <chris.wolff@influxdata.com>	2023-02-22 21:24:20 +00:00
Raphael Taylor-Davies	d3601a59f8	chore: update DataFusion, upgrade `arrow` `arrow-flight` and `parquet` to `32.0.0` (#6756 ) * chore: update DataFusion * fix: test * chore: format * chore: clippy * chore: update arrow * chore: arrow upgrade fallout * chore: Run cargo hakari tasks * chore: remove failing warm compaction test * fix: flight error propagation * chore: update parquet size * fix: Update error message * chore: Update parquet metadata test --------- Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-02-06 11:35:39 +00:00
Andrew Lamb	f639bf3e23	chore: refactor ingester to use upstream arrow-flight (#6622 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-01-19 15:16:13 +00:00
Andrew Lamb	6843eee1d2	feat: Extract encoding from `RecordBatch` --> `FlightData` from flight implementations (#6460 ) * feat: Extract encoding from `RecordBatch` --> `FlightData` from flight implementations Refactor existing flight server impl * fix: Apply suggestions from code review Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> * fix: fixup code review comments * fix: update for more details * fix: Update names / types Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-01-04 13:36:16 +00:00
Marco Neumann	942a6100b5	fix: check schemas in `pretty_print_batches` (#6309 ) * fix: check schemas in `pretty_print_batches` I think most users of this function (and `assert_batches_eq`) assume that all batches have the same schema. If not, `pretty_print_batches` may either fail producing an actual table (some rows may have more or less columns) or silently produce a table that looks "alright". * fix: equalize schemas where it is required/desired Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-12-02 12:14:16 +00:00
Andrew Lamb	fc520e0c0f	refactor: Remove unnecessary optimize_record_batch (#6262 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-29 13:35:46 +00:00
Marco Neumann	e4c12fa6a5	fix: slice flight response batches (#6205 ) * fix: slice flight response batches Same as #6094 but for the Apache Flight interface. Ref https://github.com/influxdata/idpe/issues/16073. * refactor: use `RecordBatch::slice` Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-22 12:25:23 +00:00
Dom Dwyer	8dae6d3994	perf(ingester): address tables by ID only Changes the buffer tree to address TableData by their ID only (removing support for addressing tables by their string names). This removes the double reference book keeping / twin indexes and associated overhead. As part of this change, the TableName is now wrapped in a DeferredLoad in preparation for removal of the names in the DmlOperation wire format. This commit also switches the map of TableData within the NamespaceData (the parent node) to use the ArcMap for faster lookups and DRY exactly-once initialisation.	2022-11-14 11:27:19 +01:00
Dom Dwyer	2521aedb6a	perf(ingester): address namespaces by ID only Removes reliance on string name identifiers for namespaces in the ingester buffer tree, reducing the memory usage of the namespace index and associated overhead. The namespace name is required (though unused by IOx) in the IoxMetadata embedded within a parquet file, and therefore the name is necessary at persist time. For this reason, a DeferredLoad is used to query the catalog (by ID) for the name, at some uniformly random duration of time after initialisation of the NamespaceData, up to a maximum of 1 minute later. This ensures the query remains off the hot ingest path, and the jitter prevents spikes in catalog load during replay/ingester startup. As an additional / easy optimisation, the persist code causes a pre-fetch of the name in the background while compacting, hiding the query latency should it not have already been resolved. In order to keep the ingester buffer & catalog decoupled / easily testable, this commit uses a provider/factory trait NamespaceNameProvider and corresponding implementation (NamespaceNameResolver) in a similar fashion to the PartitionResolver, allowing easy mocking for tests, and composition for prod code, allowing future optimisations such as pre-fetching / caching the "hot" namespace names at startup. Internal string identifier removal is a pre-requisite for removing string identifiers from the write wire format (#4880).	2022-11-11 14:37:21 +01:00
Dom	d9c97795fc	feat: use IDs in ingester query API (#6093 ) * refactor: NS+table ID (instead of name) in querier<>ingester * feat(ingester): use IDs for query API Changes the ingester to utilise the ID fields (instead of names) sent over the query wire message wrapped within the Flight API. BREAKING: this changes the "query-ingester" CLI command arguments which now expects the namespace & table IDs, rather than their names. * refactor(ingester): add more query logging context Updates the log messages during query execution to include more context fields. * style: remove unused import Co-authored-by: Marco Neumann <marco@crepererum.net>	2022-11-09 11:25:13 +00:00
Dom Dwyer	38b0459994	test: simplify tests / remove catalog Remove the catalog from tests that only initialised an implementation in order to call buffer_operation().	2022-11-08 17:02:01 +01:00
Dom Dwyer	b73d07c22b	perf(ingester): granular per-partition locking This commit pushes the existing table-level mutex down to the partition. This allows the ingester to gather data from multiple partitions within a single table in parallel, and reduces contention between ingest/query workloads.	2022-11-08 15:45:59 +01:00
Dom Dwyer	7ac0857a28	revert: granular per-partition locking This reverts commit `79d24fa350`.	2022-11-08 10:31:37 +01:00
Dom Dwyer	79d24fa350	perf(ingester): granular per-partition locking This commit pushes the existing table-level mutex down to the partition. This allows the ingester to gather data from multiple partitions within a single table in parallel, and reduces contention between ingest/query workloads.	2022-11-07 13:45:03 +01:00
Andrew Lamb	4fb2843d05	refactor: Rename `schema::selection::Selection` to `schema::projection::Projection` (#6037 ) * chore: Rename `schema::selection::Selection` to `schema::projection::Projection` * fix: docs Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-02 18:15:04 +00:00
Dom Dwyer	678fb81892	refactor(ingester): use partition buffer FSM This commit makes use of the partition buffer state machine introduced in https://github.com/influxdata/influxdb_iox/pull/5943. This commit significantly changes the buffering, and querying, of data from a partition, swapping out the existing "DataBuffer" for the new state machine implementation (itself simplified due to temporary lack of incremental snapshot generation, see #5944). This commit simplifies the query path, removing multiple types that wrapped one-another to pass around various state necessary to perform a query, with various query functions needing different types or combinations of types. The query path now operates using a single type (named "QueryAdaptor") that provides a queryable interface over the set of RecordBatch returned from a partition. There is significantly increased testing of the PartitionData itself, covering data in various states and the ordering of returned RecordBatch (to ensure correct materialisation of updates). There are also invariants upheld by the type system / compiler to minimise the complexities of working with empty batches & states, and many asserts that ensure (mostly existing!) invariants are upheld.	2022-10-27 10:15:15 +02:00
Carol (Nichols || Goulding)	59e1c1d5b9	feat: Pass trace id through Flight requests from querier to ingester Fixes #5723.	2022-10-20 08:55:30 -04:00
Dom Dwyer	d0b546109f	refactor: impl converting IngesterQueryResponse An existing function to map the complex IngesterQueryResponse type to a simple set of RecordBatch existed in test code - this has been lifted onto an inherent method on the response type itself for reuse.	2022-10-19 11:51:15 +02:00
Andrew Lamb	9134ccd6c3	chore: Update datafusion again (#5855 ) * chore: Update datafusion * chore: Updates for changes in datafusion * chore: more updates * fix: update doc example Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-13 19:18:57 +00:00
Dom Dwyer	b294bb98aa	refactor: move query types to query_handler Moves types that are only used for handling queries to the query_handler module.	2022-10-11 17:58:55 +02:00
Dom Dwyer	c4f542bbe2	refactor(ingester): remove tombstone support This commit removes tombstone support from the ingester, and deletes associated code/helpers/tests. This commit does NOT remove tombstone support from any other service, but MAY include removing overlapping test coverage. This also removes the tombstone support from the Ingester -> Querier RPC response message. This has the nice side effect of removing a whole lot of thread spawning in the ingester tests for the Executor, speeding everything up!	2022-10-11 13:10:04 +02:00
Dom Dwyer	97c6e0f8ce	refactor: use TableName, not Arc<str> Adds a type wrapper TableName, internally an Arc<str> to leverage the type system instead of passing around untyped strings.	2022-10-10 19:09:43 +02:00
Dom Dwyer	abb9122e2c	refactor: carry namespace name in NamespaceData Changes the ingester's NamespaceData to carry a ref-counted string identifier as well as the ID. The backing storage for the name in NamespaceData is shared with the index map in ShardData, so it is effectively free!	2022-10-05 13:03:16 +02:00
Marco Neumann	55ef272920	refactor: acquire table locks concurrently (#5722 ) Waiting for one after the other (one per shard) in serial fashion likely increases latency too much.	2022-09-22 10:56:22 +00:00
Marco Neumann	365a246f8d	refactor: do not run de-dup in ingester for querier requests (#5626 ) * refactor: do not run de-dup in ingester for querier requests This removes the entire de-dup logic from the ingester for querier requests. Furthermore, it even removes the entire datafusion execution from the querier and just dumps the in-memory record batches as quickly as possible. No filters are applied. Note that even prior to this PR, we've never applied projections (tracked by #5624). Pros: - speed up query planning within the querier (since we need the ingester response for state reconciling) - lowered ingester CPU load Cons: - more querier<>ingester network traffic Closes #5602. * test: extend query test case * fix: ingester tests Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-22 07:33:54 +00:00
Marco Neumann	c66f16e4af	fix: ingester retries (#5708 ) * fix: retry ingester requests faster The retries introduced in #5695 are too slow and block the entire querier for minutes (until the very long gRPC timeout kicks in). * fix: add error details on why the query planning failed Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-21 09:27:47 +00:00
Dom Dwyer	6d00d6b683	test(ingester): refactor querier API tests This commit changes the prepare_data_to_querier() tests to drive the ingester state by applying DML ops, therefore driving the prod code paths (and testing them!) rather than having the tests set up what the tests believe is the correct internal ingester state, and then asserting on that state. This gives us much better coverage of prod code paths, decouples the tests from the internal state/representation of ingesters (making the tests less fragile), and removes a bunch of special-cased, test-only functions that are functionally similar, but not the same as, the prod functions. Unblocks #5658, further clean-up to come.	2022-09-20 16:24:27 +01:00
Dom Dwyer	07b08fa9cb	refactor: add table name in PartitionData A partition belongs to a table - this commit stores the table name in the PartitionData (which was readily available at construction time) instead of redundantly passing it into various functions at the risk of getting it wrong.	2022-09-16 17:59:22 +02:00
Dom Dwyer	ee8cdb48af	style(ingester): fmt imports & long strings Rewrite the imports to be a consistent order; std, external, crate and merge all crate-level imports into one use statement.	2022-09-14 14:20:19 +02:00
Dom Dwyer	074722eb3e	refactor(ingester): split data.rs into modules Breaks the gigantic data.rs file into sub-modules for Shard, Namespace, Table, Partition, and finally the actual data buffer used to store writes.	2022-09-14 14:20:19 +02:00
Marco Neumann	8933f47ec1	refactor: make `QueryChunk::partition_id` non-optional (#5614 ) In our data model, a chunk always belongs to a partition[^1], so let's not make this attribute optional. The optional value only leads to -- mostly surprising -- conditional behavior, ranging from "do not equalize the partition sort key" (querier) to "always consider the chunk overlapping" (iox_query when dealing with ingester chunks). [^1]: This is even true when the chunk belongs to a parquet file that is not yet added to the catalog, contrary to what a comment in the ingester stated. The catalog and data model used by the querier are two totally different things.	2022-09-12 13:52:51 +00:00
Marco Neumann	caa0dfd1e0	refactor: query code clean ups (#5612 ) * refactor: remove dead code * refactor: `Deduplicator::build_scan_plan` consumes `self` There is no good reason to use the same `Deduplicator` twice. In contrast I'm quite sure that this would lead to nasty bugs, because `split_overlapped_chunks` exits early in some cases so the 2nd plan would have old and new chunks mixed together.	2022-09-12 13:00:56 +00:00
Jake Goulding	4abf21c724	refactor: Rename Sequencer (and its entourage) to Shard	2022-08-29 14:06:43 -04:00
Marco Neumann	0561423475	refactor: enforce proper `IOxSessionContext` (#5158 ) - remove `IOxSessionContext::default()` because untracked contexts should only be created by tests - remove `Option<IOxSessionContext>` because it is a typed workaround for `IOxSessionContext::default` Tests should use `IOxSessionContext::testing` and all _normal_ users should create proper contexts. I suspect this will help tracing or at least prevent silent regressions. See #5129. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-20 16:25:43 +00:00
Marco Neumann	743c1692ea	refactor: stream query results from ingester to querier (#4875 ) * refactor: stream partitions from ingester Ref #4849. * refactor: do not collect record batched on the ingester side Ref #4849. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-16 12:58:50 +00:00
Marco Neumann	66c7d95312	refactor: use new ingester<>querier wire protocol (#4867 ) * refactor: use new ingester<>querier wire protocol Use and document the new and more flexible ingester<>querier wire protocol. Note that the ingester does NOT stream the response data yet, but the internal data structures would allow that. A follow-up change will adjust the ingester code to stream the data. Ref #4849. * fix: typos Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: clarify naming and public interface * test: add schema assertion to `ingester_response_to_record_batches` Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-06-16 08:02:28 +00:00
Andrew Lamb	eca3b6b9a1	fix: reduce memory usage in ingester with less buffering prior to query engine (#4830 ) * refactor: remove another buffer copy in ingester * docs: Update arrow_util/src/util.rs Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-14 18:22:55 +00:00
Andrew Lamb	7d2a5c299f	refactor: remove one buffer copy in the ingester (#4855 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-14 17:15:36 +00:00
Andrew Lamb	34e8659876	refactor: consolidate plan creation from `QueryChunk`s in `iox_query` (#4837 ) * refactor: consolidate plan creation from Chunks * docs: update docstrings Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-14 14:36:07 +00:00
Andrew Lamb	9fdbfb05e7	refactor: Use scan_and_filter in ReorgPlanner (#4822 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-10 17:31:25 +00:00
Andrew Lamb	11cec18edc	refactor: Move `scan_and_filter` into a `common` module for reuse (#4823 ) * refactor: remove unused error variants * refactor: move scan_and_filter into a module so it can be reused * docs: update comments about pruning	2022-06-10 11:15:47 +00:00
Andrew Lamb	2ec7764fdd	refactor: rename builder like predicate methods to be `with_` (#4808 ) * refactor: rename builder like predicate methods to be `with_` * fix: merge conflict Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-09 11:26:03 +00:00
Andrew Lamb	f34282be2c	fix: Do not run DataFusion optimizer pass twice (#4809 ) * fix: Do not run DataFusion optimizer pass twice * docs: improve docstring and logging	2022-06-08 21:01:22 +00:00
Andrew Lamb	afc1c12062	refactor: consolidate `PredicateBuilder` into `Predicate` (#4799 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-08 12:21:24 +00:00
Marco Neumann	c91dbe062e	test: "optimize" ingester record batches in query tests (#4700 ) * test: "optimize" ingester record batches in query tests It seems that I had the right idea in #4656 but wasn't able to trigger https://github.com/influxdata/conductor/issues/955 because the query tests do not "optimize" the record batches in the same way the actual gRPC implementation does. If we apply the same transformation we indeed end up with the same error. * fix: all batches within the ingester flight response must have same schema * refactor: simplify and reuse code Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-01 07:37:11 +00:00
Paul Dix	6af32b7750	feat: add concurrency limit for ingester queries (#4703 ) I've defaulted it to 20, we can adjust as needed. Closes #4657 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-30 10:22:17 +00:00
Marco Neumann	52346642a0	ci: fix cargo deny (#4629 ) * ci: fix cargo deny * chore: downgrade `socket2`, version 0.4.5 was yanked * chore: rename `query` to `iox_query` `query` is already taken on crates.io and yanked and I am getting tired of working around that.	2022-05-18 09:38:35 +00:00
Carol (Nichols || Goulding)	9eb21095e7	feat: Add more logging in particular situations to debug flaky test	2022-05-16 16:46:29 -04:00
Carol (Nichols || Goulding)	068096e7e1	fix: Rename data_types2 to data_types	2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding)	485d6edb8f	refactor: Move IngesterQueryRequest to generated_types	2022-05-06 14:45:37 -04:00


64 Commits (50d9d4032206c374064747791fa64e1c17409e83)