Adds benchmarks that exercise partition pruning during query execution
within the ingester, for varying partition counts within a table, and
varying row counts within each partition.
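A minimal sketch of the benchmark shape using criterion, with hypothetical `ingester_with_partitions`/`run_query` stand-ins for the real setup and query path:

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

/// Stand-ins for the real setup/query helpers (hypothetical).
struct TestIngester;
fn ingester_with_partitions(_partitions: usize, _rows_per_partition: usize) -> TestIngester {
    TestIngester
}
fn run_query(_ingester: &TestIngester, _sql: &str) -> usize {
    0 // would return the matching row/batch count in the real benchmark
}

fn partition_pruning(c: &mut Criterion) {
    let mut group = c.benchmark_group("partition_pruning");

    // Sweep the partition count per table and the row count per partition.
    for partitions in [1usize, 10, 100, 1_000] {
        for rows in [1usize, 1_000, 100_000] {
            group.bench_with_input(
                BenchmarkId::from_parameter(format!("{partitions}p_{rows}r")),
                &(partitions, rows),
                |b, &(partitions, rows)| {
                    let ingester = ingester_with_partitions(partitions, rows);
                    b.iter(|| run_query(&ingester, "SELECT * FROM t WHERE time > now() - 1h"));
                },
            );
        }
    }
    group.finish();
}

criterion_group!(benches, partition_pruning);
criterion_main!(benches);
```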
This adds an integration test that writes some data to the ingester,
waits for the WAL to be rotated and then ensures that the segment file
has been dropped.
* fix(ingester): re-transmit schema over flight if it changes
Fixes https://github.com/influxdata/idpe/issues/17408.
So a `[Sendable]RecordBatchStream` contains `RecordBatch`es of the SAME
schema. When the ingester crafts a response for a specific partition,
this is also almost always the case; however, when there's a persist job
running (I think), it may have multiple snapshots for a partition. These
snapshots may have different schemas (since the ingester only creates
columns if they contain any data). The current implementation merges
all these snapshots into a single stream and hands them over to Arrow
Flight, which has a high-perf encode routine (i.e. it does not re-check
every single schema) so it sends the schema once and then sends the data
for every batch (the data only, schema data is NOT repeated). On the
receiver side (= querier) we decode that data and get confused why on
earth some batches have a different column count compared to the schema.
For the OG ingester I carefully crafted the response to ensure that we
do not run into this problem, but apparently a number of rewrites and
refactors broke that. So here is the fix:
- remove the stream that isn't really a stream (and cannot error)
- for each partition go over the `RecordBatch`es and chunk them
according to the schema (because this check is likely cheaper than
re-transmitting the schema for every `RecordBatch`); see the sketch
after this list
- adjust a bunch of testing code to cope with this
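A rough sketch of that chunking step using plain `arrow` types (the real code lives in the ingester's Flight response assembly); the idea is simply to split the batch sequence wherever the schema changes, so each chunk can be encoded with a single schema message:

```rust
use std::sync::Arc;

use arrow::{datatypes::Schema, record_batch::RecordBatch};

/// Split `batches` into runs of consecutive batches that share a schema.
/// Each chunk can then be encoded once: schema first, then only the data
/// for every batch in the chunk.
fn chunk_by_schema(batches: Vec<RecordBatch>) -> Vec<(Arc<Schema>, Vec<RecordBatch>)> {
    let mut chunks: Vec<(Arc<Schema>, Vec<RecordBatch>)> = vec![];

    for batch in batches {
        let same_schema = chunks
            .last()
            .map(|(schema, _)| schema.as_ref() == batch.schema().as_ref())
            .unwrap_or(false);

        if same_schema {
            // Same schema as the previous batch => extend the current chunk.
            chunks.last_mut().expect("chunk exists").1.push(batch);
        } else {
            // First batch, or the schema changed => start a new chunk.
            chunks.push((batch.schema(), vec![batch]));
        }
    }

    chunks
}
```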
* refactor: nicify code
* test: adjust test
Adds a test that asserts (manually triggered) persistence generates a
file, uploads it to object storage, inserts metadata into the catalog,
and emits various persistence metrics.
This commit adds initial support for "soft" namespace deletion, where
the actual records & data remain, but are no longer queryable /
writeable.
Soft deletion is eventually consistent - users can expect to continue
writing to and reading from a bucket after issuing a soft delete call,
until the various components either restart, or have their caches
flushed.
The components treat soft-deleted namespaces differently:
* router: ignore soft deleted namespaces
* ingester: accept soft deleted namespaces
* compactor: accept soft deleted namespaces
* querier: ignore soft deleted namespaces
* various gRPC services: ignore soft deleted namespaces
This ensures that the ingester & compactor do not see rows "vanishing"
from the database, and continue to make forward progress.
Writes for the deleted namespace that are buffered in the ingester will
be persisted as normal, allowing us to support "un-delete" operations
where the system is restored to the state at which the delete was
issued (rather than losing the buffered data).
Follow-on work is required to ensure GC drops the orphaned parquet files
after the configured GC time, and optimisations such as not compacting
parquet from soft-deleted namespaces seem like a trivial win.
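One way to picture the catalog side of this, assuming a hypothetical `deleted_at` marker on the namespace record (actual field/column names may differ):

```rust
use std::time::SystemTime;

/// Hypothetical shape of the catalog's namespace record with the
/// soft-delete marker.
struct Namespace {
    id: i64,
    name: String,
    /// Set once a soft delete has been issued for this namespace.
    deleted_at: Option<SystemTime>,
}

impl Namespace {
    /// Router & querier: treat soft-deleted namespaces as gone.
    fn visible_to_frontends(&self) -> bool {
        self.deleted_at.is_none()
    }

    /// Ingester & compactor: keep accepting the namespace so buffered
    /// writes are still persisted and rows never "vanish" mid-stream.
    fn visible_to_backends(&self) -> bool {
        true
    }
}
```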
The maximum number of tables is part of the Namespace, which is already
loaded in its entirety. This commit copies the value into the
NamespaceSchema, making it available for the router to utilise.
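Schematically (field names are illustrative), the limit simply rides along with the schema the router already caches:

```rust
use std::collections::BTreeMap;

/// Catalog record, already loaded in its entirety by the router.
struct Namespace {
    id: i64,
    max_tables: i32,
    // ...other columns elided
}

/// In-memory schema the router caches on the hot write path.
struct NamespaceSchema {
    id: i64,
    tables: BTreeMap<String, i64>,
    /// Copied from `Namespace` so the table limit can be enforced
    /// without another catalog lookup per write.
    max_tables: i32,
}

impl NamespaceSchema {
    fn new(ns: &Namespace) -> Self {
        Self {
            id: ns.id,
            tables: BTreeMap::new(),
            max_tables: ns.max_tables,
        }
    }
}
```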
Rust 1.67 now says:
warning: `#[track_caller]` on async functions is a no-op
= note: see issue #87417 <https://github.com/rust-lang/rust/issues/87417> for more information
= note: `#[warn(ungated_async_fn_track_caller)]` on by default
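The warning fires on code like the following; since the attribute currently has no effect on `async fn`, the fix is simply to drop it (illustrative example, not the actual offending function):

```rust
// Before: Rust 1.67 warns that the attribute is a no-op, because the panic
// location inside the generated future is not propagated to the caller.
#[track_caller]
async fn assert_answer(value: i64) {
    assert_eq!(value, 42);
}

// After: drop the attribute (or keep the panicking code in a synchronous
// helper, where `#[track_caller]` does take effect).
async fn assert_answer_fixed(value: i64) {
    assert_eq!(value, 42);
}
```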
Have a single global test executor w/ reasonable defaults. Also don't
require tests to join/await executor shutdowns (most tests forget this
anyway and will get a runtime warning).
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
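The general pattern is a lazily-initialised, process-wide executor shared by all tests; a sketch using `once_cell` with a stand-in `Executor` type rather than the real one:

```rust
use std::sync::Arc;

use once_cell::sync::Lazy;

/// Stand-in for the real query executor type.
struct Executor {
    num_threads: usize,
}

impl Executor {
    fn new(num_threads: usize) -> Self {
        Self { num_threads }
    }
}

/// One executor with reasonable defaults, shared by every test in the
/// process; individual tests no longer join or await a shutdown.
static TEST_EXECUTOR: Lazy<Arc<Executor>> = Lazy::new(|| Arc::new(Executor::new(1)));

fn exec() -> Arc<Executor> {
    Arc::clone(&TEST_EXECUTOR)
}
```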
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidentally removed in rebase
* chore: remove mem catalog's default 1hr retention; make it settable in sets & router
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: reject writes that are outside the retention period
* feat: add retention validator into handler stack
* chore: Apply suggestions from code review
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: address review comments
* test: unit tests for retention validation
* chore: address review comments
* test: more unit tests and integration tests
* refactor: make time inside retention period for ephemeral_mode test
* fix: 2 hours
Co-authored-by: Dom <dom@itsallbroken.com>
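The retention validation added above boils down to a per-row timestamp comparison; a minimal sketch (timestamps as nanoseconds since the epoch, helper name hypothetical):

```rust
use std::time::Duration;

/// Accept a row only if its timestamp falls within the namespace's
/// retention period.
fn within_retention(row_ts_ns: i64, now_ns: i64, retention: Option<Duration>) -> bool {
    match retention {
        // No retention configured => everything is accepted.
        None => true,
        Some(period) => {
            // Nanosecond maths needs a wide integer type (see the overflow fix above).
            let cutoff = now_ns - period.as_nanos() as i64;
            row_ts_ns >= cutoff
        }
    }
}

#[test]
fn rejects_rows_older_than_retention() {
    let now: i64 = 1_000_000_000_000;
    let two_hours = Some(Duration::from_secs(2 * 60 * 60));
    assert!(within_retention(now - 1, now, two_hours));
    assert!(!within_retention(now - 3 * 3_600_000_000_000, now, two_hours));
}
```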
* refactor: NS+table ID (instead of name) in querier<>ingester
* feat(ingester): use IDs for query API
Changes the ingester to utilise the ID fields (instead of names) sent
over the query wire message wrapped within the Flight API.
BREAKING: this changes the "query-ingester" CLI command arguments which
now expects the namespace & table IDs, rather than their names.
* refactor(ingester): add more query logging context
Updates the log messages during query execution to include more context
fields.
* style: remove unused import
Co-authored-by: Marco Neumann <marco@crepererum.net>
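Conceptually, the query request changes from name-based to ID-based lookup; schematically (illustrative field names, not the exact Flight/proto message):

```rust
/// Old request shape: the ingester resolved names against the catalog.
struct QueryRequestByName {
    namespace: String,
    table: String,
    columns: Vec<String>,
}

/// New request shape: the querier already holds the catalog IDs, so the
/// ingester skips the name resolution entirely. The `query-ingester` CLI
/// now has to be given these IDs as well.
struct QueryRequestById {
    namespace_id: i64,
    table_id: i64,
    columns: Vec<String>,
}
```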
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half: changing the producer to send the IDs.
In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
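The compatibility argument rests on the new fields being purely additive, so an old (or rolled-back) consumer simply ignores them; schematically (illustrative names, not the real wire types):

```rust
/// Write buffer payload as seen by the producer after this change.
struct WriteOperation {
    namespace: String,
    /// New: written by the producer but not yet read by any consumer, so
    /// older consumers decode the message exactly as before and a
    /// rolled-back consumer can still replay newer messages.
    namespace_id: Option<i64>,
    tables: Vec<TableBatch>,
}

struct TableBatch {
    table_name: String,
    /// New and likewise additive.
    table_id: Option<i64>,
    // ...row data elided
}
```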
Changes the DmlWrite type to require a PartitionKey be specified,
instead of accepting an Option.
This requirement was already in place - the write buffer upheld an
invariant that all writes contained a partition key value (i.e. was not
"None"), or it panicked at runtime when attempting to enqueue the write.
It is now possible to encode this invariant in the type system, which is
what this change does.
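Schematically, the change moves the check from a runtime panic to the type system (types are illustrative, not the real `DmlWrite` definition):

```rust
struct PartitionKey(String);

/// Before: the invariant lived in a runtime check.
struct DmlWriteBefore {
    partition_key: Option<PartitionKey>,
    // ...
}

impl DmlWriteBefore {
    fn enqueue(&self) {
        // Panicked at runtime whenever the invariant was violated.
        let _key = self
            .partition_key
            .as_ref()
            .expect("enqueuing a write requires a partition key");
    }
}

/// After: a write without a partition key no longer type-checks, so the
/// panic path disappears.
struct DmlWrite {
    partition_key: PartitionKey,
    // ...
}
```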
Asserts write buffer seeking behaviour, including:
* Correctly seeking past already-persisted data
* Skipping to the next available op in a non-contiguous offset stream
* Skipping to the next available op when ops have been dropped due to retention
* Panicking when seeking beyond the available data (into the future)
Removes a pair of tests that covered some of the above due to their
tight coupling with ingester internals.
This commit adds a new test that exercises all major external APIs of
the ingester:
* Writing data via the write buffer
* Waiting for data to be readable via the progress API
* Querying data and asserting the contents
This should provide basic integration coverage for the Ingester
internals. This commit also removes a similar test (though with less
coverage) that was tightly coupled to the existing buffering structures.
Adds a test helper type that maintains the in-memory state for a single
ingester integration test, and provides easy-to-use methods to
manipulate and inspect the ingester instance.
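A rough sketch of what such a fixture might look like, with hypothetical method names mirroring the steps above:

```rust
/// Hypothetical fixture owning the in-memory state for a single ingester
/// integration test (write buffer, catalog, ingester instance, ...).
struct TestContext {}

impl TestContext {
    /// Spin up an in-memory catalog, write buffer and ingester.
    async fn new() -> Self {
        Self {}
    }

    /// Enqueue line protocol via the write buffer, as a production write would be.
    async fn write_lp(&self, namespace: &str, lp: &str) {
        let _ = (namespace, lp);
    }

    /// Block until the progress API reports the write as readable.
    async fn wait_for_readable(&self, namespace: &str) {
        let _ = namespace;
    }

    /// Query the ingester through its public query API and assert on the result.
    async fn query_and_assert(&self, namespace: &str, table: &str, expected: &[&str]) {
        let _ = (namespace, table, expected);
    }
}
```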