influxdb

Commit Graph

Author	SHA1	Message	Date
Marco Neumann	20ec47b00b	feat: virtual chunk order col (#7240 ) * feat: introduce `CHUNK_ORDER_COLUMN_NAME` * feat: impl `ChunkOrder` everywhere * feat: `ChunkOrder::get` * feat: emit chunk order column for `RecordBatchesExec` * feat: `chunk_order_field` * feat: chunk order col for parquet chunks * feat: optional chunk order col handling for dedup --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-03-17 09:39:21 +00:00
Carol (Nichols || Goulding)	cc7c44f76a	chore: Upgrade to Rust 1.68 (#7175 ) * chore: Upgrade to Rust 1.68 * fix: Remove unnecessary into_iter, thanks Clippy! * fix: Use the size of the type, not a reference to the type... oops. Thanks clippy! * fix: Return block directly instead of creating a variable Thanks clippy! --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-03-12 13:22:20 +00:00
Dom Dwyer	be661890c5	docs: module-level overviews Adds one-liner documentation of what each module contains - this is helpful to understand what is where, when looking at the rendered docs.	2023-03-01 14:27:05 +01:00
Andrew Lamb	f93baf7693	chore: Update DataFusion and `arrow` / `arrow-flight` / `parquet` to `33.0.0` (#7045 ) * chore: Update DataFusion and arrow/arrow-flight/parquet to 33.0.0 * fix: Update test output * fix: update more test output * fix: Update querier test output * chore: Run cargo hakari tasks * test: fix formatting Fix formatting of batch pretty printing. * test: fix formatting Fix formatting of batch pretty printing. * test: fix formatting for selector tests --------- Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: Dom Dwyer <dom@itsallbroken.com> Co-authored-by: Christopher Wolff <chris.wolff@influxdata.com>	2023-02-22 21:24:20 +00:00
Carol (Nichols || Goulding)	fb5aa25c5b	fix: Separate most_recent_n into filtering by shard and not	2023-02-17 12:56:51 -05:00
Dom Dwyer	2d46a364dc	feat: namespace soft-delete support This commit adds initial support for "soft" namespace deletion, where the actual records & data remain, but are no longer queryable / writeable. Soft deletion is eventually consistent - users can expect to continue writing to and reading from a bucket after issuing a soft delete call, until the various components either restart, or have their caches flushed. The components treat soft-deleted namespaces differently: * router: ignore soft deleted namespaces * ingester: accept soft deleted namespaces * compactor: accept soft deleted namespaces * querier: ignore soft deleted namespaces * various gRPC services: ignore soft deleted namespaces This ensures that the ingester & compactor do not see rows "vanishing" from the database, and continue to make forward progress. Writes for the deleted namespace that are buffered in the ingester will be persisted as normal, allowing us to support "un-delete" operations where the system is restored to a the state at which the delete was issued (rather than loosing the buffered data). Follow-on work is required to ensure GC drops the orphaned parquet files after the configured GC time, and optimisations such as not compacting parquet from soft-deleted namespaces seems like a trivial win.	2023-02-13 12:01:35 +01:00
Dom Dwyer	a633964f2b	feat(catalog): return max table limit in schema The maximum number of tables is part of the Namespace, which is already loaded in its entirety. This commit copies the value into the NamespaceSchema, making it available for the router to utilise.	2023-02-06 17:33:55 +01:00
Raphael Taylor-Davies	d3601a59f8	chore: update DataFusion, upgrade `arrow` `arrow-flight` and `parquet` to `32.0.0` (#6756 ) * chore: update DataFusion * fix: test * chore: format * chore: clippy * chore: update arrow * chore: arrow upgrade fallout * chore: Run cargo hakari tasks * chore: remove failing warm compaction test * fix: flight error propagation * chore: update parquet size * fix: Update error message * chore: Update parquet metadata test --------- Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-02-06 11:35:39 +00:00
Carol (Nichols || Goulding)	38b204c604	fix: Update test expectation, need to investigate	2023-02-03 13:06:20 -05:00
Carol (Nichols || Goulding)	30fea67701	fix: Move variables within format strings. Thanks clippy! Changes made automatically using `cargo clippy --fix`.	2023-02-03 13:06:17 -05:00
Nga Tran	b8a80869d4	feat: introduce a new way of max_sequence_number for ingester, compactor and querier (#6692 ) * feat: introduce a new way of max_sequence_number for ingester, compactor and querier * chore: cleanup * feat: new column max_l0_created_at to order files for deduplication * chore: cleanup * chore: debug info for chnaging cpu.parquet * fix: update test parquet file Co-authored-by: Marco Neumann <marco@crepererum.net>	2023-01-26 10:52:47 +00:00
Carol (Nichols || Goulding)	4658510102	fix: For Ingester2, persist a particular namespace on demand and share MiniClusters This should hopefully help CI from running out of Postgres connections 😬 The old architecture will still need to be non-shared and persist everything.	2023-01-25 10:36:56 -05:00
Carol (Nichols || Goulding)	8783623a19	docs: This method doesn't block until the data is persisted	2023-01-19 16:44:30 -05:00
Carol (Nichols || Goulding)	59914906b6	fix: Only reset persist everything flag if data has been persisted	2023-01-19 16:44:30 -05:00
Carol (Nichols || Goulding)	3dbaeedca6	feat: Try implementing the persist api in a diffferent way	2023-01-19 16:44:30 -05:00
Carol (Nichols || Goulding)	81f5f3b75f	feat: Implement the persist service gRPC API on the old ingester for query_tests2 to use	2023-01-19 16:44:30 -05:00
Andrew Lamb	f639bf3e23	chore: refactor ingester to use upstream arrow-flight (#6622 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-01-19 15:16:13 +00:00
Andrew Lamb	8410998408	chore: Update datafusion to Jan 17, 2023 (2 / 2) and arrow/parquet `30.0.1` (#6604 ) * chore: Update datafusion to Jan 9, 2023 (2 / 2) and arrow/parquet `30.0.1` * chore: Update for changes in arrow ipc * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2023-01-18 15:51:24 +00:00
Nga Tran	b856edf826	feat: function to get parttion candidates from partition table (#6519 ) * feat: function to get parttion candidates from partition table * chore: cleanup * fix: make new_file_at the same value as created_at * chore: cleanup Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-01-06 16:20:45 +00:00
Raphael Taylor-Davies	e1036a0c63	refactor: cleanup schema boxing (#6511 ) * refactor: cleanup Schema boxing * chore: clippy	2023-01-06 10:57:39 +00:00
Andrew Lamb	6843eee1d2	feat: Extract encoding from `RecordBatch` --> `FlightData` from flight implementations (#6460 ) * feat: Extract encoding from `RecordBatch` --> `FlightData` from flight implementations Refactor existing flight server impl * fix: Apply suggestions from code review Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> * fix: fixup code review comments * fix: update for more details * fix: Update names / types Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-01-04 13:36:16 +00:00
Carol (Nichols || Goulding)	7c6ccdb6d7	fix: Use keys and values functions. Thanks clippy!	2022-12-21 14:32:35 -05:00
Dom Dwyer	adc6fcfb04	feat(catalog): linearise sort key updates Updating the sort key is not commutative and MUST be serialised. The correctness of the current catalog interface relies on the caller serialising updates globally, something it cannot reasonably assert in a distributed system. This change of the catalog interface pushes this responsibility to the catalog itself where it can be effectively enforced, and allows a caller to detect parallel updates to the sort key.	2022-12-20 12:31:00 +01:00
Carol (Nichols || Goulding)	1c7f322a4e	feat: Keep track of and report number of Parquet files persisted Per partition and starting over each time the ingester restarts. Fixes #6334.	2022-12-12 11:45:00 -05:00
Carol (Nichols || Goulding)	2fd2d05ef6	feat: Identify each run of an ingester with a Uuid And send that UUID in the Flight response for queries to that ingester run. Fixes #6333.	2022-12-08 17:22:52 -05:00
Marco Neumann	942a6100b5	fix: check schemas in `pretty_print_batches` (#6309 ) * fix: check schemas in `pretty_print_batches` I think most users of this function (and `assert_batches_eq`) assume that all batches have the same schema. If not, `pretty_print_batches` may either fail producing an actual table (some rows may have more or less columns) or silently produce a table that looks "alright". * fix: equalize schemas where it is required/desired Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-12-02 12:14:16 +00:00
Marco Neumann	ec2e72d223	test: simplify test executors (#6312 ) Have a single global test executor w/ reasonable defaults. Also don't require tests to join/await executor shutdowns (most tests forget this anyways and will get a runtime warning). Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-12-02 11:38:18 +00:00
Andrew Lamb	fc520e0c0f	refactor: Remove unecessary optimize_record_batch (#6262 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-29 13:35:46 +00:00
Dom Dwyer	9eafa9dbed	style: consistent import ordering Reorder all imports in the ingester to match a consistent order: * stdlib * external crates * intra-crate imports This helps prevent merge conflicts & keeps everything tidy.	2022-11-22 14:11:10 +01:00
Dom Dwyer	ee8b728c32	refactor: decouple Shard & BufferTree Splits out the nested tree of namespace -> tables -> partitions (referred to as the "buffer tree") from the Shard which previously held the namespace map. This allows the BufferTree to exist without a shard, or many trees to exist within a shard, etc.	2022-11-22 14:11:10 +01:00
Marco Neumann	e4c12fa6a5	fix: slice flight response batches (#6205 ) * fix: slice flight response batches Same as #6094 but for the Apache Flight interface. Ref https://github.com/influxdata/idpe/issues/16073. * refactor: use `RecordBatch::slice` Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-22 12:25:23 +00:00
Dom Dwyer	097f0acb85	refactor: move SequenceNumberRange Moves the SequenceNumberRange type out of "data" and into the root to be reused outside of the data module. This construct is universally useful across all the ingester code.	2022-11-21 16:11:55 +01:00
Dom Dwyer	1938c18c50	refactor: decouple DmlSink error type Allows different DmlSink implementations to return different error types. This allows for small, concise errors that are local to the DmlSink implementation and specific to it. This helps avoid bloated "kitchen sink" error types.	2022-11-21 15:29:13 +01:00
Dom Dwyer	64c9d87b9b	refactor: move DmlSink Extracts the DmlSink trait into its own module - it is independent of the Kafka handler and will be reused.	2022-11-21 15:02:24 +01:00
Dom Dwyer	85c8d16680	refactor: add a message to unreachable!() Adds a message to say an impossible thing is impossible.	2022-11-18 17:33:58 +01:00
Dom Dwyer	9dc32f1c16	refactor: remove names from DML init Fixes conflicts introduced by #6170.	2022-11-18 17:31:56 +01:00
Dom	59b3c793d3	Merge branch 'main' into dom/ingester-rpc-write	2022-11-18 16:21:07 +00:00
Dom Dwyer	9351e01068	refactor: log dml apply errors Ensures DML apply errors are recorded in the ingester logs.	2022-11-18 16:48:31 +01:00
Dom Dwyer	16eed699fd	refactor: avoid needless partition key clone Moves the trace! invocation to before the DmlWrite init to avoid having to clone the partition key.	2022-11-18 16:46:14 +01:00
Carol (Nichols || Goulding)	02c3083192	fix: Remove table names from Dml operations	2022-11-18 10:40:38 -05:00
Dom Dwyer	90dd9906f6	feat(ingester): rpc write endpoint Adds a handler implementation of the gRPC WriteService to receive direct RPC writes from a router. This code is currently unused.	2022-11-18 16:36:19 +01:00
Dom Dwyer	229e2adbb1	refactor: split gRPC services into modules Splits the everything-grpc-in-one-file into smaller, per-service modules.	2022-11-18 15:51:54 +01:00
Nga Tran	49a9565240	feat: gRPC that creates namespace (#6103 ) * feat: create namespace API call in router Co-authored-by: Nga Tran <nga-tran@live.com> * chore: treat retention as ns except in CLI * fix: overflow in nanosecond calc * fix: retention test after changing it from hours to ns * chore: comment clarification in cli; better response type for error in ns API * fix: correct some rebase mistakes * chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const * fix: ns autocreation test was wrong after rebase * fix: mem catalog has default 1hr retention, accidently removed in rebase * chore: remove mem catalogs default 1hr retention; make it settable in sets & router Co-authored-by: Luke Bond <luke.n.bond@gmail.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-18 13:02:12 +00:00
Nga Tran	6f7b1e2e26	feat: reject writes that are outside the retention period (#6148 ) * feat: reject writes that are outside the retention period * feat: add retention validator into handler stack * chore: Apply suggestions from code review Co-authored-by: Dom <dom@itsallbroken.com> * refactor: address review comments * test: unit tests fot retention validation * chore: address review comments * test: more unit tests and integration tests * refactor: make time inside retention period for emphemeral_mode test * fix: 2 hours Co-authored-by: Dom <dom@itsallbroken.com>	2022-11-17 20:55:58 +00:00
kodiakhq[bot]	1a49fa4864	Merge branch 'main' into cn/test-refactor	2022-11-17 14:01:36 +00:00
Dom Dwyer	5afe58d4d2	refactor: remove unused errors These error states are no longer possible after several refactors, but do not cause a "not used" lint because of macro magic.	2022-11-17 13:53:54 +01:00
Carol (Nichols || Goulding)	d4715a9fde	fix: Simplify tests by using and creating more test helpers The most important part of this is creating the DmlWrites in one spot.	2022-11-16 21:48:43 -05:00
Carol (Nichols || Goulding)	4e2b68a7c5	fix: Simplify test by not actually creating a catalog namespace This isn't actually needed for what this test is testing.	2022-11-16 21:06:44 -05:00
Carol (Nichols || Goulding)	b6286767b0	fix: Validating the schema in ingester tests isn't necessary The router validates schemas; schema validation shouldn't be tested in the ingester	2022-11-16 21:05:51 -05:00
Carol (Nichols || Goulding)	c7b9866483	feat: Have make_write_op take the table name as an argument to be more flexible	2022-11-16 21:05:46 -05:00

505 Commits (50d9d4032206c374064747791fa64e1c17409e83)