influxdb

Commit Graph

Author	SHA1	Message	Date
wiedld	a02f7e7f3f	chore: rename disk protection to DiskSpaceMetric	2023-07-05 13:47:07 -07:00
wiedld	b961bc79c4	refactor: move the background task handler onto the parent IngesterGuard * follow the pattern of the periodic wal rotation * do NOT follow the pattern of the wal.flusher_task	2023-07-05 13:13:13 -07:00
wiedld	b4b89699cd	refactor: make struct signature be (path, registry) * the metric attributes are hardcoded to the path * the duration (frequency) of the background task is hardcoded * the tick.await now occurs after the first metric recording, such that the test doesn't have to wait 15 seconds.	2023-07-05 12:51:23 -07:00
Fraser Savage	15b22728cc	feat(ingester): Drop WAL segments once all writes are persistent This implements `PersistCompletionObserver` for the `WalReferenceHandle` so that it can be given to the persist handle and notified of persist completions in order to drop WAL segments once all writes are persistent.	2023-07-05 16:09:03 +01:00
Fraser Savage	b4a5d994d7	refactor(ingester): Use multi-table write op test util for wal_sink test	2023-07-05 15:28:30 +01:00
Fraser Savage	9ca0abfe0d	feat(ingester): WIP - WAL reference tracking of unbuffered writes This commit updates the DML sink for the write-ahead log to notify the reference tracker of writes that have been committed to the log, but failed to be applied to the buffer tree.	2023-07-05 15:28:28 +01:00
Fraser Savage	f481ce7070	refactor(ingester): Expose `SequenceNumberSet` for each ingest op This allows code shuttling around ingest operations to know how the operation has sequenced without having to fiddle about with the value.	2023-07-05 15:23:42 +01:00
Fraser Savage	fd8a89deea	feat(ingester): WIP - WAL rotate task uses reference tracker for delete This is the first commit in line to connect the WAL segment reference tracker actor up to the rest of the ingester. It removes the segment file deletion and hacky sleep from the rotate task, deferring to the actor for deletion tracking.	2023-07-05 15:23:37 +01:00
Fraser Savage	7b2ef53c7b	refactor(ingester): Notify `SequenceNumberSet` when tracking unbuffered writes Writes now contain multiple sequence numbers, so the WAL reference actor must be notified of all sequence numbers contained for a write that failed to be applied to the buffer.	2023-07-05 15:13:29 +01:00
Dom	af12edec38	Merge branch 'main' into dom/optimised-partition-pushdown	2023-07-05 15:01:13 +01:00
Fraser Savage	2da99f8032	refactor: Use `const` instead of unnecessary lazy_static Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com>	2023-07-05 14:42:55 +01:00
Fraser Savage	e74a7a7dd4	test(wal): Test correct assignment of write per-partition sequence numbers This adds extra test coverage for the ingester's WAL replay & RPC write paths, as well as the WAL E2E tests, to ensure that all sequence numbers present in a WriteOperation/WalOperation are encoded and present when decoded.	2023-07-05 14:42:47 +01:00
Fraser Savage	e6e09d0c15	feat(ingester): Assign individual sequence numbers for writes per partition This commit asks the oracle for a new sequence number for each table batch of a write operation (and thus each partition of a write) when handling an RPC write operation before appending the operation to the WAL. The ingester now honours the sequence numbers per-partition when WAL replay is performed.	2023-07-05 14:29:27 +01:00
Fraser Savage	30939cfe96	refactor(wal): Remove op-level `sequence_number`, use per table map This commit removes the op-level sequence number from the proto definition, now reading and writing solely to the per table (and thus per partition) sequence number map. Tables/partitions within the same write op are still assigned the same number for now, so there should be no semantic difference	2023-07-05 14:20:43 +01:00
kodiakhq[bot]	70a6e60415	Merge branch 'main' into savage/use-u64-for-sequence-number	2023-07-05 12:55:44 +00:00
Dom Dwyer	7d0e3637ed	perf(ingester): projection pushdown to data source Prior to this change projection pushdown was implemented as a filter, which meant a query using it would take the following steps: * Query arrives * Find necessary partition data * Copy all the partition data into a RecordBatch * Filter that RecordBatch to apply the projection * Return results to caller This is far from ideal, as the underlying partition data is copied in its entirety and then the unneeded columns discarded - a pure waste! After this PR, the projection is pushed down to the point of RecordBatch generation: * Query arrives * Find necessary partition data * Copy only the projected columns to a RecordBatch * Return results to the caller This minimises the amount of data copying, which for large amounts of data should lead to a meaningful performance improvement when querying for a subset of columns. It also uses a slightly more efficient projection implementation by using a single pass over the columns (still O(n) but less constant overhead).	2023-07-05 13:44:11 +02:00
Dom Dwyer	226ad2b100	test(ingester): query projection Add an integration test driving query projection through the ingester.	2023-07-05 13:44:11 +02:00
Dom Dwyer	54a08853fe	test(ingester): split write / query tests Split the write & query integration tests into their own modules for clarity.	2023-07-05 13:44:10 +02:00
Dom Dwyer	09974c66db	perf: short-circuit QueryAdaptor row count check Don't inspect every RecordBatch when checking for at least one row - stop as soon as 1 row is observed.	2023-07-05 13:44:09 +02:00
Dom Dwyer	a17bd3bded	refactor: don't Arc-wrap RecordBatch instances RecordBatch are internally ref-counted, so don't Arc wrap them again.	2023-07-05 13:44:09 +02:00
Dom Dwyer	8f0ae77184	test(bench): ingester query & projection Benchmark query performance against a variety of row/column counts, with and without projection.	2023-07-05 13:44:08 +02:00
dependabot[bot]	3827257f94	chore(deps): Bump thiserror from 1.0.40 to 1.0.41 (#8149 ) Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.40 to 1.0.41. - [Release notes](https://github.com/dtolnay/thiserror/releases) - [Commits](https://github.com/dtolnay/thiserror/compare/1.0.40...1.0.41) --- updated-dependencies: - dependency-name: thiserror dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Dom <dom@itsallbroken.com>	2023-07-05 09:25:14 +00:00
dependabot[bot]	b5c9628f0f	chore(deps): Bump async-trait from 0.1.69 to 0.1.70 (#8148 ) Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.69 to 0.1.70. - [Release notes](https://github.com/dtolnay/async-trait/releases) - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.69...0.1.70) --- updated-dependencies: - dependency-name: async-trait dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-07-05 09:05:13 +00:00
dependabot[bot]	9a03d9c9fe	chore(deps): Bump paste from 1.0.12 to 1.0.13 (#8139 ) Bumps [paste](https://github.com/dtolnay/paste) from 1.0.12 to 1.0.13. - [Release notes](https://github.com/dtolnay/paste/releases) - [Commits](https://github.com/dtolnay/paste/compare/1.0.12...1.0.13) --- updated-dependencies: - dependency-name: paste dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-07-04 07:57:41 +00:00
Dom Dwyer	0297fe3651	refactor: less nesting in partition pruning logic Improve readability by pulling the partition pruning logic into its own function and clean up some minor bits.	2023-07-03 17:25:03 +02:00
Dom Dwyer	edf6686130	fix(test): custom partitioning template pruning Configure the partition pruning test to use a partition template that partitions on the "region" field. This will allow it to be used for pruning at query time.	2023-07-03 17:25:03 +02:00
Marco Neumann	36ed914689	test: type coercion in ingester tests	2023-07-03 17:25:02 +02:00
Marco Neumann	171b2a14c7	fix: doc link	2023-07-03 17:25:01 +02:00
Marco Neumann	e9b456df1f	fix: do not panic for pruning errors	2023-07-03 17:25:00 +02:00
Marco Neumann	0bcf85d48c	refactor: de-dup code	2023-07-03 17:24:59 +02:00
Carol (Nichols \|\| Goulding)	8ebf390d9c	feat: Try to prune ingester partitions by partition key This is hacktastic.	2023-07-03 17:24:58 +02:00
Fraser Savage	da34eb7b35	feat: Load both table name and partition template in the ingester	2023-07-03 17:24:57 +02:00
Fraser Savage	5f759528d3	test(ingester): Add `BufferTree` test for predicate-filtered queries	2023-07-03 17:24:56 +02:00
Fraser Savage	246c2b0749	refactor(ingester): Accept a predicate as parameter to `query_exec` This will allow the ingester to apply a predicate when serving a query and only stream back data that satisfies the predicate.	2023-07-03 17:24:56 +02:00
dependabot[bot]	9f00c9c4ef	chore(deps): Bump pin-project from 1.1.1 to 1.1.2 (#8129 ) Bumps [pin-project](https://github.com/taiki-e/pin-project) from 1.1.1 to 1.1.2. - [Release notes](https://github.com/taiki-e/pin-project/releases) - [Changelog](https://github.com/taiki-e/pin-project/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/pin-project/compare/v1.1.1...v1.1.2) --- updated-dependencies: - dependency-name: pin-project dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-07-03 09:10:28 +00:00
Marco Neumann	ce6a2fb613	refactor: remove `QueryChunk::column_values` (#8111 ) Similar to #8109. This was once implemented by the RUB but as it stands right now, no chunk implements this anymore. If we ever want to bring this back, we should use the output of `QueryChunk::data` instead (i.e. use a data-based implementation instead of a per-chunk one). Closes #8096.	2023-07-03 09:03:21 +00:00
wiedld	d64a908823	Merge branch 'main' into 7899/wal-disk-metrics	2023-06-30 18:59:49 -07:00
dependabot[bot]	ede8e32804	chore(deps): Bump pin-project from 1.1.0 to 1.1.1 (#8118 ) Bumps [pin-project](https://github.com/taiki-e/pin-project) from 1.1.0 to 1.1.1. - [Release notes](https://github.com/taiki-e/pin-project/releases) - [Changelog](https://github.com/taiki-e/pin-project/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/pin-project/compare/v1.1.0...v1.1.1) --- updated-dependencies: - dependency-name: pin-project dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-06-30 08:51:56 +00:00
kodiakhq[bot]	16f18fdd53	Merge branch 'main' into cn/use-the-test-constants-luke	2023-06-29 15:04:53 +00:00
Marco Neumann	b982ee180e	refactor: remove `QueryChunk::column_names` (#8109 ) This interface was once specially implemented by the RUB. The only actual implementation of it is within the querier that just forwards it to a simple schema scan. Lift this semantic to `iox_query_influxrpc` instead so all the chunks can use it. If we ever want to optimize this again, we should use `QueryChunk::data` instead (i.e. instead of implementing it within the chunk it should use the data method and do something smart based on that). First half of #8096.	2023-06-29 13:43:10 +00:00
Marco Neumann	dcb4a9bb5c	refactor: fuse `QueryChunk` and `QueryChunkMeta` (#8107 ) Closes #8095.	2023-06-29 11:02:48 +00:00
Marco Neumann	4638b89d93	refactor: migrate retention to proper predicates (#8092 ) Do not (ab)use per-chunk delete predicates for the retention policy. Instead use a per-table predicate. This makes the code way cleaner, since the scoping is correct (i.e. delete predicates are a table-wide attribute, not a chunk-based one) and it is consistent with time predicates that the user provides (e.g. via `WHERE time > x`). It also allows us to remove delete predicates (in their current, non-scalable form) from the query path. A potential future version would likely not use per chunk predicates (and "is processed" markers) but use the timestamp / chunk order to determine to which data the predicate should be applied. Note that the lowering of the retention policy changed slightly from ```text (time > (now() - retention)) AND (time < MAX) ``` to ```text time > (now() - retention) ``` Since the `MAX` cut is just an artifact of the lowering and was unnecessary. Closes #7409. Closes #7410.	2023-06-29 08:36:37 +00:00
dependabot[bot]	b15c6062a9	chore(deps): Bump tokio from 1.28.2 to 1.29.0 (#8100 ) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.28.2 to 1.29.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.28.2...tokio-1.29.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-06-28 13:18:08 +00:00
dependabot[bot]	bb6f481de8	chore(deps): Bump uuid from 1.3.4 to 1.4.0 (#8099 ) Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.3.4 to 1.4.0. - [Release notes](https://github.com/uuid-rs/uuid/releases) - [Commits](https://github.com/uuid-rs/uuid/compare/1.3.4...1.4.0) --- updated-dependencies: - dependency-name: uuid dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-06-28 10:26:53 +00:00
Carol (Nichols \|\| Goulding)	d7959e7b0c	refactor: Extract a test constant for a namespace ID different than ARBITRARY_NAMESPACE_ID	2023-06-26 17:26:14 -04:00
Carol (Nichols \|\| Goulding)	dc21c48404	refactor: Use and create test constants for partition keys Replace: - PartitionKey p1 with ARBITRARY_PARTITION_KEY - PartitionKey p2 with PARTITION2_KEY - PartitionKey p3 with PARTITION3_KEY	2023-06-26 17:26:03 -04:00
Carol (Nichols \|\| Goulding)	086a073fda	refactor: Extract and use test constants for table values throughout these tests Moved TABLE2_ID and TABLE2_NAME to the top of the test module, even though TABLE2_NAME is only used in one spot, to encourage use of the constants if new tests are added to this file that need a table that's different from the arbitrary table. Replaced all occurrences of TableId::new(1234) with TABLE2_ID even though TABLE2_ID is 1234321; the exact value doesn't matter, the important property is that it does not equal ARBITRARY_TABLE_ID (which is 4).	2023-06-26 17:25:55 -04:00
Carol (Nichols \|\| Goulding)	ce4e67e921	refactor: Use and create test constants for partition IDs Replace: - PartitionId::new(0) with ARBITRARY_PARTITION_ID (which is actually 1) - PartitionId::new(1) with PARTITION2_ID (actually 2) - PartitionId::new(2) with PARTITION3_ID (actually 3) So while adding one is a bit confusing in this diff, in the long run, this will make the test more understandable and consistent with other tests.	2023-06-26 17:25:46 -04:00
Carol (Nichols \|\| Goulding)	afcd2d859d	refactor: Use test constants in more places So that when I change the type of PartitionIds to TransitionPartitionId, I don't have to update all these places that just need an arbitrary partition ID or related values. These test constants probably didn't exist when these tests were created.	2023-06-26 17:25:14 -04:00
Fraser Savage	62cb6594c8	refactor(ingester): Use unsigned sequence number, remove its `Sqlx::Type` Now that sequence numbers are internal to the ingester and the WAL, there's no need for them to be a signed integer. As noted by [#7260](https://github.com/influxdata/influxdb_iox/issues/7260) this was a quirk related to the kafka-based IOx and Postgres only supported signed integers.	2023-06-23 16:39:11 +01:00
Carol (Nichols \|\| Goulding)	1912840c25	docs: Update size calculations in the description of PartitionCache	2023-06-22 09:01:22 -04:00
Carol (Nichols \|\| Goulding)	bffb2f8f9f	fix: Specialize Partition constructors to clarify appropriate usage	2023-06-22 09:01:22 -04:00
Carol (Nichols \|\| Goulding)	41420cb920	fix: Borrow transition partition ID when possible	2023-06-22 09:01:22 -04:00
Carol (Nichols \|\| Goulding)	d991e12fbb	feat: Send PartitionHashId from ingesters to queriers	2023-06-22 09:01:22 -04:00
Carol (Nichols \|\| Goulding)	62ba18171a	feat: Add a new hash column on the partition and parquet file tables This will hold the deterministic ID for partitions. Until all existing partitions have this value, this is optional/nullable. The row ID still exists and is used as the main foreign key in the parquet_file and skipped_compaction tables. The hash_id has a unique index so that we can look up records based on it (if it's available). If the parquet file record has a partition_hash_id value, use that to generate the object storage path instead of the partition_id.	2023-06-22 09:01:22 -04:00
Fraser Savage	fab088f680	refactor(ingester): Split up the `WriteOperation` sub-types into separate modules	2023-06-22 10:08:26 +01:00
wiedld	fd881ea82e	chore: add disk protection to the wal directory	2023-06-21 22:12:58 -07:00
wiedld	09654542ec	Revert "chore(7899): add InstrumentedDiskProtection to the WAL" This reverts commit `f632e4f023`.	2023-06-21 09:51:37 -07:00
Fraser Savage	f6ad920f31	refactor(ingester): Remove `set_span_context()` from `IngestOp`	2023-06-21 15:58:16 +01:00
Fraser Savage	35c5017410	docs(ingester): Update references to `DmlOperation` in doc comments	2023-06-21 11:45:46 +01:00
Fraser Savage	d3775e67f8	chore(ingester): Tidy TODO comment in `dml_payload`	2023-06-21 11:16:51 +01:00
Fraser Savage	4f489b9d9d	refactor(ingester): Add dml_payload::encode mod for encode to `DatabaseBatch` This enables the whole of the RPC write path to use the new `dml_payload::IngestOp` instead of the `DmlOperation` type. The implementation of `From<&WriteOperation> for DmlWrite` has been removed to complete the switch.	2023-06-21 11:13:44 +01:00
Fraser Savage	3e7a82f319	refactor(ingester): Remove `From` dml op to construct `IngestOp` directly Removes one of the temporary conversion traits and adds a test helper method `encode_batch(NamespaceId, WriteOperation)` for removal of the `DmlOperation` from WAL replay and the RPC write handler.	2023-06-21 11:11:19 +01:00
Fraser Savage	a3a4145774	refactor(ingester): Replace test_util `DmlWrite` with `WriteOperation` This change replaces the test_util equality and write generation code to use the new `IngestOp::Write(WriteOperation)` type, removing many pointless conversions in tests.	2023-06-21 11:11:18 +01:00
Fraser Savage	8908f5fc96	refactor(ingester): Take `IngestOp` as parameter for `DmlSink::apply` This commit switches over the trait definition to take `IngestOp` instead of `DmlOperation`. This commit is not enough to complete the switch, as the conversion from `DmlOperation` and back is a performance regression.	2023-06-21 11:11:17 +01:00
Fraser Savage	198a47da53	refactor(ingester): Implement `From<&WriteOperation>` for `DmlWrite` This commit implements the `From` trait to allow quick conversion from a `&WriteOperation` back to an owned `DmlWrite`. This conversion copies all the record batches and should be removed once WAL encoding can be done from `&WriteOperation`.	2023-06-21 11:11:16 +01:00
Fraser Savage	6527110663	refactor(ingester): Expose some `IngestOp` fields from inner type This allows accessing temporarily agnostic values without matching on the kind of `IngestOp`. There is only one type at the moment, so this just provides a bridge for users of the `DmlOperation` getters.	2023-06-21 11:11:15 +01:00
Fraser Savage	5ca7bd58f4	refactor(ingester): Implement `From<DmlOperation>` for `IngestOp` This commit implements the `From` trait to allow quick conversion from `DmlOperation::DmlWrite` to `IngestOp::WriteOperation`. This conversion performs some copies and should be removed once the RPC write path has been switched to use `IngestOp`.	2023-06-21 11:11:11 +01:00
wiedld	f632e4f023	chore(7899): add InstrumentedDiskProtection to the WAL * this requires providing the metrics registry to the wal	2023-06-16 20:42:24 -07:00
dependabot[bot]	98a2e852db	chore(deps): Bump uuid from 1.3.3 to 1.3.4 (#7985 ) Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.3.3 to 1.3.4. - [Release notes](https://github.com/uuid-rs/uuid/releases) - [Commits](https://github.com/uuid-rs/uuid/compare/1.3.3...1.3.4) --- updated-dependencies: - dependency-name: uuid dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-06-14 07:58:52 +00:00
wiedld	5dd9b50dd7	chore(7618): add metric to frame encoding instrumentation (#7901 ) * add FlightService metric to record a duration histogram across requests. * duration per partition, per request * make available to the FlightFrameEncodeRecorder * update naming conventions to reflect updated functionality	2023-06-13 11:29:29 -07:00
Fraser Savage	4909d4122f	refactor(ingester): Rename `data_types` module to `dml_payload`	2023-06-13 14:23:37 +01:00
Fraser Savage	e5719cffff	refactor(ingester): Add `data_types` module with `IngestOp` enumeration The `dml` crate and its contained types simultaneously contain more and less data than the ingester needs for writes. This type is to replace the use of `DmlOperation` and `DmlWrite` within the ingester's internals so that the type can be specialised with low blast-radius changes. The key change here is to remove the ties to the `DmlMeta` construction and allow sequencing of data on a per-partition basis	2023-06-13 11:54:51 +01:00
dependabot[bot]	2ffa9f3cda	chore(deps): Bump crossbeam-utils from 0.8.15 to 0.8.16 Bumps [crossbeam-utils](https://github.com/crossbeam-rs/crossbeam) from 0.8.15 to 0.8.16. - [Release notes](https://github.com/crossbeam-rs/crossbeam/releases) - [Changelog](https://github.com/crossbeam-rs/crossbeam/blob/master/CHANGELOG.md) - [Commits](https://github.com/crossbeam-rs/crossbeam/compare/crossbeam-utils-0.8.15...crossbeam-utils-0.8.16) --- updated-dependencies: - dependency-name: crossbeam-utils dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2023-06-13 02:00:14 +00:00
Fraser Savage	71e47b59ab	refactor(wal): Make more use of combinators for WAL segment reading logic	2023-06-12 12:27:20 +01:00
Fraser Savage	fa69994358	refactor(wal): Implement `Iterator` for ClosedSegmentFileReader The ClosedSegmentFileReader is pretty much an iterator anyways, this just enables using all the juicy combinators with it more easily.	2023-06-09 17:30:53 +01:00
kodiakhq[bot]	e7effc62b5	Merge branch 'main' into savage/sequence-per-partition	2023-06-08 14:28:44 +00:00
Fraser Savage	309310ac4c	refactor(ingester): Make line protocol in `wal_sink` test more readable Co-authored-by: Dom <dom@itsallbroken.com>	2023-06-08 15:13:59 +01:00
Carol (Nichols \|\| Goulding)	bf699a8b60	fix: Remove partition ID from the metadata serialized into Parquet files (#7947 ) Nothing gets the partition ID out of the metadata. The parts of the code interacting with object storage that need the ID to create the object store path were using the partition ID from the metadata out of convenience, but I changed those places to pass in the partition ID in a separate argument instead. This will make the transition to deterministic partition IDs a bit smoother. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-08 14:03:21 +00:00
Fraser Savage	fad34c375e	refactor(wal): Use TableId type for look-aside map key This adds a little extra layer of type safety and should be optimised by the compiler. This commit also makes sure the ingester's WAL sink tests assert the behaviour for partitioned sequence numbering on an operation that hits multiple tables & thus partitions.	2023-06-08 11:39:23 +01:00
Fraser Savage	6daec564d0	Merge branch 'main' into savage/sequence-per-partition	2023-06-08 10:24:50 +01:00
Andrew Lamb	17c0d837b3	chore: Update DataFusion, arrow, object_store pins (#7942 ) * chore: Update DataFusion, arrow, object_store pins * chore: Update for hakari * chore: Update for new APIs * fix: update test --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-07 17:08:31 +00:00
dependabot[bot]	7b6efae62c	chore(deps): Bump tempfile from 3.5.0 to 3.6.0 Bumps [tempfile](https://github.com/Stebalien/tempfile) from 3.5.0 to 3.6.0. - [Changelog](https://github.com/Stebalien/tempfile/blob/master/CHANGELOG.md) - [Commits](https://github.com/Stebalien/tempfile/compare/v3.5.0...v3.6.0) --- updated-dependencies: - dependency-name: tempfile dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2023-06-07 08:21:40 +00:00
Fraser Savage	7de98a6f11	refactor(wal): Associate sequence numbers to table ID in `SequencedWalOp`s Writes are partitioned before being placed in the buffer tree. This has the effect of splitting up the persistence of a DmlWrite's contents and thus the persistence of data referred to by write operations placed into a single WAL entry for a write op. This change associates the currently assigned sequence number with every `TableId` in the write, so that persist events for a single write can be tracked on a per table/partition level. Making this partial change enables a transition period where changes can be rolled back and WAL files can still be processed. A future change will produce a new sequence number per table ID.	2023-06-06 17:49:09 +01:00
dependabot[bot]	ee61e954bf	chore(deps): Bump flatbuffers from 23.1.21 to 23.5.26 (#7922 ) Bumps [flatbuffers](https://github.com/google/flatbuffers) from 23.1.21 to 23.5.26. - [Release notes](https://github.com/google/flatbuffers/releases) - [Changelog](https://github.com/google/flatbuffers/blob/master/CHANGELOG.md) - [Commits](https://github.com/google/flatbuffers/compare/v23.1.21...v23.5.26) --- updated-dependencies: - dependency-name: flatbuffers dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Dom <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-05 09:41:08 +00:00
dependabot[bot]	d8b06c59c4	chore(deps): Bump once_cell from 1.17.2 to 1.18.0 Bumps [once_cell](https://github.com/matklad/once_cell) from 1.17.2 to 1.18.0. - [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md) - [Commits](https://github.com/matklad/once_cell/compare/v1.17.2...v1.18.0) --- updated-dependencies: - dependency-name: once_cell dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2023-06-05 02:03:15 +00:00
wiedld	2d2c3d5f8b	chore(idpe-17592): DeferredLoad metric counts (#7858 )	2023-06-02 10:56:39 -07:00
Marco Neumann	fa5011197c	refactor: migrate `iox_query` to use DataFusion statistics (#7908 ) This is the major part of #7470. Additional clean ups (e.g. to remove the actual types from `data_types`) will follow. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-02 09:18:59 +00:00
Andrew Lamb	a48f681e56	feat(parquet): reduce and limit buffering when writing parquet files (#7880 ) * feat: limit buffering when writing parquet files ("combined solution") * chore: Run cargo hakari tasks --------- Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-31 13:27:32 +00:00
Dom	8f6308fca3	Merge branch 'main' into dom/frame-docs	2023-05-29 09:59:27 +01:00
Andrew Lamb	1ff76b7bf2	chore: use workspace dependencies for `object_store`	2023-05-26 07:03:42 -04:00
Dom Dwyer	ed276bdc73	docs: reflink RecordBatch	2023-05-26 12:07:05 +02:00
Dom Dwyer	c4691b04e4	docs: describe what the spans capture A short description on the FlightFrameEncodeRecorder that helps people understand exactly what the spans cover - it's likely people will wind up looking at this code after debugging an issue in a trace, so let's make sure we give them as much helpful context as possible!	2023-05-26 11:46:45 +02:00
wiedld	7bcde3c544	chore(7618): trace ingester response encoding v2 (#7820 ) * test: integration test for tracing of queries to the ingester * chore: add FlightFrameEncodeRecorder to record spans per each polling result * refactor(trace): impl TraceCollector for Arc Allow any Arc-wrapped TraceCollector implementation to be used as a TraceCollector. This avoids needing to as_any() and downcast later. * test: assert FlightFrameEncodeRecorder trace spans This test exercises the FlightDataEncoder wrapped with the trace decorator (FlightFrameEncodeRecorder) when executing against a data source that yields data after varying numbers of Stream polls. This test passing will validate the FlightFrameEncodeRecorder correctly instruments the amount of time a client spends waiting on the FlightDataEncoder to acquire or encode a protocol frame, but also ensures the decorator correctly accounts for varying behaviours allowed through the Stream abstraction. It does this by simulating a data source that is not always immediately ready to provide data, such as a buffer wrapped in a contended async mutex. * refactor: move tracing decorator into separate mod * fix: record spans * refactor(test): update test The frame encoder is not one-to-one - it emits two frames for the first data payload, a schema and a payload. This commit updates the test to account for it! * refactor: remove unneeded mut ref, and use enum state method which panics when in a (should be unreachable) state * chore: add more docs to FlightFrameEncodeRecorder and related --------- Co-authored-by: Dom Dwyer <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-26 09:40:16 +00:00
Carol (Nichols \|\| Goulding)	9c0faa66f0	feat: Set a table partition template explicitly or from the namespace And use the table partition template when partitioning writes to that table.	2023-05-24 10:34:30 -04:00
Carol (Nichols \|\| Goulding)	604bab9508	fix: Make Table create_or_get be only create	2023-05-24 10:34:30 -04:00
dependabot[bot]	b7fbfa6fb2	chore(deps): Bump criterion from 0.4.0 to 0.5.0 (#7856 ) Bumps [criterion](https://github.com/bheisler/criterion.rs) from 0.4.0 to 0.5.0. - [Changelog](https://github.com/bheisler/criterion.rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/bheisler/criterion.rs/compare/0.4.0...0.5.0) --- updated-dependencies: - dependency-name: criterion dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-05-24 09:08:37 +00:00
Marco Neumann	6729b5681a	fix(ingester): re-transmit schema over flight if it changes (#7812 ) * fix(ingester): re-transmit schema over flight if it changes Fixes https://github.com/influxdata/idpe/issues/17408 . So a `[Sendable]RecordBatchStream` contains `RecordBatch`es of the SAME schema. When the ingester crafts a response for a specific partition, this is also almost always the case however when there's a persist job running (I think) it may have multiple snapshots for a partition. These snapshots may have different schemas (since the ingester only creates columns if they contain any data). Now the current implementation munches all these snapshots into a single stream, and hands them over to arrow flight which has a high-perf encode routine (i.e. it does not re-check every single schema) so it sends the schema once and then sends the data for every batch (the data only, schema data is NOT repeated). On the receiver side (= querier) we decode that data and get confused why on earth some batches have a different column count compared to the schema. For the OG ingester I carefully crafted the response to ensure that we do not run into this problem, but apparently a number of rewrites and refactors broke that. So here is the fix: - remove the stream that isn't really a stream (and cannot error) - for each partition go over the `RecordBatch`es and chunk them according to the schema (because this check is likely cheaper than re-transmitting the schema for every `RecordBatch`) - adjust a bunch of testing code to cope with this * refactor: nicify code * test: adjust test	2023-05-23 14:27:11 +00:00
Dom Dwyer	928a4d163e	build: remove unused dependencies from crates This commit fixes loads of crates (47!) that had unused dependencies, or mis-configured dependencies (test deps as normal deps). I added the "unused_crate_dependencies" to all crates to help prevent this mess from growing again! https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html This has the minor downside of false-positives when specifying dev-dependencies for test/bench binaries - these are files in /test or /benches (not normal tests). This commit includes a workaround, importing them in lib.rs (gated by a feature flag). I think the trade-off of better dependency management is worth it!	2023-05-23 14:55:43 +02:00
Marco Neumann	b2ff90de63	test: regression test for #7812 (#7851 ) Regression test that #7812 will fix.	2023-05-23 12:43:04 +00:00

771 Commits (14815435b8e42852801395d2ec8edc2b29f3a77b)