influxdb

Commit Graph

Author	SHA1	Message	Date
Andrew Lamb	02893e598c	chore: Update datafusion and upgrade arrow/parquet/arrow-flight to 13 (#4516) * chore: Tool for automating arrow version update * chore: Update datafusion and arrow/parquet/arrow-flight * fix: update for changes in Arrow API Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-05 00:21:02 +00:00
dependabot[bot]	420c306caa	chore(deps): Bump tokio from 1.17.0 to 1.18.0 (#4453) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.17.0 to 1.18.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.17.0...tokio-1.18.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-04-28 08:21:17 +00:00
Marco Neumann	59f6556483	fix: do not create empty batches in ingester (#4443) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-27 17:52:22 +00:00
Nga Tran	fa2c1febf4	feat: use stored partition sort key to deduplicate data (#4360) * feat: use stored sort key to deduplicate data * refactor: verify if one is a super sort key of the other * test: unit tests for scan and deduplication plans * fix: typo * refactor: refactor and add comments * feat: cache partition sort key to read during planning as needed * test: tests for query plans with different overlap groups * chore: cleanup * chore: resolve merge conflicts Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-26 20:36:32 +00:00
Marco Neumann	11f87cffdd	fix: memorize max persisted tombstone (#4430)	2022-04-26 16:13:09 +00:00
Marco Neumann	bd600bbac6	refactor: allow ingester to be integrated into query tests (#4427) * refactor: improve `IngesterData` public interface * feat: impl `Debug` for `Test{Namespace,Sequencer}` * refactor: trait interface for `LifecyleHandle` This is required to mock the lifecycle for query tests. * refactor: trait for partitioner	2022-04-26 13:44:30 +00:00
二手掉包工程师	4b47d723b1	refactor: Rename time to iox_time (#4416) Signed-off-by: hi-rustin <rustin.liu@gmail.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-26 00:19:59 +00:00
Marco Neumann	86e8f05ed1	fix: make all catalog IDs 64bit (#4418) Closes #4365. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-25 16:49:34 +00:00
Nga Tran	d963110842	feat: group chunk overlaps based on time range only (#4389) * feat: overlap for NG querier * chore: cleanup * refactor: address review comments * fix: typo Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-25 13:32:07 +00:00
Marco Neumann	f444e63960	test: include materialized delete predicates in NG query tests (#4371) * refactor: move `batch_filter` to `datafusion_util` * fix: outdated docstring * feat: allow passing record batches to `iox_tests` parquet files * test: include materialized delete predicates in NG query tests * docs: improve wording Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-21 13:00:13 +00:00
Andrew Lamb	73bed810da	chore: Update arrow, arrow-flight, parquet, tonic, prost, etc (#4357) * chore: Update datafusion * chore: Update arrow/arrow-flight/parquet to 12 * chore: update datafusion correctly * chore: Update prost, tonic, and dependents * fix: Fixup some api changes * fix: Update test output in db * fix: Update test output in parquet_file * fix: remove old pbjson types * fix: Add "--experimental_allow_proto3_optional" flag * chore: Run cargo hakari tasks * fix: compile error * chore: Update heappy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-20 11:12:17 +00:00
Andrew Lamb	5ea676d3f7	feat: add per kafka partition durability reporting to write info response (#4341) * feat: add per kafka partition durability reporting to write info response * fix: buf lint + test cleanup * fix: clean up protobuf * refactor: pull out conversion of KafkaPartitionStatus into a function * fix: fmt * fix: typo Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-19 16:46:20 +00:00
Marco Neumann	5b48675435	fix: actually transmit record-batch metadata from querier (#4347) Attaching the "batch => partition" mapping via per-batch schema KV metadata does NOT work because flight will transmit the schema once for all batches (even though on the Rust side we have a schema ref attached to every batch, probably for convenience). Instead we now use the same global protobuf metadata that we also use for the "partition => max sequence number" information. This somewhat limits our ability to create record batches lazily on the ingester side (since the global metadata is sent before any actual payload) but I think we should not modify the usage of the flight protocol too much right now (e.g. by sending more schema messages). If this becomes an issue, we can always find a more complex solution in the future.	2022-04-19 10:54:23 +00:00
Nga Tran	2a601c3099	fix: Revert "chore: Revert "fx: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299)" (#4303)" (#4327)" (#4328) * fix: Revert "chore: Revert "fx: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299)" (#4303)" (#4327)" This reverts commit `7e5d719027`. * chore: resolve merge conflict Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-18 15:27:39 +00:00
Nga Tran	7e5d719027	chore: Revert "fix: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299)" (#4303)" (#4327) This reverts commit `fe8d9948d5`.	2022-04-14 17:11:55 +00:00
Carol (Nichols || Goulding)	fe8d9948d5	fix: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299)" (#4303) This reverts commit `7ddbf7c025`. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-14 15:42:28 +00:00
Marco Neumann	351b0d0c15	fix: unknown namespace/table in querier<>ingester flight protocol (#4307) * fix: return "not found" gRPC error instead of "internal" when ingester does not know table * fix: properly handle "namespace not found" in ingester queries * fix: make `initialize_db` work with async code * test: add custom step for NG tests * fix: handle "unknown table/namespace" resp. in querier * docs: explain test setup Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-04-14 12:36:15 +00:00
Marco Neumann	8bf2fbb7d3	fix: ingester min-unpersisted-sequence-number calc + doc (#4302) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-14 07:05:06 +00:00
Carol (Nichols || Goulding)	7ddbf7c025	fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-13 14:11:10 +00:00
kodiakhq[bot]	21f748062e	Merge branch 'main' into cn/sort-in-compactor	2022-04-13 12:43:31 +00:00
Marco Neumann	83f77712b1	refactor: querier<>ingester flight protocol adjustments (#4286) * refactor: querier<>ingester flight protocol adjustments This makes a few adjustments to the querier<>ingester flight protocol. Query Scope =========== The querier will request data for ALL sequencer IDs for now. There is no reason to have a request per sequencer ID. We can add a range/set filter later if we want, but this is not required for now. Partition-level =============== The only time when the querier cares about sequencer IDs (i.e. sharding) at all is when it selects which ingesters to ask for unpersisted data (this is currently not implemented, it just asks all ingesters). Afterwards the querier only cares about partitions (which are bound to specific sequencers anyways) because this is the level where parquet file persistence and compaction as well as deduplication happen. So we make partitions a first-class citizen in the ingester response. Metadata VS RecordBatches ========================= The global app-metadata will list all partitions and their max persisted parquet files and tombstones (theoretically tombstones are at table-level, but the ingester could in the future break them down to the partition-level). Then it receives a stream of record batches. Each record batch is tagged (via key-value metadata in its schema) so it can be assigned to a partition. At the moment the ingester returns 0 or 1 batches per unpersisted partition (0 in case we've filtered out all the data via the predicate), but in the future it is free to return multiple batches. This setup gives the ingester more freedom over memory management and (potentially parallel) query processing, while at the same time keeps the set of duplicated information minimal and allows easy extensions (since the global metadata is a full-blown protobuf message). Querier ======= At the moment the querier ignores all the metadata. Follow-up PRs will change that. * docs: improve Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: make code clearer Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-04-12 16:48:40 +00:00
Carol (Nichols || Goulding)	87e4a1a51d	refactor: Move ingester sort key to schema sort key This logic isn't actually ingester specific	2022-04-11 14:09:45 -04:00
Carol (Nichols || Goulding)	a053077a05	refactor: Make compute_sort_key more general than the ingester Enable computing sort keys for a schema and an iterator of record batches.	2022-04-11 14:09:45 -04:00
Dom Dwyer	5c3cbb14b4	test: join ingester background tasks	2022-04-08 14:24:56 +01:00
Dom Dwyer	dce939c580	refactor: use SequencedStreamHandler Removes the old stream_in_sequenced_entries() write buffer handler, replacing it with the SequencedStreamHandler introduced in #4203. This change will affect the metrics emitted by an ingester as outlined in #4243.	2022-04-08 11:28:39 +01:00
Dom Dwyer	71a278ac7e	refactor: accept !Sync write buffer streams Removes the Sync bound SequencedStreamHandler input stream type, as the BoxStream returned by the WriteBufferStreamHandler is not Sync. This change means the SequencedStreamHandler is not Sync either, but is still Send and therefore can be moved into tokio tasks.	2022-04-08 11:28:39 +01:00
Dom Dwyer	c2236fa3fb	feat: impl DmlSink for IngesterData This commit adds an adaptor (IngestSinkAdaptor) that provides a DmlSink implementation for the existing write path (IngesterData). With this, the existing write path becomes compatible with the new op stream handler (SequencedStreamHandler).	2022-04-08 11:28:39 +01:00
Andrew Lamb	a30a85e62c	feat: Add get_write_info service (#4227) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-07 19:24:58 +00:00
kodiakhq[bot]	8bd0bfb669	Merge branch 'main' into dom/ingester-op-instrumentation	2022-04-07 16:33:25 +00:00
kodiakhq[bot]	f5996c5ab4	Merge branch 'main' into cn/sort-key-across-persists	2022-04-07 14:40:55 +00:00
Dom	998a66fd98	docs: Update ingester/src/stream_handler/sink_instrumentation.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-04-07 12:18:14 +01:00
Carol (Nichols || Goulding)	30c3ef5aa6	fix: Only save relevant columns in parquet file's sort key	2022-04-06 14:09:08 -04:00
Dom Dwyer	24eeddce8a	chore: fix lint warnings	2022-04-06 16:45:31 +01:00
Dom Dwyer	091640bb23	feat: emit tracing span for op apply This commit uses the tracing metadata within the DmlOperation to emit a tracing span from the ingester covering the DmlSink::apply() operation.	2022-04-06 16:32:00 +01:00
Dom Dwyer	f6c65f52a3	refactor: impl WatermarkFetcher Implement WatermarkFetcher for PeriodicWatermarkFetcher and remove unnecessary async.	2022-04-06 16:32:00 +01:00
Dom Dwyer	436da19d9a	feat: DmlSink instrumentation This commit adds the SinkInstrumentation type that decorates an inner DmlSink with call latency and write buffer metrics. The write buffer / sink call metrics may be split apart into two separate responsibilities in the future if there are multiple DmlSink that need instrumentation, but deferring adding more types until it is needed.	2022-04-06 16:32:00 +01:00
Andrew Lamb	c244b03281	feat: Add `SequencerProgress` reporting to ingester (#4238) * feat: Add `SequencerProgress` reporting to ingester * refactor: Use KafkaPartition in write_summary * fix: Update docstrings * refactor: Change ingester to use KafkaPartition everywhere * refactor: add SequencerProgress::combine * refactor: return new SequencerProgress rather than updating * fix: distinguish between yes/no/unknown in WriteSummary * docs: Update data_types2/src/lib.rs Co-authored-by: Paul Dix <paul@pauldix.net> Co-authored-by: Paul Dix <paul@pauldix.net> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-06 15:13:21 +00:00
Carol (Nichols || Goulding)	bf3cb45723	refactor: Pass PartitionInfo as argument	2022-04-06 09:31:42 -04:00
Carol (Nichols || Goulding)	f0d5987317	feat: Update partition sort_key in catalog after persist Connects to #4196.	2022-04-06 09:31:42 -04:00
Carol (Nichols || Goulding)	b16fcc284d	feat: Add new columns to the sort key during compaction Connects to #4196.	2022-04-06 09:31:42 -04:00
Carol (Nichols || Goulding)	98d052dba7	feat: Use catalog sort key if specified Pass the sort key from the catalog through to compact_persisting_batch. If the sort key is Some, use that. If the sort key is None, compute it from the data's cardinality with compute_sort_key. Connects to #4196.	2022-04-06 09:31:42 -04:00
Dom Dwyer	891d2e1368	feat: periodic kafka max watermark offset fetcher Adds the PeriodicWatermarkFetcher type responsible for querying write buffer / Kafka for the maximum sequence number / offset, surfacing any errors via both logs & metrics. This high watermark / max offset value is used within the ingest instrumentation metrics. This use case is tolerant of caching / stale values, and as such the value is periodically updated to minimise load on the write buffer.	2022-04-05 12:02:07 +01:00
Dom Dwyer	aaa677dec8	docs: describe graceful shutdown behaviour	2022-04-05 11:31:55 +01:00
Dom Dwyer	8edefc415d	refactor: rename ttbr -> write_time in tests	2022-04-05 11:31:55 +01:00
Dom Dwyer	a387ec361d	refactor: use self.deref() instead of **self	2022-04-05 11:31:55 +01:00
Dom Dwyer	f15275cf96	feat: expose ingest sequencer errors Instruments the SequencedStreamHandler with a series of new metrics that record the various error classes observable in the stream handler. These metrics are labelled with potential_data_loss=true where relevant to surface potential data loss events for alerting & further review.	2022-04-05 11:31:55 +01:00
Dom Dwyer	083ff1f8e3	refactor: ingest stream handler Refactors the stream_in_sequenced_entries() into a new impl in the SequencedStreamHandler type, decoupling the reading / decoding of ops from Kafka (and associated error handling) from the "what happens to those ops" concern to ease testing, encapsulate the specifics of "how to get an op" and improve flexibility. This is intended to provide robust error handling within what is reasonably possible (unexpected errors are always unexpected!) while retaining the existing metrics and functionality. I've also separated out code that exists in the current impl specifically to drive tests from the prod code path, instead driving those behaviours through mocks. As of this commit, the handler is not used - this commit simply adds the new impl.	2022-04-05 11:31:54 +01:00
Paul Dix	81d41f81a1	fix: ingester replay logic (#4212) Fix the ingester to track the max persisted sequence number per partition. Ensure replay takes in data from unpersisted partitions. Simplify the table persist info to not return a max persisted sequence number for the table as that information isn't needed.	2022-04-04 18:04:34 +00:00
Carol (Nichols || Goulding)	d41adf074f	test: Add assertions for sort keys	2022-04-01 13:13:04 -04:00
Carol (Nichols || Goulding)	f4b5fa1b5e	feat: Implement distinct counts in terms of distinct values For one record batch. Connects to #4194.	2022-03-31 16:46:27 -04:00


192 Commits (37c7ce793cc12a7f5dc29b3bc3d575c6c500f2bf)