influxdb

Commit Graph

Author	SHA1	Message	Date
Carol (Nichols || Goulding)	74c9529062	fix: Rename KafkaPartition to ShardIndex	2022-08-29 14:07:18 -04:00
Carol (Nichols || Goulding)	c9567cad7d	fix: Rename some more sequencer to shard	2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding)	fe9c474620	fix: rustfmt	2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding)	fbae4282df	fix: Rename another sequencer to shard to be hopefully clearer	2022-08-29 14:06:45 -04:00
Jake Goulding	4abf21c724	refactor: Rename Sequencer (and its entourage) to Shard	2022-08-29 14:06:43 -04:00
Andrew Lamb	35f99fe940	fix: fix intermittent failures in `data::tests::persist` (#5437 ) * fix: fix intermittent failures in data::tests::persist * fix: tweak comments and message * fix: space	2022-08-19 21:16:00 +00:00
kodiakhq[bot]	2b3ca54168	Merge branch 'main' into cn/upgrade-l0-metrics	2022-08-17 16:01:42 +00:00
Andrew Lamb	7f0ae53d6f	chore: Update to (almost) released object_store 0.4.0 (#5419 ) * chore: update object_store * chore: update hakari config * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-17 13:44:48 +00:00
Carol (Nichols || Goulding)	ed44817ed1	feat: Add a histogram of ingested (new L0) Parquet file sizes Connects to #5348.	2022-08-15 10:13:54 -04:00
Marco Neumann	6b8b922fe7	fix: do not loose data when Kafka reports that offset is above watermark (#5322 ) * fix: do not loose data when Kafka reports that offset is above watermark This can happen in certain cluster rebalance settings. This is also linked to https://github.com/influxdata/rskafka/issues/147 but for the upstream issue I currently have no idea how to fix it, so let's at least harden IOx against it. Fixes #5128. * refactor: panic for `SequenceNumberAfterWatermark`	2022-08-11 07:32:04 +00:00
Andrew Lamb	3a945dbcb2	chore: return a struct with named and documented fields from `compact_persisting_batch` (#5346 ) * chore: return a struct with named and documented fields from `compact_persisting_batch` * docs: Remove extra 'the' and fix a typo Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>	2022-08-10 20:22:29 +00:00
Andrew Lamb	16ddc5efc6	chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360 ) * chore: Update datafusion and arrow * chore: Update Cargo.lock * chore: update to Decimal128 * chore: Update tonic/prost/pbjson/etc * chore: Run cargo hakari tasks * fix: doctest in generated types Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-09 17:30:44 +00:00
Andrew Lamb	7219f512c3	fix: update sort key in catalog before adding parquet file to catalog (#5333 ) * fix: update sort key before parquet file * fix: Remove left over debugging * fix: fix bug, improve logging * chore: move debug log after catalog update, improve args and docs	2022-08-09 10:27:51 +00:00
Marco Neumann	9fbc95c3ad	feat: add sequencer reset count metric and log to ingester (#5286 ) Split out from #5253. Helps with #5128. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-03 13:00:36 +00:00
dependabot[bot]	94fe5b4c10	chore(deps): Bump paste from 1.0.7 to 1.0.8 (#5280 ) Bumps [paste](https://github.com/dtolnay/paste) from 1.0.7 to 1.0.8. - [Release notes](https://github.com/dtolnay/paste/releases) - [Commits](https://github.com/dtolnay/paste/compare/1.0.7...1.0.8) --- updated-dependencies: - dependency-name: paste dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-03 09:03:25 +00:00
dependabot[bot]	fbd39844d8	chore(deps): Bump async-trait from 0.1.56 to 0.1.57 (#5247 ) Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.56 to 0.1.57. - [Release notes](https://github.com/dtolnay/async-trait/releases) - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.56...0.1.57) --- updated-dependencies: - dependency-name: async-trait dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-08-01 08:30:33 +00:00
Andrew Lamb	9215a534d0	chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229 ) * chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` * chore: Run cargo hakari tasks * fix: Update for API changes * fix: clippy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-28 08:10:47 +00:00
Marko Mikulicic	9da8062a16	fix: Fix typo in log message (#5222 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-27 15:34:37 +00:00
Marco Neumann	9a9a1a4777	feat: limit per-table chunk data for every query (#5223 ) * feat: `QueryChunk::as_any` * feat: allo `ChunkPruner::prune_chunks` to fail * feat: limit per-table chunk data for every query Closes #5211. * fix: address review comments Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-07-27 13:20:05 +00:00
Andrew Lamb	fbf672015e	refactor: Reduce ceremony requried to create a `Span` from `SpanContext` (#5181 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-22 11:19:38 +00:00
Nga Tran	69cb3f2b19	refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151 ) * refactor: remove min_sequnce_number * fix: typos * fix: remove min_sequencer_number from new files from merging main * fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier * test: add test_compactor_collision back but modify the input to make it work woth new changes Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-21 13:51:54 +00:00
Marco Neumann	0561423475	refactor: enforce proper `IOxSessionContext` (#5158 ) - remove `IOxSessionContext::default()` because untracked contexts should only be created by tests - remove `Option<IOxSessionContext>` because it is a typed workaround for `IOxSessionContext::default` Tests should use `IOxSessionContext::testing` and all _normal_ users should create proper contexts. I suspect this will help tracing or at least prevent silent regressions. See #5129. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-20 16:25:43 +00:00
dependabot[bot]	278a7f91af	chore(deps): Bump bytes from 1.1.0 to 1.2.0 (#5156 ) Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.1.0 to 1.2.0. - [Release notes](https://github.com/tokio-rs/bytes/releases) - [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md) - [Commits](https://github.com/tokio-rs/bytes/compare/v1.1.0...v1.2.0) --- updated-dependencies: - dependency-name: bytes dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-20 10:00:08 +00:00
Andrew Lamb	e2d871b00b	chore: Update datafusion and arrow/parquet/arrow-flight to `18.0.0` (#5079 ) * chore: Update datafusion to 10.0.0, arrow/parquet/arrow-flight to 18 * chore: Run cargo hakari tasks * fix: update cargo pin Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-18 15:01:03 +00:00
Andrew Lamb	5bebff0b06	Revert "feat: skip ingester buffering if INFLUXDB_IOX_INGESTER_SKIP_BUFFER is set" (#5116 ) This reverts commit ca6875f60bec935eb6079b684d6eaa0cbc8a5306. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-15 13:22:45 +00:00
kodiakhq[bot]	18ffe581b5	Merge branch 'main' into dependabot/cargo/tokio-1.20.0	2022-07-14 14:18:51 +00:00
Marco Neumann	512f9850ee	refactor: ingester seek log debug => info (#5127 ) This message will be printed once per partition on ingester startup and shouldn't be too noisy, but is very helpful to judge "replay" / "catch-up".	2022-07-14 10:28:16 +00:00
dependabot[bot]	9b67de2f43	chore(deps): Bump tokio from 1.19.2 to 1.20.0 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.19.2 to 1.20.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.19.2...tokio-1.20.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2022-07-14 01:21:43 +00:00
Carol (Nichols || Goulding)	61c023139b	refactor: Switch compaction levels to an enum with values rather than separate consts Bonuses: - Type checking - Validation - Less casting - Exhaustiveness checking - Less use of the numerical value	2022-07-13 11:30:36 -04:00
Andrew Lamb	64b6b4fd6f	feat: skip ingester buffering if INFLUXDB_IOX_INGESTER_SKIP_BUFFER is set (#5115 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-13 14:21:06 +00:00
Andrew Lamb	c46e1c6347	chore: Update datafusion + arrow/parquet/arrow-flight to `17.0.0` (#5021 ) * fix: correct nullability declaration of system tables * chore: Update datafusion and arrow/parquet/arrow-flight * chore: Run cargo hakari tasks * fix: Update tests * fix: Update tests * fix: predicate pruning * fix: add some tests * fix: query_functions * fix: fix read_buffer test * fix: fix clippy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-07 19:22:15 +00:00
Marco Neumann	aacdeaca52	refactor: prep work for #5032 (#5060 ) * refactor: remove parquet chunk ID to `ChunkMeta` * refactor: return `Arc` from `QueryChunk::summary` This is similar to how we handle other chunk data like schemas. This allows a chunk to change/refine its "believe" over its own payload while it is passed around in the query stack. Helps w/ #5032.	2022-07-07 13:21:48 +00:00
Andrew Lamb	8f5210ea3e	test: add test for "duration since production" in kafka `write_buffer` implementation (#5043 ) * test: add test for timestamps in kafka write buffer * refactor: move timestamp batching test to generic tests Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-07 10:27:27 +00:00
Marco Neumann	16bd3e67c0	refactor: unify `apply_predicate_to_metadata` (#5030 ) Instead of using some hand-rolled timestamp-based logic (or just "unknown") all over the place, just use logic introduced in #5017. This requires slightly improved table summaries within the querier that at least has min/max for the timestamp column. For that, the former `IngesterChunk`-specific `calculate_summary` method was extended to `create_basic_summary` to include that data and is now also used by `QuerierParquetChunk`. Note: `QuerierRBChunk` already has detailled metrics that are provided by the read buffer implementation. Should we ever need even better pruning for `QuerierParquetChunk` (or `IngesterChunk`) then we _only_ need add extra data to the table summaries. Closes #4976. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-05 12:51:59 +00:00
Marco Neumann	be53716e4d	refactor: use IDs for `parquet_file.column_set` (#4965 ) * feat: `ColumnRepo::list_by_table_id` * refactor: use IDs for `parquet_file.column_set` Closes #4959. * refactor: introduce `TableSchema::column_id_map`	2022-06-30 15:08:41 +00:00
Raphael Taylor-Davies	835e1c91c7	chore: update object_store to 0.3.0 (#4707 ) * chore: update object_store to 0.3.0 * chore: review feedback Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-29 21:44:03 +00:00
Markus Westerlind	edf3f08e81	refactor: Replace all uses of lazy_static with once_cell Went through and remove all lazy_static uses with once_cell (while waiting for the project to compile). There are still dependencies using lazy_static so it is still in the crate graph but at least there isn't an explicit dependency on it (and it is easier to update to `std::lazy::Lazy` once that is stable).	2022-06-29 16:22:02 +02:00
Nga Tran	cfcc4b8426	refactor: change level 1 to level 2 preparing for next design changes (#4954 ) * refactor: change level 1 to level 2 preparing for next design changes * fix: make level-2 consistent everywhere * chore: remove unused comments * refactor: change all the name level_1 to level_2 to completely replace 1 with 2 to amke everything consistent * chore: add correspinding constants for the comapction levels in the comments Co-authored-by: Dom <dom@itsallbroken.com>	2022-06-29 14:08:58 +00:00
Andrew Lamb	bfddb032ce	docs: improve docs for `persist_partition_size_threshold_bytes` / `INFLUXDB_IOX_PERSIST_PARTITION_SIZE_THRESHOLD_BYTES` (#4877 ) * docs: improve docs for `persist_partition_size_threshold_bytes` / `INFLUXDB_IOX_PERSIST_PARTITION_SIZE_THRESHOLD_BYTES` * docs: improve comments about LifecycleConfig::partition_size_threshold Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-27 21:52:40 +00:00
Marco Neumann	215f297162	refactor: parquet file metadata from catalog (#4949 ) * refactor: remove `ParquetFileWithMetadata` * refactor: remove `ParquetFileRepo::parquet_metadata` * refactor: parquet file metadata from catalog Closes #4124.	2022-06-27 15:38:39 +00:00
Nga Tran	3c0fb6e8ef	fix: avoid using min_time, which can be negative, for ChunkId. Using object store id which is uuid instead (#4942 ) * fix: avoid using min_time, which can be negative, for ChunkId. Using object store id which is uuid instead * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * chore: run fmt Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-23 19:00:13 +00:00
Andrew Lamb	49b34e1135	test: add appropriate tests	2022-06-23 11:50:55 -04:00
Andrew Lamb	fb4c3ed294	fix: revert test change	2022-06-23 11:34:59 -04:00
Dom Dwyer	9a79d16585	fix: account for partition memory until persisted The ingester maintains a rough "total memory in use" counter it uses to try and limit the amount of memory the ingester is using overall. When a partition is persisted, this total memory usage value is adjusted to account for releasing the partition memory. Prior to this commit, the ordering was: * Writes increase the memory counter * maybe_persist() is called to trigger persistence * A partition is identified for persistence * Partition memory usage is released back to the total memory counter * Persistence starts This meant that the partitions in the process of being persisted were not accounted for in the ingester's total memory counter, and therefore we could significantly overrun the configured memory limit. After this commit, the ordering is: * Writes increase the memory counter * maybe_persist() is called to trigger persistence * A partition is identified for persistence * Persistence starts * Persistence completes * Partition memory usage is released back to the total memory counter This ensures persisting partitions are sill tracked in the total memory counter, causing pauses to correctly fire.	2022-06-23 15:40:51 +01:00
Dom Dwyer	87af3848d1	refactor: remove unused errors These errors are not referenced, but are hidden from the "unused" lint because of the macro magic code generation.	2022-06-23 11:24:30 +01:00
Andrew Lamb	16c558e11e	refactor: Make some structures in `LifecycleManager` non pub (#4929 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-23 09:55:39 +00:00
Dom Dwyer	75a3fd5e1e	refactor: use propagated partition key in ingester Changes the ingester to use the partition key derived in the router, and transmitted over through the kafka API boundary. This should have no observable behavioural change, but be more resilient as we're no longer assuming the partitioning algorithm produces the same value in both the router (where data is partitioned) and the ingester (where data is persisted, segregated by partition key). This is a pre-requisite to allowing the user to specify partitioning schemes.	2022-06-21 15:57:30 +01:00
Marco Neumann	c3912e34e9	refactor: store per-file column set in catalog (#4908 ) * refactor: store per-file column set in catalog Together with the table-wide schema and the partition-wide sort key, this should be everything we need to read a parquet file directly into memory without peeking any file-level metadata. The querier will use this to directly load parquet files into the read buffer. WARNING: This requires a catalog wipe! Ref #4124. * refactor: use proper `ColumnSet` type	2022-06-21 10:26:12 +00:00
Andrew Lamb	f151b1e89f	fix: categorize `NamespaceNotFound` as ingester not found errors as well (#4899 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-20 08:40:31 +00:00
Marco Neumann	0fbff981ec	chore(deps): Bump sqlx to 0.6.0 and uuid to 1 (#4894 ) Closes #4889. Closes #4890. Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-17 10:28:28 +00:00
Marco Neumann	743c1692ea	refactor: stream query results from ingester to querier (#4875 ) * refactor: stream partitions from ingester Ref #4849. * refactor: do not collect record batched on the ingester side Ref #4849. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-16 12:58:50 +00:00
Andrew Lamb	d67336fd69	fix(ingester): ensure all ingester metrics are prefixed with `ingester_` (#4871 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-16 12:52:35 +00:00
Andrew Lamb	74f4006580	fix(ingester): make ingester metrics start with `ingester` (#4870 ) * fix(ingester): make ingester metrics start with `ingester` * fix: Update ingester/src/stream_handler/handler.rs Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-16 12:46:37 +00:00
Andrew Lamb	8c56909218	fix(ingester): Distinguish between "not found" and other flight errors (#4874 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-16 12:39:37 +00:00
Marco Neumann	4b945493be	test: test gRPC and stream flattening (#4873 ) Ref #4849. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-16 11:44:59 +00:00
Marco Neumann	66c7d95312	refactor: use new ingester<>querier wire protocol (#4867 ) * refactor: use new ingester<>querier wire protocol Use and document the new and more flexible ingester<>querier wire protocol. Note that the ingester does NOT stream the response data yet, but the internal data structures would allow that. A follow-up change will adjust the ingester code to stream the data. Ref #4849. * fix: typos Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: clarify naming and public interface * test: add schema assertion to `ingester_response_to_record_batches` Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-06-16 08:02:28 +00:00
Andrew Lamb	6b771375bf	feat: log when partitions are written due to going over size (#4868 )	2022-06-15 20:12:43 +00:00
Dom Dwyer	4df2964566	refactor: store PartitionKey in DmlWrite Carry the PartitionKey in the DmlWrite, allowing the batch to be associated with a specific partition key.	2022-06-15 15:48:54 +01:00
Marco Neumann	7c60edd38c	refactor: prepare new ingester<>querier protocol on the querier side (#4863 ) * refactor: prepare new ingester<>querier protocol on the querier side This changes the querier internals to work with the new protocol. The wire protocol stays the same (for now). There's a (somewhat hackish) adapter in place on the querier side that converts the old to the new protocol on-the-fly. This is an intermediate step before we actually change the wire protocol (and in a step after that also take advantage of the new possibilites on the ingester side). Ref #4849. * docs: explain adapter	2022-06-15 14:32:24 +00:00
Andrew Lamb	005610b172	refactor: remove some `&` use in iox_catalog (#4862 ) * refactor: remove some `&` use in iox_catalog * fix: Update data_types/src/lib.rs	2022-06-15 11:31:49 +00:00
Nga Tran	b682dbbc2e	chore: Add debug info of sort_key for ingester (#4859 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-14 20:39:17 +00:00
Andrew Lamb	c8f70b8933	feat: log query from querier to ingester at `info` level (#4856 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-14 18:35:50 +00:00
Andrew Lamb	eca3b6b9a1	fix: reduce memory usage in ingester with less buffering prior to query engine (#4830 ) * refactor: remove another buffer copy in ingester * docs: Update arrow_util/src/util.rs Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-14 18:22:55 +00:00
Andrew Lamb	7d2a5c299f	refactor: remove one buffer copy in the ingester (#4855 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-14 17:15:36 +00:00
Andrew Lamb	e91d00b10c	chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `16.0.0 (#4851 ) * chore: TEMP Update DataFusion to pre-release * chore: update arrow et al to 16.0.0 * chore: Run cargo hakari tasks * fix: update reader read_dictionary API * chore: Update to real Datafusion release * fix: Update parquet API * fix: update test Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-06-14 16:31:40 +00:00
Andrew Lamb	34e8659876	refactor: consolidate plan creation from `QueryChunk`s in `iox_query` (#4837 ) * refactor: consolidate plan creation from Chunks * docs: update docstrings Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-14 14:36:07 +00:00
Dom Dwyer	b41ea1d718	refactor: PartitionKey type This commit changes the code base to use a new reference-counted PartitionKey type wrapper, instead of passing a bare String around. This allows the compiler to type check & verify usage of the partition key, instead of passing a bare string around. By reference counting the underlying string, we reduce memory usage for some use cases.	2022-06-14 14:47:56 +01:00
Andrew Lamb	9fdbfb05e7	refactor: Use scan_and_filter in ReorgPlanner (#4822 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-10 17:31:25 +00:00
kodiakhq[bot]	dd8d44e24f	Merge branch 'main' into cn/duration	2022-06-10 14:23:09 +00:00
Nga Tran	13c57d524a	feat: Change data type of catalog partition's sort_key from a string to an array of string (#4801 ) * feat: Change data type of catalog Postgres partition's sort_key from a string to an array of string * test: add column with comma * fix: use new protonuf field to avoid incompactible * fix: ensure sort_key is an empty array rather than NULL * refactor: address review comments * refactor: address more comments * chore: clearer comments * chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql * chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql * fix: Rename migration so it will be applied after Co-authored-by: Marko Mikulicic <mkm@influxdata.com>	2022-06-10 13:31:31 +00:00
Andrew Lamb	dc992209be	test: account for active writes when reporting readable status (#4782 ) * test: account for active writes when reporting readable status * fix: logical merge conflict	2022-06-10 12:59:09 +00:00
Andrew Lamb	11cec18edc	refactor: Move `scan_and_filter` into a `common` module for reuse (#4823 ) * refactor: remove unused error variants * refactor: move scan_and_filter into a module so it can be reused * docs: update comments about pruning	2022-06-10 11:15:47 +00:00
Andrew Lamb	50697906b1	refactor: Make `DMLWrite::sequence_number` a `SequenceNumber` (#4817 )	2022-06-09 19:36:37 +00:00
Carol (Nichols || Goulding)	1c7cbaf5ae	refactor: Use DurationHistogram in more places	2022-06-09 14:20:51 -04:00
Andrew Lamb	2ec7764fdd	refactor: rename builder like predicate methods to be `with_` (#4808 ) * refactor: rename builder like predicate methods to be `with_` * fix: merge conflict Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-09 11:26:03 +00:00
Andrew Lamb	d8331e8679	fix: do not return 'readable' until a write is completely readable (#4778 ) * fix: do not return readable until a write is completely readable * docs: Add diagram with partially buffered write * refactor: account for actively buffering during update rather than fixup * fix: fixup * fix: use checked_sub Co-authored-by: Marco Neumann <marco@crepererum.net> * fix: checked_sub calculation Co-authored-by: Marco Neumann <marco@crepererum.net>	2022-06-09 11:15:15 +00:00
Andrew Lamb	f34282be2c	fix: Do not run DataFusion optimizer pass twice (#4809 ) * fix: Do not run DataFusion optimizer pass twice * docs: improve docstring and logging	2022-06-08 21:01:22 +00:00
Andrew Lamb	afc1c12062	refactor: consolidate `PredicateBuilder` into `Predicate` (#4799 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-08 12:21:24 +00:00
Dom Dwyer	1fc5596023	perf: streaming compaction in ingester Reduces memory usage in the ingester during persist operations by streaming the results of the snapshot merge/sort/dedupe directly to the parquet file. Prior to this commit the output of the compact was buffered in memory before being wrote to the parquet file.	2022-06-07 12:01:26 +01:00
dependabot[bot]	04c685b3b7	chore(deps): Bump tokio-util from 0.7.2 to 0.7.3 (#4784 ) Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.2 to 0.7.3. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.2...tokio-util-0.7.3) --- updated-dependencies: - dependency-name: tokio-util dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-06 14:46:27 +00:00
dependabot[bot]	a1ea793e13	chore(deps): Bump tokio-stream from 0.1.8 to 0.1.9 (#4785 ) Bumps [tokio-stream](https://github.com/tokio-rs/tokio) from 0.1.8 to 0.1.9. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-stream-0.1.8...tokio-stream-0.1.9) --- updated-dependencies: - dependency-name: tokio-stream dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-06 14:21:54 +00:00
dependabot[bot]	e03bf94420	chore(deps): Bump tokio from 1.18.2 to 1.19.1 (#4783 ) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.18.2 to 1.19.1. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.18.2...tokio-1.19.1) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-06 14:15:12 +00:00
Andrew Lamb	3592aa52d8	chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0` (#4743 ) * chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0` * chore: Update APIs * chore: Run cargo hakari tasks * feat: normalize parquet file metadata * chore: update size tests * chore: add docs on metadata stripping * chore: TEMP UPDATE TO DF BRANCH * chore: Update for new API * fix: Update to latest DF * fix: cargo hakari Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>	2022-06-03 10:32:26 +00:00
dependabot[bot]	9a21292db8	chore(deps): Bump async-trait from 0.1.53 to 0.1.56 (#4774 ) Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.53 to 0.1.56. - [Release notes](https://github.com/dtolnay/async-trait/releases) - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.53...0.1.56) --- updated-dependencies: - dependency-name: async-trait dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-03 09:10:40 +00:00
Ryan Russell	d279deddad	docs(various): Improve Readability (#4768 ) Signed-off-by: Ryan Russell <git@ryanrussell.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-02 18:01:06 +00:00
Marco Neumann	c91dbe062e	test: "optimize" ingesterrecord batches in query tests (#4700 ) * test: "optimize" ingesterrecord batches in query tests It seems that I had the right idea in #4656 but wasn't able to trigger https://github.com/influxdata/conductor/issues/955 because the query tests do not "optimize" the record batches in the same way the actual gRPC implementation does. If we apply the same transformation we indeed end up with the same error. * fix: all batches within the ingester flight response must have same schema * refactor: simplify and reuse code Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-01 07:37:11 +00:00
Paul Dix	6af32b7750	feat: add concurrency limit for ingester queries (#4703 ) I've defaulted it to 20, we can adjust as needed. Closes #4657 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-30 10:22:17 +00:00
Andrew Lamb	dde3c3922c	refactor: use consistent spelling of serialize (#4717 )	2022-05-27 14:42:59 +00:00
Nga Tran	ea81152fac	refactor: add partition ID into debug info and panic earlier to identify the bug easier (#4716 ) * chore: point tests to the new ticket * chore: cleanup * refactor: add partition ID into debug info and panic earlier to identify the bug easier	2022-05-27 12:20:36 +00:00
Marco Neumann	31d1b37d73	refactor: de-duplicate low-level arrow code (#4697 ) It seems that during prototyping NG we've copied low level code (w/o tests!) and never cleaned up. Let's not have this functionality twice.	2022-05-25 16:24:28 +00:00
Carol (Nichols || Goulding)	6ce6a38094	fix: Make metric names potentially less confusing	2022-05-25 10:04:39 -04:00
Dom	9cd1286051	Merge branch 'main' into dom/meta-remove-row-count	2022-05-23 16:39:38 +01:00
Marco Neumann	2029bd16ba	feat: enable debugging of failed querier->ingester requests (#4659 ) * feat: enable debugging of failed querier->ingester requests - extend `query-ingester` CLI to allow usage of predicates - on failed requests: log all information that required for the CLI - test the "ingester fails" scenario * test: explain Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * docs: improve Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: move b64 pred. serde into a single crate Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-05-23 15:37:31 +00:00
Dom Dwyer	2e6c49be83	refactor: remove IoxMetadata min & max timestamp Removes the min/max timestamp fields from the IoxMetadata proto structure embedded within a Parquet file's metadata. These values are redundant as they already exist within the Parquet column statistics, and precluded streaming serialisation as these removed min/max values were needed before serialising the file.	2022-05-23 16:27:08 +01:00
Dom Dwyer	a142a9eb57	refactor: remove row_count from IoxMetadata Remove the redundant row_count from the IoxMetadata structure that is serialised into the Parquet file. The reasoning is twofold: * The Parquet file's native metadata already contains a row count * Needing to know the number of rows up-front precludes streaming	2022-05-23 16:18:35 +01:00
Dom	f0d0f1ba0c	Merge branch 'main' into dom/codec-object-store	2022-05-23 15:39:54 +01:00
kodiakhq[bot]	a06746c715	Merge branch 'main' into cn/last-available	2022-05-23 13:08:19 +00:00
Carol (Nichols || Goulding)	05bd9de4d3	test: Add a test for the sequence number skipping metric Ok, so... this needed lots of... channels. Channels everywhere. The stream method on TestWriteBufferStreamHandler previously assumed it would only be called once. In a test where reset_to_earliest is called, stream might be called again to get the reset stream. We want to be able to control which of the streams gets which operations, so that's why the macro now takes a vec of vec of operations-- one vec of operations per expected call to stream, and the stream will send all the operations in its vec. The test thread needs to wait for the handler stream to consume the last item from the last receiver stream, so when the TestWriteBufferStreamHandler has set up the last expected call to stream, pass back the last transmitter and have it wait until it's at full expected capacity (which means all operations have been consumed by the receiver).	2022-05-20 20:50:02 -04:00
Carol (Nichols || Goulding)	bda231051a	feat: Record metrics when resetting the write buffer and skipping sequence numbers	2022-05-20 20:48:17 -04:00
Carol (Nichols || Goulding)	bcbf7b4f46	refactor: Move error handling logic to be all together	2022-05-20 20:48:17 -04:00


376 Commits (354d0f1e0e44388d31901450e78b477fb3c1a60f)