* feat: initial implementation of memory estimation for a compaction
* feat: estimate file sizes and take the right actions for the needed budget
* feat: run candidates in parallel
* fix: use the right name for the column field of the output struct
* feat: add metrics for estimated budgets
* chore: cleanup
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: fix syntax after applying review suggestions
* refactor: convert a Vec to a VecDeque to work well with pop and push (see the sketch after this list)
* chore: remove max_concurrent_size_bytes and input_size_threshold_bytes
* chore: remove input_file_count_threshold
* test: add tests for estimate_arrow_bytes_for_file
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
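A minimal sketch of the budgeting flow described by the commits above. Only `estimate_arrow_bytes_for_file` and the `VecDeque` usage are named in the commits; `CompactionCandidate`, the bytes-per-value factor, and the budget handling are illustrative assumptions, not the actual compactor code.

```rust
use std::collections::VecDeque;

#[derive(Debug)]
struct CompactionCandidate {
    /// Row count from catalog statistics.
    row_count: u64,
    /// Column count from the table schema.
    column_count: u64,
}

/// Rough estimate of the Arrow memory needed to load one file:
/// rows x columns x an assumed average width per value.
fn estimate_arrow_bytes_for_file(c: &CompactionCandidate) -> u64 {
    const ASSUMED_BYTES_PER_VALUE: u64 = 8; // purely illustrative
    c.row_count * c.column_count * ASSUMED_BYTES_PER_VALUE
}

/// Pop candidates off the front of the queue while they fit in the budget;
/// a candidate that does not fit is pushed back for a later cycle.
fn select_within_budget(
    mut queue: VecDeque<CompactionCandidate>,
    budget_bytes: u64,
) -> (Vec<CompactionCandidate>, VecDeque<CompactionCandidate>) {
    let mut selected = Vec::new();
    let mut used = 0;

    while let Some(candidate) = queue.pop_front() {
        let estimate = estimate_arrow_bytes_for_file(&candidate);
        if used + estimate <= budget_bytes {
            used += estimate;
            selected.push(candidate);
        } else {
            // Does not fit right now; defer it instead of dropping it.
            queue.push_back(candidate);
            break;
        }
    }

    (selected, queue)
}

fn main() {
    let queue = VecDeque::from(vec![
        CompactionCandidate { row_count: 100_000, column_count: 10 },
        CompactionCandidate { row_count: 2_000_000, column_count: 50 },
    ]);
    let (selected, deferred) = select_within_budget(queue, 16 * 1024 * 1024);
    println!("selected {} file(s), deferred {}", selected.len(), deferred.len());
}
```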
Changes the compactor code to tolerate a SplitExec yielding an empty
partition (with no rows).
This raises a WARN, as the situation in which this is acceptable is very
rare and is more likely indicative of an opportunity to improve the
SplitExec usage (i.e., pruning out unnecessary split points).
Previously, attempting to serialise a stream of one or more RecordBatch
instances containing no rows (resulting in an empty file) caused the
parquet serialisation code to panic.
This changes that code path to raise an error instead, in order to support
the compactor making multiple splits at once, where the split points may
all fall within a single chunk:
      ──────────────── Time ───────────────▶

            │                        │
      ┌█████──────────────────────────█████┐
      │█████│        Chunk 1         │█████│
      └█████──────────────────────────█████┘
            │                        │
            │                        │
         Split T1                Split T2
In the example above, the chunk has an unusual distribution of write
timestamps over the time range it covers, with all data having a
timestamp before T1 or after T2. When running a SplitExec to slice this
chunk at T1 and T2, the middle of the resulting three subsets will
contain no rows. Because we store only the min/max timestamps in the
chunk statistics, it is unfortunately impossible to prune one of these
split points from the plan ahead of time.
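A minimal sketch (not the actual IOx code) of the two behaviours described above: an empty split partition is tolerated and reported, while a stream whose batches contain no rows at all becomes an error rather than a panic. `RecordBatchLike` and `SerialiseError` are stand-ins for Arrow's `RecordBatch` and the real error type.

```rust
/// Stand-in for Arrow's `RecordBatch`.
struct RecordBatchLike {
    num_rows: usize,
}

#[derive(Debug)]
enum SerialiseError {
    NoRows,
}

/// Serialise one split partition. Returns `Ok(None)` for an empty partition
/// (reported as a warning by the caller) and an error if every batch is empty.
fn serialise_partition(batches: &[RecordBatchLike]) -> Result<Option<Vec<u8>>, SerialiseError> {
    if batches.is_empty() {
        // Rare but acceptable: a SplitExec output with no batches at all.
        return Ok(None);
    }

    let total_rows: usize = batches.iter().map(|b| b.num_rows).sum();
    if total_rows == 0 {
        // Previously this situation panicked deep in the parquet writer;
        // now it surfaces as an error the caller can handle.
        return Err(SerialiseError::NoRows);
    }

    // ... real code would stream the batches into a parquet writer here ...
    Ok(Some(Vec::new()))
}

fn main() {
    match serialise_partition(&[RecordBatchLike { num_rows: 0 }]) {
        Ok(Some(_)) => println!("wrote file"),
        Ok(None) => eprintln!("WARN: empty partition, skipping"),
        Err(e) => eprintln!("serialisation error: {e:?}"),
    }
}
```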
1. Cache the converted schema instead of the catalog schema. This saves a
   bunch of memcopies during conversion.
2. Simplify the creation of new chunks: we now only need a `CachedTable`
   instead of a namespace and a table schema.
In an artificial benchmark, this removed around 10ms from the query
(although that was prior to #5467, which moved schema conversion one
level up). Still, I think this is the cleaner cache design.
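A minimal sketch of the caching idea in point 1 and the simplified chunk creation in point 2. Only the `CachedTable` name comes from the change itself; `CatalogSchema`, `ConvertedSchema`, and the conversion are illustrative stand-ins for the real catalog and Arrow schema types.

```rust
use std::sync::Arc;

/// Stand-in for the catalog representation of a schema.
struct CatalogSchema {
    columns: Vec<String>,
}

/// Stand-in for the converted (query-ready) schema.
struct ConvertedSchema {
    fields: Vec<String>,
}

fn convert(catalog: &CatalogSchema) -> ConvertedSchema {
    // In the real code this is the expensive catalog -> Arrow conversion.
    ConvertedSchema { fields: catalog.columns.clone() }
}

/// Cached per-table state: the conversion happens once, when the entry is built.
struct CachedTable {
    schema: Arc<ConvertedSchema>,
}

impl CachedTable {
    fn new(catalog: &CatalogSchema) -> Self {
        Self { schema: Arc::new(convert(catalog)) }
    }
}

struct Chunk {
    schema: Arc<ConvertedSchema>,
}

/// Creating a chunk is now a cheap `Arc` clone; no namespace or table schema
/// needs to be passed around.
fn new_chunk(table: &CachedTable) -> Chunk {
    Chunk { schema: Arc::clone(&table.schema) }
}

fn main() {
    let catalog = CatalogSchema { columns: vec!["time".to_string(), "value".to_string()] };
    let table = CachedTable::new(&catalog);
    let chunk_a = new_chunk(&table);
    let chunk_b = new_chunk(&table);
    println!("chunks share one schema: {}", Arc::ptr_eq(&chunk_a.schema, &chunk_b.schema));
}
```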
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: use a single timestamp in policy backend
Prior to this PR we had at least one `TimeProvider::now` call per GET
request (for caches that only used LRU) and up to three calls (caches with
LRU + refresh + TTL). Let's instead use a single timestamp that is
created by the policy backend itself (instead of by the policies). This has
the following consequences:
- **efficiency:** `SystemProvider::now` is not free. Even though under Linux
  it doesn't result in a syscall, it uses the stdlib time system, which
  also checks for monotonicity.
- **consistency:** All changes for a single trigger (e.g. a
  GET cache call) now use a single timestamp instead of slightly
  increasing ones. I argue this is the better semantics: it is simpler to
  understand and easier to debug.
In a (slightly artificial) local performance experiment, this shaves
off around 2ms per single-table SQL query. However, I expect there are
more degenerate cases (e.g. multi-table SQL queries or some InfluxRPC
requests that hit multiple tables).
The majority of this patch moves the `TimeProvider` from the policies
into the policy backend (see the sketch after this list).
* docs: explain `now` parameter
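Below is a minimal sketch of the "single timestamp per trigger" pattern described above, not the actual IOx cache code: the backend reads the clock once and hands the same `now` to every policy. The `Policy`, `TtlPolicy`, `LruPolicy`, and `PolicyBackend` types here are illustrative stand-ins.

```rust
use std::time::Instant;

trait Policy {
    /// `now` is supplied by the backend; policies must not read the clock themselves.
    fn on_get(&mut self, key: &str, now: Instant);
}

struct TtlPolicy;
struct LruPolicy;

impl Policy for TtlPolicy {
    fn on_get(&mut self, key: &str, now: Instant) {
        // ... expire entries older than the TTL relative to `now` ...
        let _ = (key, now);
    }
}

impl Policy for LruPolicy {
    fn on_get(&mut self, key: &str, now: Instant) {
        // ... record `now` as the last-used time for `key` ...
        let _ = (key, now);
    }
}

struct PolicyBackend {
    policies: Vec<Box<dyn Policy>>,
}

impl PolicyBackend {
    fn get(&mut self, key: &str) {
        // One clock read per trigger: every policy sees the same timestamp.
        let now = Instant::now();
        for policy in &mut self.policies {
            policy.on_get(key, now);
        }
    }
}

fn main() {
    let mut backend = PolicyBackend {
        policies: vec![Box::new(TtlPolicy), Box::new(LruPolicy)],
    };
    backend.get("namespace:my_db");
}
```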
* chore: Update datafusion pin
* chore: Update now that user is a reserved word
* chore: Update cargo.lock
* fix: update query for user function
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: hoist repeated computation out of chunk creation
We have hundreds of chunks per table, so it is beneficial to do the
common work only once.
* chore: remove TableCache as it is no longer used
* fix: prune chunks both before and after metadata fetch
Fetching the metadata for all the chunks in a table is expensive,
especially when a narrow time-range query only needs a few of them
(see the sketch after this commit list).
* chore: fix clippy
* fix: fix up some last tests
* fix: review comments
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
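The two-phase pruning referenced above, as a minimal stand-alone sketch. `ChunkStub`, `ChunkWithMetadata`, and the plain min/max time predicate are illustrative; the real code prunes with full chunk statistics and query predicates.

```rust
#[derive(Debug)]
struct ChunkStub {
    min_time: i64,
    max_time: i64,
}

#[derive(Debug)]
struct ChunkWithMetadata {
    stub: ChunkStub,
    // ... column statistics, schema, etc. ...
}

fn overlaps(stub: &ChunkStub, range: (i64, i64)) -> bool {
    stub.max_time >= range.0 && stub.min_time <= range.1
}

/// Pretend this is the expensive per-chunk metadata fetch.
fn fetch_metadata(stub: ChunkStub) -> ChunkWithMetadata {
    ChunkWithMetadata { stub }
}

fn prune_and_fetch(chunks: Vec<ChunkStub>, range: (i64, i64)) -> Vec<ChunkWithMetadata> {
    chunks
        .into_iter()
        // First pass: prune on the cheap min/max times before paying for metadata.
        .filter(|c| overlaps(c, range))
        .map(fetch_metadata)
        // Second pass: prune again once the richer metadata is available (here the
        // same predicate stands in for full statistics-based pruning).
        .filter(|c| overlaps(&c.stub, range))
        .collect()
}

fn main() {
    let chunks = vec![
        ChunkStub { min_time: 0, max_time: 10 },
        ChunkStub { min_time: 100, max_time: 200 },
    ];
    let kept = prune_and_fetch(chunks, (150, 180));
    println!("kept {} chunk(s)", kept.len());
}
```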
Added /usr/local/rustup to the list of directories cached during builds.
This is where rustup installs the toolchain, so caching it saves the
download and install on every build.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
We currently only use the human-readable version string for the CLI
help, but for #5464 I want to use the Git hash and a process-time UUID.
This is the prep work for that.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
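As a rough, std-only illustration of the direction mentioned above (not the actual implementation): a structured version value could carry the human-readable version, a build-time Git hash, and an identifier generated once per process. The `GIT_HASH` environment variable and the pid-plus-timestamp identifier are assumptions; the real change is expected to use an actual UUID.

```rust
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

#[derive(Debug)]
struct VersionInfo {
    /// Human-readable version, e.g. for `--help` output.
    version: &'static str,
    /// Git revision baked in at build time (assumed env var name).
    git_hash: &'static str,
    /// Identifier generated once when the process starts (stand-in for a UUID).
    process_id: String,
}

fn version_info() -> VersionInfo {
    let started = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before unix epoch")
        .as_nanos();
    VersionInfo {
        version: env!("CARGO_PKG_VERSION"),
        git_hash: option_env!("GIT_HASH").unwrap_or("unknown"),
        process_id: format!("{}-{}", process::id(), started),
    }
}

fn main() {
    println!("{:?}", version_info());
}
```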
Adds instrumentation to the low-level (post-aggregation) Kafka client,
capturing the approximate, uncompressed message size (calculated as the
sum of all `Record::approximate_size()` return values, ignoring the
largely static framing overhead).