8455 Commits (de74415cbed9f11182500c2ad878f1df2376cec2)

Author SHA1 Message Date
Carol (Nichols || Goulding) de74415cbe
feat: Gather parquet files for a partition compaction operation
Fixes #5118.

Given a partition ID, look up the non-deleted Parquet files for that
partition. Separate them into level 0 and level 1, and sort the level 0
files by max sequence number.

This is not called anywhere yet.
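
The gathering step described above could look roughly like the following. This is a minimal sketch, not the code from the commit: `Level`, `ParquetFile`, `FilesForCompaction`, `gather_files`, and all field names are hypothetical stand-ins, and an in-memory slice takes the place of the catalog lookup.

```rust
// Hypothetical, simplified stand-ins for the catalog types; the real
// definitions in the repository may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    L0,
    L1,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParquetFile {
    pub id: i64,
    pub partition_id: i64,
    pub level: Level,
    pub max_sequence_number: i64,
    pub to_delete: bool,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct FilesForCompaction {
    pub level_0: Vec<ParquetFile>,
    pub level_1: Vec<ParquetFile>,
}

/// Select the non-deleted files of one partition, split them by level,
/// and sort the level 0 files by max sequence number (ascending).
pub fn gather_files(partition_id: i64, all_files: &[ParquetFile]) -> FilesForCompaction {
    let mut out = FilesForCompaction::default();

    for file in all_files
        .iter()
        .filter(|f| f.partition_id == partition_id && !f.to_delete)
    {
        match file.level {
            Level::L0 => out.level_0.push(file.clone()),
            Level::L1 => out.level_1.push(file.clone()),
        }
    }

    // Only level 0 is sorted here; level 1 keeps its input order.
    out.level_0.sort_by_key(|f| f.max_sequence_number);
    out
}
```

Calling `gather_files(7, &files)` returns only the files of partition 7 that are not marked for deletion, with `level_0` ordered by `max_sequence_number`.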
2022-07-13 16:53:21 -04:00
kodiakhq[bot] 45cd4eb504
Merge pull request #5122 from influxdata/cn/back-to-2-levels
fix: Remove unused level 1 compaction; move level 2 to level 1
2022-07-13 19:58:33 +00:00
Carol (Nichols || Goulding) d19c468b9d
fix: Remove unused level 1 compaction; move level 2 to level 1
Fixes #5119.
2022-07-13 15:05:09 -04:00
kodiakhq[bot] a8c5bd7ac9
Merge pull request #5093 from influxdata/cn/compactor-metrics
feat: Record metric for number of files in a compaction by compaction level
2022-07-13 15:54:43 +00:00
Carol (Nichols || Goulding) 61c023139b
refactor: Switch compaction levels to an enum with values rather than separate consts
Bonuses:

- Type checking
- Validation
- Less casting
- Exhaustiveness checking
- Less use of the numerical value
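
For illustration, the bonuses listed above can be seen in a small sketch of the enum approach. The variant names, the `i16` representation, the `String` error type, and the `describe` helper are all assumptions made for this example, not the definitions from the commit.

```rust
use std::convert::TryFrom;

// Hypothetical enum with explicit numeric values, replacing separate
// integer constants. Variant names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i16)]
pub enum CompactionLevel {
    Initial = 0,
    Compacted = 1,
}

// Validation: converting from a raw number fails for unknown levels
// instead of silently accepting them.
impl TryFrom<i16> for CompactionLevel {
    type Error = String;

    fn try_from(value: i16) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(Self::Initial),
            1 => Ok(Self::Compacted),
            other => Err(format!("invalid compaction level: {}", other)),
        }
    }
}

// Exhaustiveness checking: adding a variant later makes this match
// fail to compile until the new case is handled.
pub fn describe(level: CompactionLevel) -> &'static str {
    match level {
        CompactionLevel::Initial => "not yet compacted",
        CompactionLevel::Compacted => "compacted",
    }
}
```

The numeric value is then needed only at the boundary, for example `CompactionLevel::Compacted as i16` when writing to storage and `CompactionLevel::try_from(raw)` when reading back.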
2022-07-13 11:30:36 -04:00
Carol (Nichols || Goulding) 34fcf6a584
fix: Line wrap to 100 columns 2022-07-13 11:29:13 -04:00
Marco Neumann 89c24dfec0
fix: do not force-load chunks into read buffer (#5112)
I forgot to address a TODO in #5091. Extends the test to actually check
the chunk stage and removes the function for manual force-loads.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-13 14:46:24 +00:00
Marco Neumann b42dd1c45e
chore: update all dependencies (esp. `time`) (#5114)
Our CI found a potential panic:
https://app.circleci.com/pipelines/github/influxdata/influxdb_iox/21307/workflows/7c1df458-3c6a-4f5d-bfda-51612273bbc7/jobs/183140

See https://github.com/time-rs/time/issues/481 , so let's update
`time` and since we've accumulated quite a backlog, just run
`cargo update --workspace`.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-13 14:30:18 +00:00
Andrew Lamb 64b6b4fd6f
feat: skip ingester buffering if INFLUXDB_IOX_INGESTER_SKIP_BUFFER is set (#5115)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-13 14:21:06 +00:00
Nga Tran 5c5c964dfe
feat: config params for Compactor (#5108)
* feat: config params for Compactor

* refactor: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-13 13:50:07 +00:00
Marco Neumann 9e09f77a45
fix: fix overeager Kafka message flushing (#5113)
* test: add (failing) test to ensure that interleaved partition writes are aggregated correctly

* fix: fix overeager Kafka message flushing
2022-07-13 12:32:03 +00:00
Marco Neumann b1b2cb5d4a
feat: load read buffer on demand (#5091)
* refactor: extract `select_schema`

* refactor: improve `InternalLostInputField` error message

* test: improve SQL runner output

* feat: load read buffer on demand

Closes #5032.

* refactor: move `[Half]OwnedSelection` to `schema` crate
2022-07-13 08:51:40 +00:00
dependabot[bot] 3a1934e7af
chore(deps): Bump clap from 3.2.8 to 3.2.10 (#5111)
Bumps [clap](https://github.com/clap-rs/clap) from 3.2.8 to 3.2.10.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.2.8...v3.2.10)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-13 08:24:51 +00:00
Marco Neumann bd9107a226
docs: extend profiling docs (#5102) 2022-07-13 08:18:34 +00:00
Marko Mikulicic ad4ea13e9d
Merge pull request #5110 from influxdata/update-objectstore
fix: Bump object_store crate for EKS support
2022-07-13 01:28:27 +02:00
Marko Mikulicic 60df069a10 fix: Bump object_store crate for EKS support 2022-07-13 01:12:29 +02:00
Nga Tran bce8924b4c
refactor: use max_sequence_number to sort chunks for deduplication (#5101)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-12 16:23:53 +00:00
dependabot[bot] 23ad60b35f
chore(deps): Bump serde from 1.0.138 to 1.0.139 (#5095)
Bumps [serde](https://github.com/serde-rs/serde) from 1.0.138 to 1.0.139.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.138...v1.0.139)

---
updated-dependencies:
- dependency-name: serde
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-12 15:34:51 +00:00
pierwill 500ece7c13
fix: Clean up docs for `addressable_heap` (#5092)
Removes an erroneous line and reworks one sentence for clarity.

Co-authored-by: pierwill <pierwill@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-11 19:31:38 +00:00
Marco Neumann 96da584139
test: do NOT create expensive bloom filters when we do not need them (#5089)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-11 16:29:53 +00:00
Marco Neumann 607831585c
refactor: use fewer executors and threads during tests (#5086)
`Executor` is only used as a performance boundary, not as a correctness
or data boundary so let's try to re-use it. This also simplifies
profiling of tests since we don't end up with hundreds (or even
thousands) of threads.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-11 16:23:22 +00:00
Marco Neumann 54039e8ae5
chore: fix `perf` and friends (#5087) 2022-07-11 15:46:27 +00:00
Andrew Lamb ec3f6a8597
docs: Add guide for running / profiling IOx locally (#5057)
* docs: Add guide for running / profiling IOx locally

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* docs: Add note about why it is an "underground" guide

* docs: add note about running locally, and move catalog prodding after

* docs: add INFLUXDB_IOX_MAX_HTTP_REQUEST_SIZE and clarify settings

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-07-11 13:20:58 +00:00
dependabot[bot] 5703d12595
chore(deps): Bump hashbrown from 0.12.1 to 0.12.2 (#5084)
Bumps [hashbrown](https://github.com/rust-lang/hashbrown) from 0.12.1 to 0.12.2.
- [Release notes](https://github.com/rust-lang/hashbrown/releases)
- [Changelog](https://github.com/rust-lang/hashbrown/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/hashbrown/compare/v0.12.1...v0.12.2)

---
updated-dependencies:
- dependency-name: hashbrown
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-11 12:56:31 +00:00
dependabot[bot] fd3ee7e942
chore(deps): Bump crypto-common from 0.1.4 to 0.1.5 (#5083)
Bumps [crypto-common](https://github.com/RustCrypto/traits) from 0.1.4 to 0.1.5.
- [Release notes](https://github.com/RustCrypto/traits/releases)
- [Commits](https://github.com/RustCrypto/traits/compare/crypto-common-v0.1.4...crypto-common-v0.1.5)

---
updated-dependencies:
- dependency-name: crypto-common
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-11 09:05:26 +00:00
Carol (Nichols || Goulding) 80b6c5c82f
fix: Correct typo in constant name so searching for COMPACTION_LEVEL returns all (#5077) 2022-07-08 16:31:52 +00:00
kodiakhq[bot] 3aef5ac697
Merge pull request #5070 from influxdata/cn/compactor-investigation
fix: Compact all files for a partition to 1 file
2022-07-08 14:47:41 +00:00
kodiakhq[bot] 353fadf34b
Merge branch 'main' into cn/compactor-investigation 2022-07-08 14:41:52 +00:00
Carol (Nichols || Goulding) 2ba97dd9df
fix: Remove out of date comment 2022-07-08 10:31:13 -04:00
Carol (Nichols || Goulding) 909c4b18d4
fix: Log more info when compacting files 2022-07-08 10:30:15 -04:00
Carol (Nichols || Goulding) a45767e705
fix: Restore compute_split_time to compactor utils 2022-07-08 10:14:41 -04:00
dependabot[bot] 93d1a0f5be
chore(deps): Bump hyper from 0.14.19 to 0.14.20 (#5073)
Bumps [hyper](https://github.com/hyperium/hyper) from 0.14.19 to 0.14.20.
- [Release notes](https://github.com/hyperium/hyper/releases)
- [Changelog](https://github.com/hyperium/hyper/blob/v0.14.20/CHANGELOG.md)
- [Commits](https://github.com/hyperium/hyper/compare/v0.14.19...v0.14.20)

---
updated-dependencies:
- dependency-name: hyper
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-08 13:24:34 +00:00
Andrew Lamb 280698f9f5
feat: Increase `DmlWrite` operation throughput by pipelining kafka read and decode (#5066)
* feat: pipeline kafka read and decode

* docs: Update write_buffer/src/kafka/mod.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-08 13:18:21 +00:00
Carol (Nichols || Goulding) 75065abfb6
fix: Compact all data for a partition to one file 2022-07-08 09:07:43 -04:00
Carol (Nichols || Goulding) 959f0d3e02
fix: Clean up comments as I read through 2022-07-08 09:07:43 -04:00
Marco Neumann f1467cf4d8
refactor: try to query "cheap" chunks first (#5075)
This should help a lot once #5032 is implemented. Currently it doesn't
really make a difference.

See #5037, which also proposes a more advanced but more complex system.
The team however agreed to try something simple first.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-08 11:19:25 +00:00
Marco Neumann 0a61989df8
refactor: `QuerierParquet` + `QuerierRBChunk` = ❤️ (merge them together) (#5063)
* refactor: `QuerierParquet` + `QuerierRBChunk` = ❤️

* refactor: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-08 08:06:53 +00:00
dependabot[bot] 9a3d31bc63
chore(deps): Bump backtrace from 0.3.65 to 0.3.66 (#5072)
Bumps [backtrace](https://github.com/rust-lang/backtrace-rs) from 0.3.65 to 0.3.66.
- [Release notes](https://github.com/rust-lang/backtrace-rs/releases)
- [Commits](https://github.com/rust-lang/backtrace-rs/compare/0.3.65...0.3.66)

---
updated-dependencies:
- dependency-name: backtrace
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-08 08:00:49 +00:00
dependabot[bot] 6c1b70ba2d
chore(deps): Bump filetime from 0.2.16 to 0.2.17 (#5071)
Bumps [filetime](https://github.com/alexcrichton/filetime) from 0.2.16 to 0.2.17.
- [Release notes](https://github.com/alexcrichton/filetime/releases)
- [Commits](https://github.com/alexcrichton/filetime/commits/0.2.17)

---
updated-dependencies:
- dependency-name: filetime
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-08 07:50:40 +00:00
Marco Neumann 41c8a8428f
feat: `ReadBufferCache::peek` (#5064)
For #5032.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-08 07:35:24 +00:00
Andrew Lamb c46e1c6347
chore: Update datafusion + arrow/parquet/arrow-flight to `17.0.0` (#5021)
* fix: correct nullability declaration of system tables

* chore: Update datafusion and arrow/parquet/arrow-flight

* chore: Run cargo hakari tasks

* fix: Update tests

* fix: Update tests

* fix: predicate pruning

* fix: add some tests

* fix: query_functions

* fix: fix read_buffer test

* fix: fix clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-07 19:22:15 +00:00
Nga Tran a48e6ae733
docs: add consensus for the desired final output of the compactor (#5069)
* docs: add consensus for the desired final output of the compactor

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* docs: add initial readme to the compactor

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-07-07 19:11:16 +00:00
kodiakhq[bot] c7911e0124
Merge pull request #5024 from influxdata/cn/s3-gc
feat: Automatically remove old / unreferenced objects from the object store
2022-07-07 14:10:04 +00:00
Carol (Nichols || Goulding) 6b4325642f
refactor: Rename inner_main to main as it's now public 2022-07-07 09:48:06 -04:00
Carol (Nichols || Goulding) 9a681e75cc
fix: Box a large enum variant 2022-07-07 09:48:06 -04:00
Carol (Nichols || Goulding) 70dd6009e8
fix: Integrate the object store garbage collector into the main binary 2022-07-07 09:48:06 -04:00
Carol (Nichols || Goulding) b6ff82c06e
fix: Correct copypasta'd doc comment 2022-07-07 09:48:06 -04:00
Carol (Nichols || Goulding) 5ef2298677
fix: Adding appropriate messages to log lines 2022-07-07 09:48:06 -04:00
Carol (Nichols || Goulding) d860a61bb2
fix: Remove unused error type 2022-07-07 09:48:05 -04:00
Carol (Nichols || Goulding) c8182eaf71
fix: Print to stdout when a file is deleted by the object store garbage collector 2022-07-07 09:48:05 -04:00