9184 Commits (65f1550126f61ea1ecfdb1e5cefa6de4ba00261e)

Author SHA1 Message Date
Andrew Lamb 65f1550126
feat: Implement `debug parquet_to_lp` command to convert parquet to line protocol (#5734)
* feat: add `influxdb_iox debug parquet_to_lp` command

* chore: Run cargo hakari tasks

* fix: update command description

* fix: remove unnecessary Result import

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-09-26 14:17:27 +00:00
Nga Tran b11da1d98b
fix: a silly bug that did not capture the file limit when there are many L0 files and very few or no overlapping L1 files (#5736) 2022-09-23 21:03:29 +00:00
Nga Tran c4542d6b21
chore: be more verbose about the memory budget inserted into the catalog table skipped_compaction (#5735) 2022-09-23 18:40:09 +00:00
Nga Tran bb7df22aa1
chore: always use a fixed number of rows (8192) per batch to estimate memory (#5733) 2022-09-23 15:51:25 +00:00
Nga Tran da697815ff
chore: add more info about the memory budget at the time of over-file-limit into skipped_compaction for us to see if we should increase the file limit (#5731)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-23 13:34:38 +00:00
kodiakhq[bot] 5b5a645ea9
Merge pull request #5732 from influxdata/crepererum/avoid_policy_set_clones
refactor: avoid clones for `policy::Subscriber::set`
2022-09-23 12:32:31 +00:00
Marco Neumann 4fdfed5ea7 refactor: avoid clones for `policy::Subscriber::set`
Policy subscribers basically never store `V` (the cached value), so we
should not clone that one unconditionally. But even `K` (the cache key)
is not stored in all cases. So let's pass a reference for both and let
the policies decide if they wanna clone the data or not.
2022-09-23 11:22:59 +02:00
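The idea in the commit above can be illustrated with a minimal sketch. The `Subscriber` trait, `KeyTracker`, and `SizeCounter` below are hypothetical stand-ins, not the actual `policy::Subscriber` API; they only show how passing `&K` and `&V` lets each policy decide whether to clone anything.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a cache policy subscriber; not the real IOx trait.
trait Subscriber<K, V> {
    // Key and value arrive by reference; a policy clones only what it keeps.
    fn set(&mut self, k: &K, v: &V);
}

// A policy that remembers which keys exist: it clones `K` but never `V`.
struct KeyTracker {
    keys: HashSet<String>,
}

impl Subscriber<String, Vec<u8>> for KeyTracker {
    fn set(&mut self, k: &String, _v: &Vec<u8>) {
        self.keys.insert(k.clone());
    }
}

// A policy that stores neither key nor value: it only sums value sizes.
struct SizeCounter {
    total_bytes: usize,
}

impl Subscriber<String, Vec<u8>> for SizeCounter {
    fn set(&mut self, _k: &String, v: &Vec<u8>) {
        self.total_bytes += v.len();
    }
}

fn main() {
    let mut tracker = KeyTracker { keys: HashSet::new() };
    let mut counter = SizeCounter { total_bytes: 0 };

    let key = String::from("table_1");
    let value = vec![0u8; 1024];

    // The caller keeps ownership; no unconditional clone happens here.
    tracker.set(&key, &value);
    counter.set(&key, &value);

    println!(
        "{} keys tracked, {} bytes seen",
        tracker.keys.len(),
        counter.total_bytes
    );
}
```

With an owned `set(k: K, v: V)` signature the caller would have to clone both arguments for every subscriber, even for policies like `SizeCounter` that keep nothing.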
dependabot[bot] e83939e47e
chore(deps): Bump serde from 1.0.144 to 1.0.145 (#5730)
Bumps [serde](https://github.com/serde-rs/serde) from 1.0.144 to 1.0.145.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.144...v1.0.145)

---
updated-dependencies:
- dependency-name: serde
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-23 08:12:20 +00:00
Carol (Nichols || Goulding) c8108f01e7
chore: Upgrade to Rust 1.64 (#5727)
* chore: Upgrade to Rust 1.64

* fix: Use iter find instead of a for loop, thanks clippy

* fix: Remove some needless borrows, thanks clippy

* fix: Use then_some rather than then with a closure, thanks clippy

* fix: Use iter retain rather than filter collect, thanks clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-22 18:04:00 +00:00
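The clippy-driven changes listed in the commit above follow common Rust idioms. The functions below are illustrative examples written for this log, not code from the repository; they sketch the three rewrites named in the commit message (`find`, `then_some`, `retain`).

```rust
// `Iterator::find` replaces a hand-written `for` loop with an early return.
fn first_even(values: &[i32]) -> Option<i32> {
    values.iter().copied().find(|v| v % 2 == 0)
}

// `bool::then_some` replaces `then(|| value)` when the value is cheap to
// compute eagerly.
fn non_empty_len(s: &str) -> Option<usize> {
    (!s.is_empty()).then_some(s.len())
}

// `Vec::retain` filters in place instead of rebuilding the vector with
// `into_iter().filter(..).collect()`.
fn drop_negatives(values: &mut Vec<i32>) {
    values.retain(|v| *v >= 0);
}

fn main() {
    println!("{:?}", first_even(&[1, 3, 4, 6]));
    println!("{:?}", non_empty_len("abc"));

    let mut values = vec![3, -1, 0, -7, 5];
    drop_negatives(&mut values);
    println!("{:?}", values);
}
```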
Nga Tran 61075d57e2
chore: turn full cold compaction on (#5728) 2022-09-22 17:07:35 +00:00
Nga Tran aaec5104d6
chore: turn compaction cold partition step 1 on to work with our new … (#5726)
* chore: turn compaction cold partition step 1 on to work with our new memory budget that considers the num_files limitation

* chore: run fmt
2022-09-22 14:59:27 +00:00
Nga Tran e3deb23bcc
feat: add minimum row_count per file in estimating compacting memory… (#5715)
* feat: add minimum row_count per file in estimating compacting memory budget and limit the number of files per compaction

* chore: cleanup

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* test: add test per review comments

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* test: add one more test that has limit num files larger than total input files

* fix: make the L1 files in tests not overlapped

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-22 14:37:39 +00:00
Marco Neumann 55ef272920
refactor: acquire table locks concurrently (#5722)
Waiting for one after the other (one per shard) in serial fashion
likely increases latency too much.
2022-09-22 10:56:22 +00:00
dependabot[bot] 76d9a88761
chore(deps): Bump md-5 from 0.10.4 to 0.10.5 (#5717)
Bumps [md-5](https://github.com/RustCrypto/hashes) from 0.10.4 to 0.10.5.
- [Release notes](https://github.com/RustCrypto/hashes/releases)
- [Commits](https://github.com/RustCrypto/hashes/compare/md-5-v0.10.4...md-5-v0.10.5)

---
updated-dependencies:
- dependency-name: md-5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-22 07:40:54 +00:00
Marco Neumann 365a246f8d
refactor: do not run de-dup in ingester for querier requests (#5626)
* refactor: do not run de-dup in ingester for querier requests

This removes the entire de-dup logic from the ingester for querier
requests. Furthermore, it even removes the entire datafusion execution
from the querier and just dumps the in-memory record batches as quickly
as possible. No filters are applied. Note that even prior to this PR,
we've never applied projections (tracked by #5624).

**Pros:**

- speed up query planning within the querier (since we need the ingester
  response for state reconciling)
- lowered ingester CPU load

**Cons:**

- more querier<>ingester network traffic

Closes #5602.

* test: extend query test case

* fix: ingester tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-22 07:33:54 +00:00
Marco Neumann fd45fbc9ab
refactor: use cheaper hash keys for projected schemas (#5713)
* refactor: arc the cached table

* refactor: use cheaper hash keys for projected schemas

Instead of using the column names to address projected schemas, let's
use the column IDs.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-22 05:31:02 +00:00
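A rough sketch of the keying change described in the commit above, under stated assumptions: `ColumnId`, `ProjectedSchema`, and `ProjectedSchemaCache` are hypothetical names invented for illustration, and the real cache in the querier is not shown in this log. The point is only that a key of sorted integer IDs avoids hashing and comparing column-name strings.

```rust
use std::collections::HashMap;

// Hypothetical types for illustration; not the actual IOx cache types.
type ColumnId = i64;

#[derive(Debug, Clone, PartialEq)]
struct ProjectedSchema {
    column_names: Vec<String>,
}

struct ProjectedSchemaCache {
    // Keyed by sorted column IDs rather than by column names.
    entries: HashMap<Vec<ColumnId>, ProjectedSchema>,
    // Counts how often a schema had to be built, to make cache hits visible.
    builds: usize,
}

impl ProjectedSchemaCache {
    fn new() -> Self {
        Self {
            entries: HashMap::new(),
            builds: 0,
        }
    }

    // Returns the projected schema for the given columns, building it only
    // on a cache miss. Column order in the request does not matter.
    fn get(&mut self, columns: &[(ColumnId, &str)]) -> ProjectedSchema {
        let mut sorted: Vec<(ColumnId, &str)> = columns.to_vec();
        sorted.sort_unstable_by_key(|(id, _)| *id);
        sorted.dedup_by_key(|(id, _)| *id);

        let key: Vec<ColumnId> = sorted.iter().map(|(id, _)| *id).collect();
        if let Some(schema) = self.entries.get(&key) {
            return schema.clone();
        }

        self.builds += 1;
        let schema = ProjectedSchema {
            column_names: sorted.iter().map(|(_, name)| (*name).to_owned()).collect(),
        };
        self.entries.insert(key, schema.clone());
        schema
    }
}

fn main() {
    let mut cache = ProjectedSchemaCache::new();
    let first = cache.get(&[(2, "time"), (1, "host")]);
    let second = cache.get(&[(1, "host"), (2, "time")]);
    println!("{:?} {:?} builds={}", first, second, cache.builds);
}
```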
kodiakhq[bot] 2f3b2ac6c5
Merge pull request #5677 from influxdata/cn/clarifiy-compactor
feat: Improve compactor code organization and test coverage
2022-09-21 16:05:21 +00:00
Carol (Nichols || Goulding) aa822a40cf
refactor: Move config in with the relevant assertions
Now that only one hot test is using a CompactorConfig, move it into that
test to avoid spooky action at a distance.
2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding) f0bf3bd21c
test: Clarify descriptions for the remaining assertion
The assertion remaining in this test is now important because of having
multiple shards and showing which partition per shard is chosen.
2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding) 7c7b058276
refactor: Extract unit test for case 5 2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding) f5bd81ff3c
refactor: Extract unit test for case 4 2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding) 765feaa4d8
refactor: Extract a unit test for case 3 2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding) a7a480c1ba
refactor: Extract a unit test for case 2 2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding) d95f252a8e
refactor: Extract a unit test for case 1
Also add coverage for when there are no *partitions* in addition to the
test for when there are no *parquet files*.
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding) 9372290ec9
refactor: Use iox_test helpers to simplify test setup 2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding) f22627a97f
test: Move an integration test of hot compact_one_partition to lib 2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding) a7bb0398e6
test: Move an integration test of compact_candidates_with_memory_budget to the same file 2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding) 316ebfa8c1
test: Call the smaller inner hot_partitions_for_shard when only one shard is involved 2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding) fcf9a9d589
refactor: Move fetching of config from compactor inside hot_partitions_to_compact
But still pass them to hot_partitions_for_shard.

And make the order of the arguments the same as for
recent_highest_throughput_partitions because I've already messed the
order up. And make the names the same throughout.

This makes the closure passed to get_candidates_with_retry simpler.
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding) 48b7876174
refactor: Extract a function for computing query nanoseconds ago 2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding) 7dcaf5bd3d
refactor: Extract a function for getting hot partitions for one shard 2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding) b557c30fd3
refactor: Move hot compaction candidates to the hot module 2022-09-21 11:57:55 -04:00
Carol (Nichols || Goulding) fa11031a36
refactor: Extract a shared function to retry fetching of compaction candidates 2022-09-21 11:57:55 -04:00
Marco Neumann 8e6d9f8af1
chore: upgrade sqlx to 0.6.2 (#5712) 2022-09-21 11:29:45 +00:00
dependabot[bot] 1e4f4135a3
chore(deps): Bump pbjson-build from 0.4.0 to 0.5.0 (#5706)
Bumps [pbjson-build](https://github.com/influxdata/pbjson) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/influxdata/pbjson/releases)
- [Commits](https://github.com/influxdata/pbjson/commits)

---
updated-dependencies:
- dependency-name: pbjson-build
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-21 09:53:30 +00:00
dependabot[bot] 0cc29300ce
chore(deps): Bump pbjson-types from 0.4.0 to 0.5.0 (#5703)
Bumps [pbjson-types](https://github.com/influxdata/pbjson) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/influxdata/pbjson/releases)
- [Commits](https://github.com/influxdata/pbjson/commits)

---
updated-dependencies:
- dependency-name: pbjson-types
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-21 09:44:26 +00:00
dependabot[bot] 09cb62b75b
chore(deps): Bump pbjson from 0.4.0 to 0.5.0 (#5702)
Bumps [pbjson](https://github.com/influxdata/pbjson) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/influxdata/pbjson/releases)
- [Commits](https://github.com/influxdata/pbjson/commits)

---
updated-dependencies:
- dependency-name: pbjson
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-21 09:34:47 +00:00
Marco Neumann c66f16e4af
fix: ingester retries (#5708)
* fix: retry ingester requests faster

The retries introduced in #5695 are too slow and block the entire
querier for minutes (until the very long gRPC timeout kicks in).

* fix: add error details on why the query planning failed

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-21 09:27:47 +00:00
dependabot[bot] ea1e822e3b
chore(deps): Bump itertools from 0.10.4 to 0.10.5 (#5707)
Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.4 to 0.10.5.
- [Release notes](https://github.com/rust-itertools/itertools/releases)
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/commits)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-21 08:15:59 +00:00
dependabot[bot] 78981a62a1
chore(deps): Bump reqwest from 0.11.11 to 0.11.12 (#5705)
Bumps [reqwest](https://github.com/seanmonstar/reqwest) from 0.11.11 to 0.11.12.
- [Release notes](https://github.com/seanmonstar/reqwest/releases)
- [Changelog](https://github.com/seanmonstar/reqwest/blob/master/CHANGELOG.md)
- [Commits](https://github.com/seanmonstar/reqwest/compare/v0.11.11...v0.11.12)

---
updated-dependencies:
- dependency-name: reqwest
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-21 05:39:10 +00:00
dependabot[bot] 8af14f6d36
chore(deps): Bump lock_api from 0.4.8 to 0.4.9 (#5704)
Bumps [lock_api](https://github.com/Amanieu/parking_lot) from 0.4.8 to 0.4.9.
- [Release notes](https://github.com/Amanieu/parking_lot/releases)
- [Changelog](https://github.com/Amanieu/parking_lot/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Amanieu/parking_lot/compare/lock_api-0.4.8...lock_api-0.4.9)

---
updated-dependencies:
- dependency-name: lock_api
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-21 05:30:50 +00:00
dependabot[bot] 0d18943ad2
chore(deps): Bump once_cell from 1.14.0 to 1.15.0 (#5701)
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.14.0 to 1.15.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.14.0...v1.15.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-21 05:20:55 +00:00
Andrew Lamb ea51feadf4
chore: Improve debug logging when parquet files are created (#5699)
* chore: Improve debug logging when parquet files are created

* fix: add duration for encoding parquet

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-20 20:43:34 +00:00
Nga Tran 1d306061b9
chore: disable cold compaction again since its step 1 is the culprit (#5700) 2022-09-20 20:34:28 +00:00
Nga Tran 34bc02b59b
chore: turn cold compaction on but only compact L0s and their overlapped L1s (#5698) 2022-09-20 18:44:36 +00:00
kodiakhq[bot] 4a78db4490
Merge pull request #5697 from influxdata/dom/fix-ingester-tests
test: refactor ingester query tests
2022-09-20 16:09:53 +00:00
kodiakhq[bot] 7e03f483c6
Merge branch 'main' into dom/fix-ingester-tests 2022-09-20 16:01:59 +00:00
Marco Neumann 5e7fd55a42
refactor: retry querier->ingester requests (#5695)
* refactor: retry querier->ingester requests

Esp. for InfluxRPC requests that scan multiple tables, it may be that
one ingester request fails. We shall retry that request instead of
failing the entire query.

* refactor: improve docs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* fix: less foo

* docs: remove outdated TODO

* test: assert that panic happened

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-09-20 15:51:02 +00:00
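Retrying a single failed request instead of failing the whole query can be sketched roughly as follows. This is a synchronous, standard-library-only illustration; `retry_with_backoff` and its parameters are assumptions made for this example, and the querier's actual retry logic (which is asynchronous and talks gRPC) is not shown in this log.

```rust
use std::thread::sleep;
use std::time::Duration;

// Calls `request` until it succeeds or `max_attempts` is reached. The pause
// doubles after each failure but never exceeds `max_backoff`, so a flaky
// backend cannot stall the caller for long. At least one attempt is always
// made. Returns the first success or the last error.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_backoff: Duration,
    max_backoff: Duration,
    mut request: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut backoff = initial_backoff;
    let mut attempt = 1;
    loop {
        match request(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(backoff);
                backoff = (backoff * 2).min(max_backoff);
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Simulated per-table request that fails twice before succeeding.
    let result: Result<&str, &str> = retry_with_backoff(
        5,
        Duration::from_millis(10),
        Duration::from_millis(100),
        |attempt| {
            if attempt < 3 {
                Err("ingester unavailable")
            } else {
                Ok("record batches")
            }
        },
    );
    println!("{:?}", result);
}
```

Keeping both the attempt count and the maximum pause small bounds the worst-case delay a single failing request can add to a query.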
Dom Dwyer c6fe0dab3e refactor(ingester): reduced internal visibility
Changes many pub fields / methods to be pub(super), or if necessary,
pub(crate).

This helps maintain an internal API boundary for code hygiene, and helps
identify functions that are unused / only used in tests (which I've
annotated with cfg(test) and intend to remove - we should be driving
code under test via the public API rather than using test-only state
mutation, otherwise we're just testing our tests!)
2022-09-20 16:24:27 +01:00
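The visibility levels named in the commit above behave as in this small sketch. The `ingester::data::Buffer` type and its methods are hypothetical and exist only to show `pub(super)`, `pub(crate)`, and `cfg(test)` side by side.

```rust
mod ingester {
    pub mod data {
        // Hypothetical buffer type; fields stay private to this module.
        pub struct Buffer {
            rows: Vec<String>,
        }

        impl Buffer {
            pub fn new() -> Self {
                Self { rows: Vec::new() }
            }

            pub fn write(&mut self, row: &str) {
                self.rows.push(row.to_owned());
            }

            // Visible only to the parent module (`ingester`).
            pub(super) fn row_count(&self) -> usize {
                self.rows.len()
            }

            // Visible anywhere in this crate, but not to downstream crates.
            pub(crate) fn snapshot(&self) -> Vec<String> {
                self.rows.clone()
            }

            // Compiled only for test builds, which makes test-only state
            // mutation easy to find and later remove.
            #[cfg(test)]
            pub(crate) fn clear_for_test(&mut self) {
                self.rows.clear();
            }
        }
    }

    // The parent module may call the `pub(super)` method.
    pub fn buffered_rows(buffer: &data::Buffer) -> usize {
        buffer.row_count()
    }
}

fn main() {
    let mut buffer = ingester::data::Buffer::new();
    buffer.write("cpu,host=a usage=1");

    // `buffer.row_count()` would not compile here: it is `pub(super)`.
    println!("rows: {}", ingester::buffered_rows(&buffer));
    println!("snapshot: {:?}", buffer.snapshot());
}
```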
Dom Dwyer 6d00d6b683 test(ingester): refactor querier API tests
This commit changes the prepare_data_to_querier() tests to drive the
ingester state by applying DML ops, therefore driving the prod code
paths (and testing them!) rather than having the tests set up what the
tests believe is the correct internal ingester state, and then asserting
on that state.

This gives us much better coverage of prod code paths, decouples the
tests from the internal state/representation of ingesters (making the
tests less fragile), and removes a bunch of special-cased, test-only
functions that are functionally similar, but not the same as, the prod
functions.

Unblocks #5658, further clean-up to come.
2022-09-20 16:24:27 +01:00
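The testing approach described in the commit above, driving state through the production entry point instead of constructing internal state by hand, might look like this in miniature. `DmlOp`, `IngesterState`, and `rows_for_querier` are simplified, hypothetical stand-ins and do not reflect the real ingester types.

```rust
use std::collections::BTreeMap;

// Hypothetical, heavily simplified stand-ins; the real types are richer.
#[derive(Debug, Clone)]
enum DmlOp {
    Write { table: String, rows: Vec<String> },
}

#[derive(Default)]
struct IngesterState {
    tables: BTreeMap<String, Vec<String>>,
}

impl IngesterState {
    // The production entry point: every state change goes through here.
    fn apply(&mut self, op: DmlOp) {
        match op {
            DmlOp::Write { table, rows } => {
                self.tables.entry(table).or_default().extend(rows);
            }
        }
    }

    // What a querier-facing request would read back.
    fn rows_for_querier(&self, table: &str) -> Vec<String> {
        self.tables.get(table).cloned().unwrap_or_default()
    }
}

fn main() {
    // The state is built only by applying ops, never by writing to
    // `tables` directly, so the production code path is what gets tested.
    let mut state = IngesterState::default();
    state.apply(DmlOp::Write {
        table: "cpu".into(),
        rows: vec!["usage=1".into()],
    });
    state.apply(DmlOp::Write {
        table: "cpu".into(),
        rows: vec!["usage=2".into()],
    });

    assert_eq!(state.rows_for_querier("cpu"), vec!["usage=1", "usage=2"]);
    assert!(state.rows_for_querier("mem").is_empty());
}
```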