12778 Commits (9775e150b2ff78ce5817a28153d5827937f7c35d)

Author SHA1 Message Date
Marco Neumann 9775e150b2
refactor: single entry point for partition cache (#8093)
For #8089 I would like to request each partition only once. Since
internally we store both the sort key and the column ranges in one cache
value anyway, there is no reason to offer two different methods to look
them up.

This only changes the `PartitionCache` interface. The actual lookups are
still separate, but will be changed in a follow-up.
2023-06-27 16:22:13 +00:00
Marco Neumann 9d8b620cd2
refactor: gather column ranges after decoding (#8090)
We need to decode the ingester data in a serial fashion (since it is a
data stream). Cache access during that phase is costly since we cannot
parallelize that. To avoid that, we gather the column ranges AFTER
decoding and calculate the chunk statistics accordingly.

This refactoring also removes the partition sort key from ingester
partitions since it is not required anymore. It is a leftover of
the old physical query planning. It was not marked as "unused" since
it was used by some test code.

Required for #8089.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-27 14:44:06 +00:00
Marco Neumann d9ce92dad1
fix: do not override all rustflags in circleci (#8061)
As we've learned in #8048 and #8052, rustflags do NOT stack. Since we
only want to change one specific parameter (the debug feature), use the
env variable that cargo provides us.

**In contrast to the linked PRs, this only changes the test execution. Prod
builds remain untouched.**

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-27 14:35:04 +00:00
Marco Neumann 1d101bde5f
fix: panics in querier->ingester circuit breaker (#8080)
The circuit breaker needs to act on concurrent requests to the same
ingester. To do that, it performs the following steps per request:

1. check current circuit state (if open, then exit here)
2. perform request (if closed or as a half-open test request)
3. change circuit state based on results

Now only step 1 and step 3 hold locks to allow concurrency. This means
that in the meantime, the circuit state might change. To check that, the
circuit state has a generation counter.

The bug now was an overly strong assumption on the generation counter /
state change. Namely that if we are in step 3 and the state is
"half-open", then nobody else could have changed the state in the
meantime because for a single ingester, there can only be one test
request for the half-open state. While the latter part of this is
correct, the former is wrong. Namely we could have started in step 1
with a closed circuit and ended in a half-open one. Namely if the
following sequence happens:

1. request, blocks on upstream
2. circuit breaks
3. some time passes
4. a half-open request starts, blocks on upstream
5. request from step 1 returns, finds itself confused

This now fixes the assertion (both in case that the request from step 1
succeeds and fails).

Includes tests for the two scenarios (`test_late_failure_after_half_open`,
`test_late_ok_after_half_open`) and an additional one that I came up with
while thinking about the issue (`test_late_failure_after_recovery`, was
passing on `main` but still good to have).

Fixes #8065.
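
The race described above can be sketched in a few lines. This is a minimal illustration, not the code from #8080: the names `Breaker`, `Permit`, `begin`, `start_probe` and `finish` are hypothetical, a single failure trips the circuit, and "some time passes" is modelled by an explicit `start_probe` call. The point it shows is that step 3 compares the generation recorded in step 1 with the current one and treats a mismatch as a stale result instead of asserting on the state.

```rust
use std::sync::Mutex;

/// The three classic circuit states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Circuit {
    Closed,
    Open,
    HalfOpen,
}

#[derive(Debug)]
struct Inner {
    state: Circuit,
    /// Bumped on every state transition.
    generation: u64,
}

/// Hypothetical token that records the generation observed in step 1.
#[derive(Debug, Clone, Copy)]
pub struct Permit {
    generation: u64,
}

/// Hypothetical breaker; the lock is only held in step 1 and step 3.
pub struct Breaker {
    inner: Mutex<Inner>,
}

impl Breaker {
    pub fn new() -> Self {
        Self {
            inner: Mutex::new(Inner { state: Circuit::Closed, generation: 0 }),
        }
    }

    pub fn state(&self) -> Circuit {
        self.inner.lock().unwrap().state
    }

    /// Step 1: regular requests pass only while the circuit is closed.
    pub fn begin(&self) -> Option<Permit> {
        let inner = self.inner.lock().unwrap();
        if inner.state == Circuit::Closed {
            Some(Permit { generation: inner.generation })
        } else {
            None
        }
    }

    /// Stand-in for "some time passes": open -> half-open, handing out
    /// the single test-request permit.
    pub fn start_probe(&self) -> Option<Permit> {
        let mut inner = self.inner.lock().unwrap();
        if inner.state != Circuit::Open {
            return None;
        }
        inner.state = Circuit::HalfOpen;
        inner.generation += 1;
        Some(Permit { generation: inner.generation })
    }

    /// Step 3: apply the result. Returns `false` if the circuit changed
    /// since step 1, in which case the late result is ignored.
    pub fn finish(&self, permit: Permit, ok: bool) -> bool {
        let mut inner = self.inner.lock().unwrap();
        if permit.generation != inner.generation {
            return false;
        }
        let next = match (inner.state, ok) {
            (Circuit::Closed, false) | (Circuit::HalfOpen, false) => Circuit::Open,
            (Circuit::HalfOpen, true) => Circuit::Closed,
            (state, _) => state,
        };
        if next != inner.state {
            inner.state = next;
            inner.generation += 1;
        }
        true
    }
}
```

In this sketch, a request that started while the circuit was closed and returns while it is half-open gets `false` from `finish` and leaves the state alone, regardless of whether it succeeded or failed.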

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-27 14:09:18 +00:00
dependabot[bot] 647541fc12
chore(deps): Bump croaring from 0.8.1 to 0.9.0 (#8088)
Bumps [croaring](https://github.com/saulius/croaring-rs) from 0.8.1 to 0.9.0.
- [Release notes](https://github.com/saulius/croaring-rs/releases)
- [Commits](https://github.com/saulius/croaring-rs/commits)

---
updated-dependencies:
- dependency-name: croaring
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-27 08:10:39 +00:00
dependabot[bot] db1d977380
chore(deps): Bump hyper from 0.14.26 to 0.14.27 (#8087)
Bumps [hyper](https://github.com/hyperium/hyper) from 0.14.26 to 0.14.27.
- [Release notes](https://github.com/hyperium/hyper/releases)
- [Changelog](https://github.com/hyperium/hyper/blob/v0.14.27/CHANGELOG.md)
- [Commits](https://github.com/hyperium/hyper/compare/v0.14.26...v0.14.27)

---
updated-dependencies:
- dependency-name: hyper
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-27 07:44:15 +00:00
Joe-Blount 40865e011c
fix: compactor loop on L1 files (#8082)
* chore: suppress insta run output on some long tests

* fix: prevent L1 compaction looping

* chore: insta updates from prior commit

* chore: address comments
2023-06-26 21:21:24 +00:00
dependabot[bot] 7e30c91ceb
chore(deps): Bump libc from 0.2.146 to 0.2.147 (#8079)
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.146 to 0.2.147.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.146...0.2.147)

---
updated-dependencies:
- dependency-name: libc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-26 10:18:55 +00:00
dependabot[bot] 54aaafd496
chore(deps): Bump rustyline from 11.0.0 to 12.0.0 (#8078)
Bumps [rustyline](https://github.com/kkawakam/rustyline) from 11.0.0 to 12.0.0.
- [Release notes](https://github.com/kkawakam/rustyline/releases)
- [Changelog](https://github.com/kkawakam/rustyline/blob/master/History.md)
- [Commits](https://github.com/kkawakam/rustyline/compare/v11.0.0...v12.0.0)

---
updated-dependencies:
- dependency-name: rustyline
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-26 10:05:29 +00:00
dependabot[bot] a0f382302a
chore(deps): Bump clap from 4.3.5 to 4.3.8 (#8077)
Bumps [clap](https://github.com/clap-rs/clap) from 4.3.5 to 4.3.8.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v4.3.5...v4.3.8)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-26 09:37:26 +00:00
dependabot[bot] 24ca0fef30
chore(deps): Bump serde_json from 1.0.97 to 1.0.99 (#8076)
Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.97 to 1.0.99.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](https://github.com/serde-rs/json/compare/v1.0.97...v1.0.99)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-26 09:25:38 +00:00
dependabot[bot] 5c7dd8bd88
chore(deps): Bump toml from 0.7.4 to 0.7.5 (#8075)
Bumps [toml](https://github.com/toml-rs/toml) from 0.7.4 to 0.7.5.
- [Commits](https://github.com/toml-rs/toml/compare/toml-v0.7.4...toml-v0.7.5)

---
updated-dependencies:
- dependency-name: toml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-26 09:17:16 +00:00
dependabot[bot] 37be773890
chore(deps): Bump phf_shared from 0.11.1 to 0.11.2 (#8074)
Bumps [phf_shared](https://github.com/rust-phf/rust-phf) from 0.11.1 to 0.11.2.
- [Release notes](https://github.com/rust-phf/rust-phf/releases)
- [Changelog](https://github.com/rust-phf/rust-phf/blob/master/RELEASE_PROCESS.md)
- [Commits](https://github.com/rust-phf/rust-phf/compare/phf_shared-v0.11.1...phf_shared-v0.11.2)

---
updated-dependencies:
- dependency-name: phf_shared
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-26 09:04:56 +00:00
dependabot[bot] 990044dcb2
chore(deps): Bump indexmap from 1.9.3 to 2.0.0 (#8073)
* chore(deps): Bump indexmap from 1.9.3 to 2.0.0

Bumps [indexmap](https://github.com/bluss/indexmap) from 1.9.3 to 2.0.0.
- [Changelog](https://github.com/bluss/indexmap/blob/master/RELEASES.md)
- [Commits](https://github.com/bluss/indexmap/compare/1.9.3...2.0.0)

---
updated-dependencies:
- dependency-name: indexmap
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-26 08:52:51 +00:00
dependabot[bot] cf560d18f1
chore(deps): Bump sqlparser from 0.34.0 to 0.35.0 (#8071)
* chore(deps): Bump sqlparser from 0.34.0 to 0.35.0

Bumps [sqlparser](https://github.com/sqlparser-rs/sqlparser-rs) from 0.34.0 to 0.35.0.
- [Changelog](https://github.com/sqlparser-rs/sqlparser-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sqlparser-rs/sqlparser-rs/compare/v0.34.0...v0.35.0)

---
updated-dependencies:
- dependency-name: sqlparser
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-06-26 08:40:48 +00:00
Carol (Nichols || Goulding) 60d0858381
feat: Add catalog method for looking up partitions by their hash ID (#8018)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 14:42:50 +00:00
kodiakhq[bot] 13771c4616
Merge pull request #8064 from influxdata/crepererum/add_messy_picture_back
docs: add counter-example back
2023-06-23 12:52:21 +00:00
kodiakhq[bot] 24061463dd
Merge branch 'main' into crepererum/add_messy_picture_back 2023-06-23 12:46:46 +00:00
Marco Neumann c14d8551e5
refactor: set up jemalloc metrics in a more central place (#8062)
We already have a method that adds some default metrics / instruments to
the metric registry. Use that for jemalloc as well. This makes it easier
to follow how metrics are set up for our prod binary.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 10:53:27 +00:00
Marco Neumann 76a67a2c65 docs: add counter-example back
Was removed in #8033 but that made @alamb sad.
2023-06-23 12:53:04 +02:00
Joe-Blount 99d0530a21
fix: compactor stuck looping with unproductive compactions (needs vertical split) (#8056)
* chore: adjust with_max_num_files_per_plan to more common setting

This significantly increases write amplification (see change in `written` at the conclusion of the cases)

* fix: compactor looping with unproductive compactions

* chore: formatting cleanup

* chore: fix typo in comment

* chore: add test case that compacts too many files at once

* fix: enforce max file count for compaction

* chore: insta churn from prior commit

---------

Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 09:19:06 +00:00
Marco Neumann 7322f238fb
docs: query processing (#8033)
* docs: query processing

Closes https://github.com/influxdata/idpe/issues/17770 .

* docs: apply recommendations

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* docs: improve description of the flight protocol

* docs: link `LogicalPlan`

* docs: link `ExecutionPlan`

* docs: improve wording

* docs: improve query planning docs

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2023-06-23 09:13:14 +00:00
dependabot[bot] 74a48a8f63
chore(deps): Bump itertools from 0.10.5 to 0.11.0 (#8060)
* chore(deps): Bump itertools from 0.10.5 to 0.11.0

Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.5 to 0.11.0.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.10.5...v0.11.0)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 08:11:56 +00:00
Marco Neumann 178483c1a0
feat: basic non-aggregates w/ InfluxQL selector functions (#8016)
* test: ensure that selectors check arg count

* feat: basic non-aggregates w/ InfluxQL selector functions

See #7533.

* refactor: clean up code

* feat: get more advanced cases to work

* docs: remove stale comments

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 08:05:50 +00:00
dependabot[bot] 6e7b838b52
chore(deps): Bump insta from 1.29.0 to 1.30.0 (#8059)
Bumps [insta](https://github.com/mitsuhiko/insta) from 1.29.0 to 1.30.0.
- [Changelog](https://github.com/mitsuhiko/insta/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mitsuhiko/insta/compare/1.29.0...1.30.0)

---
updated-dependencies:
- dependency-name: insta
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 07:45:41 +00:00
Martin Hilton b57a53eff4
feat: wrap flight DoGet ticket in "Any" (#8053)
Use a protobuf "Any" to wrap the "ReadInfo" message in a DoGet
ticket. This will make it easier to add different ticket types in the
future, as appropriate. It also makes the comment speak the
truth.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 05:59:47 +00:00
wiedld 62251e2323
refactor(idpe-17789): delineate btwn partitions_source within the scheduler versus compactor (#8028)
This is purely a movement of code, and not any definition of the interface methods yet. At best, it further solidifies the boundary of which partitions_source implementations live within the scheduler versus within the compactor.
2023-06-22 11:48:08 -07:00
Marco Neumann 4cc79b8df6
fix: do not exclude cargo configs in docker build (#8054) 2023-06-22 14:31:02 +00:00
Marco Neumann 089f512e88
fix: do not use empty `RUSTFLAGS` for docker build (#8052)
Empty `RUSTFLAGS` overrides everything that we define in our cargo
config.
2023-06-22 14:03:11 +00:00
kodiakhq[bot] 11a35bd767
Merge pull request #7963 from influxdata/cn/partition-table-only
feat: Start generating PartitionHashIds
2023-06-22 13:08:27 +00:00
Carol (Nichols || Goulding) 1912840c25
docs: Update size calculations in the description of PartitionCache 2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding) 0d9f89ae48
test: Add verification of deterministic and collision-resistant properties of PartitionHashId 2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding) 5096164efb
docs: Explain importance of the fixture test and what a failure would mean
Co-authored-by: Dom <dom@itsallbroken.com>
2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding) 62ab8d21c2
fix: Eliminate need to have 2 separate insert statements depending on presence of hash ID
I figured out that the reason inserting `Option<PartitionHashId>` was
giving me a compiler error that `Encode` wasn't implemented was because
I only implemented `Encode` for `&PartitionHashId` and sqlx only
implements `Encode` for `Option<T: Encode>`, not `Option<T> where &T:
Encode`. Using `as_ref` makes this work and gets rid of the `match` that
created two different queries (one of which was wrong!)

Also add tests that we can insert Parquet file records for partitions
that don't have hash IDs to ensure we don't break ingest of new data for
old-style partitions.
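
The trait-resolution issue described above can be reproduced without sqlx. The sketch below uses a hypothetical, heavily simplified `Encode` trait (the real sqlx trait has lifetime and database parameters) and a hypothetical `bind` helper; only the shape of the impls mirrors the description: `Encode` exists for `&PartitionHashId` and for `Option<T>` where `T: Encode`, so an `Option<PartitionHashId>` has to be turned into an `Option<&PartitionHashId>` via `as_ref` first.

```rust
/// Hypothetical, simplified stand-in for sqlx's `Encode` trait.
pub trait Encode {
    fn encode(&self) -> String;
}

/// Stand-in for the real hash ID type; here just raw bytes.
pub struct PartitionHashId(pub Vec<u8>);

// `Encode` is only implemented for the reference type...
impl Encode for &PartitionHashId {
    fn encode(&self) -> String {
        // Hex-encode the bytes; the real encoding is an assumption here.
        self.0.iter().map(|b| format!("{b:02x}")).collect()
    }
}

// ...and for `Option<T>` only when `T` itself is `Encode`.
impl<T: Encode> Encode for Option<T> {
    fn encode(&self) -> String {
        match self {
            Some(value) => value.encode(),
            None => "NULL".to_string(),
        }
    }
}

/// Hypothetical stand-in for binding a query parameter.
pub fn bind<T: Encode>(value: T) -> String {
    value.encode()
}

/// One code path for both "has hash ID" and "has no hash ID".
pub fn bind_hash_id(id: &Option<PartitionHashId>) -> String {
    // Passing `Option<PartitionHashId>` directly would not compile, because
    // `PartitionHashId: Encode` does not hold. `as_ref` yields
    // `Option<&PartitionHashId>`, which satisfies `Option<T: Encode>`.
    bind(id.as_ref())
}
```

With this in place, `bind_hash_id` handles `Some` and `None` through the same call, so no `match` with two separate queries is needed.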
2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding) bffb2f8f9f
fix: Specialize Partition constructors to clarify appropriate usage 2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding) 41420cb920
fix: Borrow transition partition ID when possible 2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding) d991e12fbb
feat: Send PartitionHashId from ingesters to queriers 2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding) 62ba18171a
feat: Add a new hash column on the partition and parquet file tables
This will hold the deterministic ID for partitions.

Until all existing partitions have this value, this is optional/nullable.

The row ID still exists and is used as the main foreign key in the
parquet_file and skipped_compaction tables.

The hash_id has a unique index so that we can look up records based on
it (if it's available).

If the parquet file record has a partition_hash_id value, use that to
generate the object storage path instead of the partition_id.
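
A minimal sketch of the path selection described in the last paragraph. The struct fields and the `<partition>/<file>` layout are hypothetical and only illustrate the fallback; the real object storage path format is not shown in this message.

```rust
/// Hypothetical, trimmed-down parquet file record.
pub struct ParquetFileRecord {
    /// Row ID of the partition; always present.
    pub partition_id: i64,
    /// Deterministic hash ID; `None` for old-style partitions.
    pub partition_hash_id: Option<String>,
    pub file_name: String,
}

/// Prefer the hash ID for the partition segment, fall back to the row ID.
pub fn object_store_path(file: &ParquetFileRecord) -> String {
    let partition_segment = match &file.partition_hash_id {
        Some(hash_id) => hash_id.clone(),
        None => file.partition_id.to_string(),
    };
    // Assumed layout, for illustration only.
    format!("{}/{}", partition_segment, file.file_name)
}
```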
2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding) 5411d8b7c8
refactor: Move Partition type and friends to their own file 2023-06-22 08:59:10 -04:00
kodiakhq[bot] 8604a1a0cf
Merge pull request #8035 from influxdata/dom/compactor-query-rate-limit
feat(compactor): partition fetch query rate limit
2023-06-22 12:13:50 +00:00
Dom cb2968f0ef
Merge branch 'main' into dom/compactor-query-rate-limit 2023-06-22 13:08:24 +01:00
Dom Dwyer cb79429b5f
refactor: wait until next attempt deadline
Minor optimisation for cases where load exceeds the limit, but not by
much - sleep until the next query is allowed, rather than a full query
period.
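
The optimisation can be sketched as follows. `QueryRateLimiter` and `try_acquire` are hypothetical names and this is not the compactor's actual limiter; the sketch only shows the idea of returning the remaining time until the next allowed query, so the caller can sleep for exactly that long rather than for a full period.

```rust
use std::time::{Duration, Instant};

/// Hypothetical limiter that allows one query per `period`.
pub struct QueryRateLimiter {
    period: Duration,
    /// Earliest instant at which the next query may run.
    next_allowed: Instant,
}

impl QueryRateLimiter {
    pub fn new(period: Duration, now: Instant) -> Self {
        Self { period, next_allowed: now }
    }

    /// Returns `Ok(())` if a query may run at `now`, otherwise `Err(wait)`
    /// where `wait` is the time left until the next attempt deadline.
    pub fn try_acquire(&mut self, now: Instant) -> Result<(), Duration> {
        let wait = self.next_allowed.saturating_duration_since(now);
        if wait.is_zero() {
            self.next_allowed = now + self.period;
            Ok(())
        } else {
            // The caller sleeps only for `wait`, not for a full `period`.
            Err(wait)
        }
    }
}
```

Taking `now` as a parameter keeps the sketch deterministic; a caller would typically pass `Instant::now()` and sleep for the returned duration before retrying.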
2023-06-22 14:07:47 +02:00
Marco Neumann 0dde4f0703
fix: `tokio_unstable` for Linux x64 (#8048)
Apparently rustflag configs don't stack, so we need to re-specify the
whole list.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-22 11:09:11 +00:00
Andrew Lamb fb0674fc01
Revert "chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0` (#8036)" (#8049)
This reverts commit 70ffedadc7.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-22 11:03:25 +00:00
kodiakhq[bot] 1ba978c623
Merge pull request #8031 from influxdata/savage/replace-dml-operation-for-ingester-rpc-write
feat(ingester): Replace `DmlOperation` with `IngestOp` in RPC write path
2023-06-22 10:56:43 +00:00
kodiakhq[bot] b226d4dd23
Merge branch 'main' into savage/replace-dml-operation-for-ingester-rpc-write 2023-06-22 10:51:18 +00:00
Marco Neumann 4d1c6f805c
refactor: clean up rustflags and build args (#8047)
- move `tokio_unstable` to cargo config, so we can all use it within
   our code (e.g. for #7982)
- disable incremental builds for prod docker builds. this was tried
  before but got lost at some point because build params weren't passed
  to docker correctly
- fix `CARGO_NET_GIT_FETCH_WITH_CLI` for docker builds (env wasn't
  passed through)
2023-06-22 09:58:46 +00:00
Fraser Savage 43d6cb6eb1
Merge branch 'main' into savage/replace-dml-operation-for-ingester-rpc-write 2023-06-22 10:14:38 +01:00
Fraser Savage fab088f680
refactor(ingester): Split up the `WriteOperation` sub-types into separate modules 2023-06-22 10:08:26 +01:00
Marco Neumann 4e18a5f9e8
refactor: remove querier state reconciler (#8046)
The reconciler is a leftover from the Kafka-based write path. It doesn't
do anything anymore.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-22 09:03:46 +00:00