Commit Graph

49612 Commits (load-generator/refactor-and-support-sequential-field-data)

Author SHA1 Message Date
Marco Neumann 7b4dbb570d
refactor: clean up query log impl (#8775)
- take span ctx directly instead of the execution context (see point
  below)
- use the original trace ID (i.e. the one that we get via HTTP header),
  NOT some internal span/trace, because the latter is only available for
  sampled requests, while the former is available far more often (we
  already do that for the stdout logs)
- minor code clean ups

This is prep work for #8774.
2023-09-20 09:20:19 +00:00
Marco Neumann fd50d7cfcf
Merge pull request #8771 from influxdata/crepererum/issue8705
feat: prune partitions before creating parquet chunks
2023-09-20 10:49:48 +02:00
Marco Neumann 4219d7d318
feat: prune partitions before creating parquet chunks
This should lower query latency, because creating many chunks just to
throw them away afterwards isn't exactly cheap.

Closes #8705.
2023-09-20 10:43:39 +02:00
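The following is a minimal sketch of pruning before chunk creation. It assumes each partition exposes an inclusive [min, max] time range and that the query carries a time predicate. `Partition`, `Chunk`, `prune_partitions` and `create_chunks` are hypothetical names, not IOx's querier types.

```rust
/// Hypothetical partition summary: inclusive time range in nanoseconds.
#[derive(Debug, Clone, PartialEq)]
pub struct Partition {
    pub id: u64,
    pub min_time: i64,
    pub max_time: i64,
}

/// Hypothetical (expensive to build) chunk produced from a partition.
#[derive(Debug, PartialEq)]
pub struct Chunk {
    pub partition_id: u64,
}

/// Keep only partitions whose [min_time, max_time] overlaps the query range [start, end].
pub fn prune_partitions(partitions: &[Partition], start: i64, end: i64) -> Vec<&Partition> {
    partitions
        .iter()
        .filter(|p| p.min_time <= end && p.max_time >= start)
        .collect()
}

/// Chunks are only built for partitions that survived pruning.
pub fn create_chunks(partitions: &[Partition], start: i64, end: i64) -> Vec<Chunk> {
    prune_partitions(partitions, start, end)
        .into_iter()
        .map(|p| Chunk { partition_id: p.id })
        .collect()
}
```

Pruning only reads the cheap partition summaries, so no chunk is built for a partition that would be thrown away afterwards.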
Marco Neumann 60e795e15e
Merge pull request #8768 from influxdata/crepererum/issue8350b
refactor: allow streaming record batches into query
2023-09-20 10:36:23 +02:00
kodiakhq[bot] 809e0f4a42
Merge branch 'main' into crepererum/issue8350b 2023-09-20 08:21:04 +00:00
Andrew Lamb 65d0ea2055
chore: Update DataFusion (#8765)
* chore: Update DataFusion pin again

* chore: update for different type

* fix: statistics

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-19 22:26:53 +00:00
Joe-Blount c05739ff20
chore(compactor): move CompactRange up to RoundInfo (#8736)
* chore(compactor): move CompactRange up to RoundInfo

* chore: insta updates from compactor CompactRange refactor

* chore: lint cleanup

* chore: addressing some of the comments

* chore: remove duplicated done check

* chore: variable renaming
2023-09-19 16:53:36 +00:00
Nga Tran 0a7ae5603f
feat: make sort_key_ids non-optional (#8750)
* feat: make sort_key_ids non-optional

* refactor: address review comments

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-19 14:22:29 +00:00
Dom a74aab709c
Merge pull request #8770 from influxdata/dom/merkle-bump-version
build: pick up latest merkle-search-tree version
2023-09-19 15:15:41 +01:00
Dom 8742de9819
Merge branch 'main' into dom/merkle-bump-version 2023-09-19 14:44:28 +01:00
Martin Hilton 39e35eb0a7
feat(querier): convert timezone sent from ingester (#8769)
* feat(querier): convert timezone sent from ingester

To facilitate changing the default timezone from None to UTC, make the
querier able to convert the timezone sent from the ingester into its
preferred type. This can convert from None to UTC or from UTC to None,
and should allow ingesters and queriers with differing default
timezone settings to interact.

To allow testing of both conversions, the type checking has been
made more liberal when converting an arrow schema to an IOx one.

* fix: fmt

* fix: lint

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-19 13:31:48 +00:00
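Below is a minimal sketch of the None/UTC conversion. It uses hypothetical stand-in types rather than the real Arrow `DataType::Timestamp(unit, tz)` and IOx schema types. It assumes the timestamp values are epoch nanoseconds under either annotation, so only the timezone metadata is rewritten. Any other timezone is rejected.

```rust
/// Hypothetical stand-in for an Arrow timestamp type's timezone annotation:
/// `None` means "no timezone", `Some("UTC")` means UTC.
#[derive(Debug, Clone, PartialEq)]
pub struct TimestampType {
    pub tz: Option<String>,
}

/// Hypothetical column: the type annotation plus epoch-nanosecond values.
#[derive(Debug, Clone, PartialEq)]
pub struct TimeColumn {
    pub data_type: TimestampType,
    pub values: Vec<i64>,
}

#[derive(Debug, PartialEq)]
pub enum ConvertError {
    UnsupportedTimezone(String),
}

/// Only "no timezone" and UTC are treated as interchangeable in this sketch.
fn is_supported(tz: Option<&str>) -> bool {
    matches!(tz, None | Some("UTC"))
}

/// Convert a column received from the ingester into the querier's preferred
/// timezone annotation. Values are left untouched; only the metadata changes.
pub fn convert_timezone(
    col: TimeColumn,
    preferred: Option<&str>,
) -> Result<TimeColumn, ConvertError> {
    for tz in [col.data_type.tz.as_deref(), preferred] {
        if !is_supported(tz) {
            return Err(ConvertError::UnsupportedTimezone(tz.unwrap_or("").to_owned()));
        }
    }
    Ok(TimeColumn {
        data_type: TimestampType { tz: preferred.map(str::to_owned) },
        values: col.values,
    })
}
```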
Dom Dwyer 33a441fbec
build: pick up latest merkle-search-tree version
Pick up the improvements allowing construction of PageRangeSnapshots
from owned keys, without cloning.
2023-09-19 14:09:07 +02:00
Marco Neumann 74b1a5e368
refactor: allow streaming record batches into query
For #8350, we won't have all the record batches from the ingester during
planning; instead we'll stream them during execution. Technically, the
DataFusion (DF) plan is already based on streams; it's just `QueryChunkData`
that required a materialized `Vec<RecordBatch>`. This change moves the
stream creation up so a chunk can either use `QueryChunkData::in_mem`
(which conveniently creates the stream) or provide its own stream.
2023-09-19 13:53:37 +02:00
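The idea can be sketched with a synchronous iterator standing in for DataFusion's async record batch stream and a toy `RecordBatch`. Only the `QueryChunkData::in_mem` name comes from the commit. The struct layout, `from_stream` and `into_stream` are hypothetical, and the real type may carry other variants.

```rust
/// Hypothetical stand-in for an Arrow `RecordBatch`.
#[derive(Debug, Clone, PartialEq)]
pub struct RecordBatch {
    pub rows: Vec<i64>,
}

/// Stand-in for a record batch stream (a plain iterator here, not async).
pub type BatchStream = Box<dyn Iterator<Item = RecordBatch>>;

/// Chunk data is always handed to execution as a stream.
pub struct QueryChunkData {
    stream: BatchStream,
}

impl QueryChunkData {
    /// Convenience constructor: wraps already-materialized batches into a stream.
    pub fn in_mem(batches: Vec<RecordBatch>) -> Self {
        Self { stream: Box::new(batches.into_iter()) }
    }

    /// A chunk can instead provide its own stream, e.g. batches that only
    /// arrive during execution.
    pub fn from_stream(stream: BatchStream) -> Self {
        Self { stream }
    }

    /// Execution consumes a stream regardless of how the data was provided.
    pub fn into_stream(self) -> BatchStream {
        self.stream
    }
}
```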
Fraser Savage f6d6dd9b5b
test(ingester): Add timeout panic to blocked `IngestState` WAL replay test 2023-09-19 11:58:52 +01:00
Marco Neumann ca791386eb
refactor: clean up chunk pruning metrics/observers (#8766)
There were three layers (metrics, observer, pruner) that each had only a
single implementation. IIRC this is a leftover from older code where
`iox_query` was more involved in query pruning. With #8705, however, the
chunk pruning is pushed even closer to the source (i.e. the querier
code), and it is more practical to perform the metric management
directly in the querier code (this was already the case, it was just
somewhat hidden by the interfaces). This also makes it easier to add
metrics for #8705.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-19 10:53:14 +00:00
Fraser Savage 9e80e03069
feat(ingester): Respect non-disk full `IngestStateError` during WAL replay
This change causes WAL replay to mimic the RPC write handler, mostly
respecting the `IngestState` before applying an op while replaying a WAL
file. The caveat is that `DiskFull` is ignored, as WAL replay specifically
helps with this state.
2023-09-19 11:34:24 +01:00
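Here is a minimal sketch of the replay rule described above. The state is checked before each op is applied, as the RPC write path would do, but `DiskFull` is let through. `IngestState`, `IngestStateError` and `replay_wal` here are hypothetical simplifications, and the error variants are illustrative, not the ingester's actual definitions.

```rust
/// Hypothetical subset of ingest-state errors; variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IngestStateError {
    DiskFull,
    PersistSaturated,
}

/// Hypothetical ingest state: `error` is `None` while the ingester is healthy.
pub struct IngestState {
    pub error: Option<IngestStateError>,
}

impl IngestState {
    pub fn read(&self) -> Result<(), IngestStateError> {
        match self.error {
            Some(e) => Err(e),
            None => Ok(()),
        }
    }
}

/// Replay WAL ops, checking the ingest state before applying each one,
/// except that `DiskFull` is ignored. Returns the number of ops applied.
pub fn replay_wal<T>(
    ops: Vec<T>,
    state: &IngestState,
    mut apply: impl FnMut(T),
) -> Result<usize, IngestStateError> {
    let mut applied = 0;
    for op in ops {
        match state.read() {
            Ok(()) | Err(IngestStateError::DiskFull) => {}
            Err(e) => return Err(e),
        }
        apply(op);
        applied += 1;
    }
    Ok(applied)
}
```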
dependabot[bot] b135cb8d23
chore(deps): Bump pbjson from 0.5.1 to 0.6.0 (#8755)
Bumps [pbjson](https://github.com/influxdata/pbjson) from 0.5.1 to 0.6.0.
- [Commits](https://github.com/influxdata/pbjson/commits)

---
updated-dependencies:
- dependency-name: pbjson
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-19 10:24:39 +00:00
Fraser Savage 8548ea3b31
refactor(ingester): Pass `IngestState` to WAL replay
This requires the `IngestState` and associated types to be public so
that WAL replay can be called by the benchmarker. The module containing
the `IngestState` is private and is only conditionally re-exported under
the benchmark feature as part of the `internal_implementation_details`
module.
2023-09-19 11:21:42 +01:00
dependabot[bot] 9123c6126d
chore(deps): Bump predicates from 3.0.3 to 3.0.4 (#8761)
Bumps [predicates](https://github.com/assert-rs/predicates-rs) from 3.0.3 to 3.0.4.
- [Changelog](https://github.com/assert-rs/predicates-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/assert-rs/predicates-rs/compare/v3.0.3...v3.0.4)

---
updated-dependencies:
- dependency-name: predicates
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-19 10:18:03 +00:00
kodiakhq[bot] 4a653a3ff7
Merge pull request #8666 from influxdata/savage/automatic-recovery-from-incomplete-wal-write
feat(ingester): Automatically recover from an incomplete WAL write
2023-09-19 09:41:57 +00:00
kodiakhq[bot] 87a25cf3cb
Merge branch 'main' into savage/automatic-recovery-from-incomplete-wal-write 2023-09-19 09:35:33 +00:00
Dom 5fbe7b80b9
Merge pull request #8762 from influxdata/dependabot/cargo/clap-4.4.4
chore(deps): Bump clap from 4.4.3 to 4.4.4
2023-09-19 10:35:18 +01:00
kodiakhq[bot] d034a0ed5f
Merge branch 'main' into savage/automatic-recovery-from-incomplete-wal-write 2023-09-19 09:34:45 +00:00
Dom 500112bd47
Merge branch 'main' into dependabot/cargo/clap-4.4.4 2023-09-19 10:28:35 +01:00
Marco Neumann 949635b324
feat: use time-based column ranges in querier (#8732)
Use the output of #8725 within the column ranges of the querier. Currently
this won't have any effect, since the column ranges are only used to
prune parquet files, and parquet files come with their own, more precise
time range (which takes priority). However, for #8705 we want to use it
to prune partitions before needing to deal with the parquet files.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-19 09:13:50 +00:00
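This sketch shows the precedence rule described above. The file's own time range is used when present, and the coarser column range is only a fallback. `TimeRange` and `file_may_match` are hypothetical helpers, not querier code.

```rust
/// Inclusive time range in nanoseconds (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimeRange {
    pub min: i64,
    pub max: i64,
}

impl TimeRange {
    pub fn overlaps(&self, other: &TimeRange) -> bool {
        self.min <= other.max && self.max >= other.min
    }
}

/// Decide whether a parquet file may contain rows for `query`.
/// The file's own (more precise) range takes priority over the column range.
pub fn file_may_match(file_range: Option<TimeRange>, column_range: TimeRange, query: TimeRange) -> bool {
    file_range.unwrap_or(column_range).overlaps(&query)
}
```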
Dom 18c45d39bb
Merge branch 'main' into dependabot/cargo/clap-4.4.4 2023-09-19 10:12:09 +01:00
Dom 41856be312
Merge pull request #8753 from influxdata/dom/merkle-protocol
feat(gossip): schema cache anti-entropy service
2023-09-19 10:07:36 +01:00
Dom 43e7094a06
Merge branch 'main' into dom/merkle-protocol 2023-09-19 09:49:26 +01:00
Dom Dwyer ef1a7b0ce8
docs: lexicographical ordering of min/max
The min/max values are the minimum/maximum values when ordered
lexicographically.
2023-09-19 10:45:32 +02:00
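The following is a quick illustration of what lexicographic ordering means for such min/max values. It relies on Rust's `str` ordering, which compares UTF-8 bytes and therefore matches code point order. The helper is hypothetical and is not the code the commit documents.

```rust
/// Lexicographic min and max of string values; `None` for an empty slice.
pub fn lexicographic_min_max<'a>(values: &[&'a str]) -> Option<(&'a str, &'a str)> {
    let min = values.iter().copied().min()?;
    let max = values.iter().copied().max()?;
    Some((min, max))
}
```

Note that numeric-looking strings do not sort numerically: for `["9", "10", "100"]` the minimum is `"10"` and the maximum is `"9"`.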
Joe-Blount 80f8b55baa
fix(compactor): retry OOM error at reduced concurrency (#8763)
* fix(compactor): retry OOM error at reduced concurrency

* chore: address comment
2023-09-18 20:01:08 +00:00
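A minimal sketch of retrying at reduced concurrency follows. The commit does not say how much the concurrency is reduced, so halving is an assumption. `JobError` and `run_with_oom_retry` are hypothetical names.

```rust
/// Hypothetical job error type.
#[derive(Debug, PartialEq)]
pub enum JobError {
    OutOfMemory,
    Other(String),
}

/// Run `job` at `concurrency`; on an OOM error, halve the concurrency and retry.
/// Gives up (returning the error) once a run at concurrency 1 has also hit OOM.
/// On success, returns the concurrency that worked.
pub fn run_with_oom_retry<F>(concurrency: usize, mut job: F) -> Result<usize, JobError>
where
    F: FnMut(usize) -> Result<(), JobError>,
{
    let mut concurrency = concurrency.max(1);
    loop {
        match job(concurrency) {
            Ok(()) => return Ok(concurrency),
            Err(JobError::OutOfMemory) if concurrency > 1 => concurrency /= 2,
            Err(e) => return Err(e),
        }
    }
}
```

Non-OOM errors are returned immediately, since lowering concurrency would not help with them.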
dependabot[bot] 38ea9a6cc8
chore(deps): Bump clap from 4.4.3 to 4.4.4
Bumps [clap](https://github.com/clap-rs/clap) from 4.4.3 to 4.4.4.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v4.4.3...v4.4.4)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-18 18:15:43 +00:00
Andrew Lamb 58d892fcdf
chore: Update DataFusion pin (#8749)
* chore: Update DataFusion pin and `chrono`

* chore: Update for deprecation

* chore: Update plans

* fix: fix update logic in percentile

* chore: update to avoid deprecated from_exprs api

* fix: Update arrow pin, fix plan errors

* test: for describe

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-18 18:11:23 +00:00
Dom 5e3e2f087e
Merge pull request #8734 from influxdata/dom/limit-active-partitions
feat(ingester): limit non-empty partitions per namespace
2023-09-18 15:14:52 +01:00
Fraser Savage 55f089beca
refactor(ingester): Use single match for error handling on WAL file replay results 2023-09-18 15:09:47 +01:00
Dom f9225d34df
Merge branch 'main' into dom/limit-active-partitions 2023-09-18 14:53:39 +01:00
Fraser Savage 35e14c315f
test(ingester): Use `assert_counter` in WAL replay tests 2023-09-18 14:35:48 +01:00
Marco Neumann 0bf1000fe9
feat: error "ignore" layer for i->q V2 client (#8754)
We shall ignore certain error cases during query processing. This layer
provides an easy interface for that.

Note that this is also done in the V1 client, just in a more hidden /
hard-to-test manner.

For #8349.
2023-09-18 13:31:23 +00:00
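This sketch shows the general shape of such a layer. It wraps an inner request and turns ignorable errors into an empty response instead of failing the query. Which errors count as ignorable is an assumption here (`NotFound`), and `ClientError` and `ErrorIgnoreLayer` are hypothetical, not the V2 client's API.

```rust
/// Hypothetical client errors.
#[derive(Debug, Clone, PartialEq)]
pub enum ClientError {
    /// Treated as ignorable in this sketch.
    NotFound,
    /// Treated as a real failure.
    Transport(String),
}

/// Hypothetical "ignore" layer wrapping an inner request function.
pub struct ErrorIgnoreLayer<F> {
    inner: F,
}

impl<F> ErrorIgnoreLayer<F>
where
    F: Fn() -> Result<Vec<u64>, ClientError>,
{
    pub fn new(inner: F) -> Self {
        Self { inner }
    }

    /// Ignorable errors become an empty response; everything else passes through.
    pub fn query(&self) -> Result<Vec<u64>, ClientError> {
        match (self.inner)() {
            Err(ClientError::NotFound) => Ok(Vec::new()),
            other => other,
        }
    }
}
```

Keeping this in a separate layer makes the "which errors are ignored" policy testable on its own.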
Fraser Savage b179076011
refactor(ingester): Use separate attribute for ingester WAL replay failure reason
Separating the failure reason from the result of the WAL file replay
metric keeps the flexibility to include other failure modes in future.
2023-09-18 14:11:26 +01:00
Fraser Savage 2546fbb796
docs(ingester): Document the meaning of the `SequenceNumber` returned for a WAL replay read error 2023-09-18 14:03:56 +01:00
dependabot[bot] 1760fe7736
chore(deps): Bump chrono from 0.4.30 to 0.4.31 (#8752)
* chore(deps): Bump chrono from 0.4.30 to 0.4.31

Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.30 to 0.4.31.
- [Release notes](https://github.com/chronotope/chrono/releases)
- [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono/compare/v0.4.30...v0.4.31)

---
updated-dependencies:
- dependency-name: chrono
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix: chrono ts -> nanos can fail, fix deprecation warning

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-18 12:57:48 +00:00
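chrono 0.4.31 deprecated the panicking `timestamp_nanos()` in favour of `timestamp_nanos_opt()`, which returns `Option<i64>`. That is most likely the deprecation warning this fix refers to. The std-only sketch below shows why the conversion is fallible: an `i64` of nanoseconds only spans roughly the years 1677 to 2262. `timestamp_nanos_checked` is a hypothetical helper, not chrono's implementation.

```rust
/// Convert seconds + subsecond nanos since the Unix epoch into i64 nanoseconds,
/// returning `None` instead of overflowing. (This simple version also rejects
/// values within one second of the i64 lower bound.)
pub fn timestamp_nanos_checked(secs: i64, subsec_nanos: u32) -> Option<i64> {
    secs.checked_mul(1_000_000_000)?
        .checked_add(i64::from(subsec_nanos))
}
```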
Fraser Savage acc2d6f33c
refactor(wal): Tidy up pointless let assignment 2023-09-18 13:57:03 +01:00
Dom af425f657c
Merge pull request #8748 from influxdata/cn/refactors
refactor: Prepare for correcting validation of max tables and max columns per table
2023-09-18 13:40:51 +01:00
Dom c591a4fef6
Merge branch 'main' into cn/refactors 2023-09-18 13:22:31 +01:00
Dom Dwyer 4ec07b5d80
feat(gossip): schema cache anti-entropy service
Defines an RPC service to be used by two peers to converge their schema
cache content by exchanging their serialised Merkle Search Tree pages (a
compact representation of the MST itself).

This will be used in the latter half of the following sync protocol:

                    ┌─────┐                   ┌────┐
                    │Local│                   │Peer│
                    └──┬──┘                   └─┬──┘
                       │ [1] Send content hash  │
                       │────────────────────────>
                       │                        │
                       │                 ┌───────────────┐
                       │                 │Compute hash,  │
                       │                 │stop if equal  │
                       │                 └───────────────┘
                       │                        │
                       │   ╔════════════════╗   │
═══════════════════════╪═══╣ Switch to gRPC ╠═══╪═══════════════════════
                       │   ╚════════════════╝   │
                       │                        │
                       │[2] Serialised MST pages│
                       │<────────────────────────
                       │                        │
                ┌──────────────┐                │
                │Perform diff  │                │
                └──────────────┘                │
                       │ [3] Inconsistent pages │
                       │────────────────────────>
                       │                        │
                       │                        │
          ╔═══════╤════╪════════════════════════╪════════════╗
          ║ LOOP  │  For each inconsistent page │            ║
          ╟───────┘    │                        │            ║
          ║            │      [4] Schemas       │            ║
          ║            │<────────────────────────            ║
          ╚════════════╪════════════════════════╪════════════╝
                    ┌──┴──┐                   ┌─┴──┐
                    │Local│                   │Peer│
                    └─────┘                   └────┘

The initial consistency probe request [1] is sent over gossip and is
used to trigger a further sync of inconsistent MST content if necessary.
This message is a no-op if the MSTs are found to be fully consistent.

If an inconsistency is detected between the two peers, the protocol
switches to RPC over TCP, calling the AntiEntropyService defined
in this commit to complete the sync process.

The receiver of the consistency probe [1] calls GetTreeDiff and provides
its MST pages [2], causing the local node to compute the diff
between the two MSTs and return the set of inconsistent ranges [3] that
require convergence.

Once the set of inconsistent ranges has been identified, the peer pulls
all the schemas within those ranges and merges them into its own cache
to ensure it has all the content of the source node.

Once this protocol has run in both directions between two peers (and in
the absence of further updates between runs), the two peers are
guaranteed to have converged.
2023-09-18 11:38:22 +02:00
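Steps [2] to [4] can be illustrated with a heavily simplified sketch. It assumes both peers page their key space over the same agreed key ranges and hash each page's (key, schema) pairs with std's `DefaultHasher`. A real Merkle Search Tree derives its structure from key hashes and diffs trees structurally, so this only illustrates the data flow. `Page`, `pages`, `tree_diff` and `pull_ranges` are hypothetical names, not the merkle-search-tree crate's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::ops::RangeInclusive;

/// A serialised "page": an inclusive key range plus a hash of its content.
#[derive(Debug, Clone, PartialEq)]
pub struct Page {
    pub range: RangeInclusive<String>,
    pub hash: u64,
}

/// Hash every (key, schema) pair falling inside `range`.
fn hash_range(cache: &BTreeMap<String, String>, range: &RangeInclusive<String>) -> u64 {
    let mut h = DefaultHasher::new();
    for (k, v) in cache.range(range.clone()) {
        k.hash(&mut h);
        v.hash(&mut h);
    }
    h.finish()
}

/// Build one page per agreed-upon key range.
pub fn pages(cache: &BTreeMap<String, String>, ranges: &[RangeInclusive<String>]) -> Vec<Page> {
    ranges
        .iter()
        .map(|r| Page { range: r.clone(), hash: hash_range(cache, r) })
        .collect()
}

/// [2] -> [3]: given the peer's pages, return the ranges whose content differs locally.
pub fn tree_diff(local: &BTreeMap<String, String>, peer_pages: &[Page]) -> Vec<RangeInclusive<String>> {
    peer_pages
        .iter()
        .filter(|p| hash_range(local, &p.range) != p.hash)
        .map(|p| p.range.clone())
        .collect()
}

/// [4]: pull the schemas within each inconsistent range from `source` into `dest`.
pub fn pull_ranges(
    source: &BTreeMap<String, String>,
    dest: &mut BTreeMap<String, String>,
    ranges: &[RangeInclusive<String>],
) {
    for r in ranges {
        for (k, v) in source.range(r.clone()) {
            dest.insert(k.clone(), v.clone());
        }
    }
}
```

In this sketch the peer sends `pages(&peer_cache, ..)`, the local node answers with `tree_diff(..)`, and the peer then calls `pull_ranges` with the local cache as the source.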
Marco Neumann 012df69974
feat: i->q V2 circuit breaker (#8743)
* feat: impl `PartialEq + Eq` for `TestError`

* feat: i->q V2 circuit breaker

This is a straight port from V1, it even uses the same test. The code is
copied though (instead of reusing the old one) because the interface in
the V2 client is so different and the new testing infra is also nicer
(IMHO).

For #8349.

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-18 08:58:56 +00:00
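For readers unfamiliar with the pattern, here is a generic circuit breaker sketch. It is not the V1/V2 code. It assumes the breaker opens after `threshold` consecutive errors and lets a probe request through once `cooldown` has elapsed. Both parameters and the `CircuitBreaker` type are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Minimal circuit breaker: opens after `threshold` consecutive errors and
/// lets a probe request through again once `cooldown` has passed.
pub struct CircuitBreaker {
    threshold: u32,
    cooldown: Duration,
    consecutive_errors: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { threshold, cooldown, consecutive_errors: 0, opened_at: None }
    }

    /// Should a request be sent to this ingester at `now`?
    pub fn allow_request(&self, now: Instant) -> bool {
        match self.opened_at {
            None => true,
            // After the cooldown, allow a probe request through.
            Some(t) => now.duration_since(t) >= self.cooldown,
        }
    }

    /// A success closes the circuit and resets the error count.
    pub fn record_success(&mut self) {
        self.consecutive_errors = 0;
        self.opened_at = None;
    }

    /// Errors at or beyond the threshold (re-)open the circuit, restarting the cooldown.
    pub fn record_error(&mut self, now: Instant) {
        self.consecutive_errors += 1;
        if self.consecutive_errors >= self.threshold {
            self.opened_at = Some(now);
        }
    }
}
```

Passing `now` explicitly keeps the breaker deterministic in tests.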
dependabot[bot] eb80fea517
chore(deps): Bump mockito from 1.1.1 to 1.2.0 (#8751)
Bumps [mockito](https://github.com/lipanski/mockito) from 1.1.1 to 1.2.0.
- [Release notes](https://github.com/lipanski/mockito/releases)
- [Commits](https://github.com/lipanski/mockito/compare/1.1.1...1.2.0)

---
updated-dependencies:
- dependency-name: mockito
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-18 07:55:49 +00:00
Carol (Nichols || Goulding) 1f485747d8
refactor: Move type validation logic into data_types rather than the gRPC service 2023-09-15 17:14:05 -04:00
Carol (Nichols || Goulding) ada3d310d9
refactor: Use the newtypes in the catalog functions for updating service limits 2023-09-15 16:23:31 -04:00
Carol (Nichols || Goulding) 5a74d09194
fix: Use the newtypes in NamespaceServiceProtectionLimitsOverride 2023-09-15 16:23:31 -04:00
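The newtype commits above can be pictured with a sketch like the following. It assumes the limits are `i32` values that must be positive; both the integer type and the validation rule are assumptions, and the macro is illustrative. Putting the check in the constructor would let the gRPC service and the catalog functions rely on values that are already validated.

```rust
/// Returned when a limit value fails validation (hypothetical).
#[derive(Debug, PartialEq)]
pub struct InvalidLimit(pub i32);

/// Generates a hypothetical positive-integer limit newtype.
macro_rules! positive_limit {
    ($name:ident) => {
        #[derive(Debug, Clone, Copy, PartialEq)]
        pub struct $name(i32);

        impl $name {
            /// Validation lives with the type, so every caller gets it.
            pub fn new(value: i32) -> Result<Self, InvalidLimit> {
                if value > 0 {
                    Ok(Self(value))
                } else {
                    Err(InvalidLimit(value))
                }
            }

            pub fn get(self) -> i32 {
                self.0
            }
        }
    };
}

positive_limit!(MaxTables);
positive_limit!(MaxColumnsPerTable);

/// Hypothetical override, ordered max tables first, then max columns.
pub struct NamespaceServiceProtectionLimitsOverride {
    pub max_tables: Option<MaxTables>,
    pub max_columns_per_table: Option<MaxColumnsPerTable>,
}
```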
Carol (Nichols || Goulding) a42a00b6f2
refactor: Consistently order max tables first, then max columns
I can't handle it.
2023-09-15 16:23:31 -04:00