731 Commits (803122e3b46c3d7206bd7853516289f0cda102ae)

Author SHA1 Message Date
Dom Dwyer 7bd6e90830
perf: only send metadata for relevant partitions
When partition pruning is possible, it skips sending the data for
partitions that have no effect on the query outcome.

This commit does the same for the partition metadata - these frames can
form a significant portion of the query response when the row count is
low, and for pruned partitions have no bearing on the query result.
2023-07-12 18:38:43 +02:00
kodiakhq[bot] e73116a122
Merge branch 'main' into cn/query-catalog-with-either-partition-identifier 2023-07-12 14:51:02 +00:00
Dom Dwyer af56985d70
refactor(ingester): emit span for query handler
Emit a span that covers the entire flight query handler.
2023-07-12 14:42:43 +02:00
Carol (Nichols || Goulding) 22c17fb970
feat: Abstract over which partition ID type we're using to list Parquet files 2023-07-10 13:40:01 -04:00
Carol (Nichols || Goulding) c1e42651ec
feat: Abstract over which partition ID type we're using to compare and swap sort keys 2023-07-10 13:39:19 -04:00
Carol (Nichols || Goulding) eec31b7f00
feat: Abstract over which partition ID type we're using to get a partition from the catalog 2023-07-10 10:43:20 -04:00
kodiakhq[bot] 5fa861abab
Merge branch 'main' into savage/individually-sequence-partitions-within-writes 2023-07-10 12:48:37 +00:00
Dom 341dcf2124
Merge branch 'main' into dom/partition-query-concurrency 2023-07-10 10:24:09 +01:00
dependabot[bot] 12317fee23
chore(deps): Bump async-channel from 1.8.0 to 1.9.0
Bumps [async-channel](https://github.com/smol-rs/async-channel) from 1.8.0 to 1.9.0.
- [Release notes](https://github.com/smol-rs/async-channel/releases)
- [Changelog](https://github.com/smol-rs/async-channel/blob/master/CHANGELOG.md)
- [Commits](https://github.com/smol-rs/async-channel/compare/v1.8.0...v1.9.0)

---
updated-dependencies:
- dependency-name: async-channel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-07-10 01:42:26 +00:00
Dom Dwyer ea38e93511
test(bench): concurrent partition queries
Benchmark the performance of concurrent queries against a single
partition, varying the number of concurrent queries and size of buffered
data in the partition.
2023-07-07 16:27:44 +02:00
kodiakhq[bot] e06b6987f0
Merge branch 'main' into savage/remove-op-level-sequence-number-for-writes 2023-07-07 10:12:04 +00:00
dependabot[bot] 057ee40cb9
chore(deps): Bump thiserror from 1.0.41 to 1.0.43 (#8181)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.41 to 1.0.43.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.41...1.0.43)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-07 09:25:12 +00:00
Dom a005f344d8
Merge branch 'main' into 7899/wal-disk-metrics 2023-07-06 14:44:11 +01:00
Dom Dwyer d979739576
perf: read disks once, resolve mount point once
Instead of refreshing every metric in the System every 10 seconds,
refresh only the disk statistics for the disk we're interested in.

Additionally resolve the parent disk for the directory path once,
instead of each loop.
2023-07-06 15:33:35 +02:00
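The "resolve once, then reuse" idea in d979739576 can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from the commit: `resolve_mount_point`, `sample_free_bytes`, and the `read_free_bytes` probe are hypothetical names, and the longest-prefix rule for choosing the parent disk is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Pick the mount point that is the longest prefix of `dir`, if any.
/// (Assumed resolution rule; hypothetical helper.)
pub fn resolve_mount_point(dir: &Path, mount_points: &[PathBuf]) -> Option<PathBuf> {
    mount_points
        .iter()
        .filter(|m| dir.starts_with(m))
        .max_by_key(|m| m.components().count())
        .cloned()
}

/// Resolve the mount point once up front, then probe only that disk on
/// every tick. `read_free_bytes` is a hypothetical single-disk probe.
pub fn sample_free_bytes<F>(
    dir: &Path,
    mount_points: &[PathBuf],
    ticks: usize,
    mut read_free_bytes: F,
) -> Vec<u64>
where
    F: FnMut(&Path) -> u64,
{
    let mount = match resolve_mount_point(dir, mount_points) {
        Some(mount) => mount,
        None => return Vec::new(),
    };
    // The loop body never re-resolves the mount point.
    (0..ticks).map(|_| read_free_bytes(&mount)).collect()
}
```

The resolution cost is paid once regardless of how many samples are taken, and each sample touches a single disk instead of refreshing every system metric.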
dependabot[bot] 26a6113a37
chore(deps): Bump async-trait from 0.1.70 to 0.1.71 (#8163)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.70 to 0.1.71.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.70...0.1.71)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-06 09:58:51 +00:00
wiedld 36e7f53f9b
Merge branch 'main' into 7899/wal-disk-metrics 2023-07-05 13:52:43 -07:00
wiedld a02f7e7f3f
chore: rename disk protection to DiskSpaceMetric 2023-07-05 13:47:07 -07:00
wiedld b961bc79c4
refactor: move the background task handler onto the parent IngesterGuard
* follow the pattern of the periodic wal rotation
* do NOT follow the pattern of the wal.flusher_task
2023-07-05 13:13:13 -07:00
wiedld b4b89699cd
refactor: make struct signature be (path, registry)
* the metric attributes are hardcoded to the path
* the duration (frequency) of the background task is hardcoded
* the tick.await now occurs after the first metric recording, such that the test doesn't have to wait 15 seconds.
2023-07-05 12:51:23 -07:00
Dom af12edec38
Merge branch 'main' into dom/optimised-partition-pushdown 2023-07-05 15:01:13 +01:00
Fraser Savage 2da99f8032
refactor: Use `const` instead of unnecessary lazy_static
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2023-07-05 14:42:55 +01:00
Fraser Savage e74a7a7dd4
test(wal): Test correct assignment of write per-partition sequence numbers
This adds extra test coverage for the ingester's WAL replay & RPC write
paths, as well as the WAL E2E tests, to ensure that all sequence numbers
present in a WriteOperation/WalOperation are encoded and present when
decoded.
2023-07-05 14:42:47 +01:00
Fraser Savage e6e09d0c15
feat(ingester): Assign individual sequence numbers for writes per partition
This commit asks the oracle for a new sequence number for each table
batch of a write operation (and thus each partition of a write) when
handling an RPC write operation before appending the operation to the
WAL. The ingester now honours the sequence numbers per-partition when
WAL replay is performed.
2023-07-05 14:29:27 +01:00
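The per-partition assignment described in e6e09d0c15 could look something like the sketch below. It is an illustration only: `SequenceOracle`, `next_sequence_number`, `sequence_write`, and the `u32` table IDs are hypothetical stand-ins, not the ingester's actual types.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical oracle handing out monotonically increasing sequence numbers.
pub struct SequenceOracle(AtomicU64);

impl SequenceOracle {
    pub fn new(start: u64) -> Self {
        Self(AtomicU64::new(start))
    }

    /// Return the current number and advance the counter by one.
    pub fn next_sequence_number(&self) -> u64 {
        self.0.fetch_add(1, Ordering::Relaxed)
    }
}

/// Ask the oracle for a fresh sequence number for each table batch of a
/// write, rather than one number shared by the whole operation.
/// Keys are hypothetical table IDs; values are the assigned numbers.
pub fn sequence_write(oracle: &SequenceOracle, table_ids: &[u32]) -> BTreeMap<u32, u64> {
    table_ids
        .iter()
        .map(|id| (*id, oracle.next_sequence_number()))
        .collect()
}
```

In this sketch a write touching two tables consumes two sequence numbers, so each partition's data carries its own number.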
Fraser Savage 30939cfe96
refactor(wal): Remove op-level `sequence_number`, use per table map
This commit removes the op-level sequence number from the proto
definition, now reading and writing solely to the per table (and thus
per partition) sequence number map. Tables/partitions within the same
write op are still assigned the same number for now, so there should be
no semantic difference.
2023-07-05 14:20:43 +01:00
kodiakhq[bot] 70a6e60415
Merge branch 'main' into savage/use-u64-for-sequence-number 2023-07-05 12:55:44 +00:00
Dom Dwyer 7d0e3637ed
perf(ingester): projection pushdown to data source
Prior to this change projection pushdown was implemented as a filter,
which meant a query using it would take the following steps:

    * Query arrives
    * Find necessary partition data
    * Copy all the partition data into a RecordBatch
    * Filter that RecordBatch to apply the projection
    * Return results to caller

This is far from ideal, as the underlying partition data is copied in
its entirety and then the unneeded columns discarded - a pure waste!

After this PR, the projection is pushed down to the point of RecordBatch
generation:

    * Query arrives
    * Find necessary partition data
    * Copy only the projected columns to a RecordBatch
    * Return results to the caller

This minimises the amount of data copying, which for large amounts of
data should lead to a meaningful performance improvement when querying
for a subset of columns. It also uses a slightly more efficient
projection implementation by using a single pass over the columns (still
O(n) but less constant overhead).
2023-07-05 13:44:11 +02:00
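The before/after steps listed in 7d0e3637ed can be illustrated with a simplified model. This is a sketch under stated assumptions, not the ingester's implementation: `Column` is a hypothetical stand-in for buffered partition data, and the two function names are invented for the comparison.

```rust
/// Hypothetical stand-in for buffered partition data: a named column of values.
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub name: String,
    pub values: Vec<i64>,
}

/// Before: copy every column, then filter the copy down to the projection.
pub fn copy_then_project(partition: &[Column], projection: &[&str]) -> Vec<Column> {
    // Copies all column data, including columns that are about to be dropped.
    let all: Vec<Column> = partition.to_vec();
    all.into_iter()
        .filter(|c| projection.contains(&c.name.as_str()))
        .collect()
}

/// After: one pass over the columns that copies only the projected ones.
pub fn project_at_source(partition: &[Column], projection: &[&str]) -> Vec<Column> {
    partition
        .iter()
        .filter(|c| projection.contains(&c.name.as_str()))
        .cloned()
        .collect()
}
```

Both functions return the same result; the difference is that the second never clones the values of columns outside the projection.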
Dom Dwyer 226ad2b100
test(ingester): query projection
Add an integration test driving query projection through the ingester.
2023-07-05 13:44:11 +02:00
Dom Dwyer 54a08853fe
test(ingester): split write / query tests
Split the write & query integration tests into their own modules for
clarity.
2023-07-05 13:44:10 +02:00
Dom Dwyer 09974c66db
perf: short-circuit QueryAdaptor row count check
Don't inspect every RecordBatch when checking for at least one row -
stop as soon as 1 row is observed.
2023-07-05 13:44:09 +02:00
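The short-circuit in 09974c66db amounts to replacing a full scan with an early-exit check. A minimal sketch, assuming a hypothetical `Batch` type in place of a real RecordBatch, with a counter added purely to make the number of inspections observable:

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a RecordBatch that counts how often it is inspected.
pub struct Batch<'a> {
    rows: usize,
    inspected: &'a Cell<usize>,
}

impl<'a> Batch<'a> {
    pub fn new(rows: usize, inspected: &'a Cell<usize>) -> Self {
        Self { rows, inspected }
    }

    /// Report the row count and record that this batch was inspected.
    pub fn num_rows(&self) -> usize {
        self.inspected.set(self.inspected.get() + 1);
        self.rows
    }
}

/// Before: sum the rows of every batch, then compare against zero.
pub fn has_rows_full_scan(batches: &[Batch<'_>]) -> bool {
    batches.iter().map(|b| b.num_rows()).sum::<usize>() > 0
}

/// After: `any` stops at the first batch that has at least one row.
pub fn has_rows_short_circuit(batches: &[Batch<'_>]) -> bool {
    batches.iter().any(|b| b.num_rows() > 0)
}
```

With four batches where the second is the first non-empty one, the full scan inspects all four while the short-circuit version inspects two.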
Dom Dwyer a17bd3bded
refactor: don't Arc-wrap RecordBatch instances
RecordBatch instances are internally ref-counted, so don't Arc-wrap them again.
2023-07-05 13:44:09 +02:00
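The reasoning behind a17bd3bded can be shown with a toy type. This sketch assumes a hypothetical `Batch` whose columns already sit behind `Arc`s, standing in for a real RecordBatch; `share` and `shares_buffers` are invented helper names.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a RecordBatch: columns are already behind `Arc`s.
#[derive(Clone)]
pub struct Batch {
    pub columns: Vec<Arc<Vec<i64>>>,
}

/// Cloning the batch bumps reference counts; it does not copy column data.
pub fn share(batch: &Batch) -> Batch {
    batch.clone()
}

/// Returns true if both batches point at the same underlying column buffers.
pub fn shares_buffers(a: &Batch, b: &Batch) -> bool {
    a.columns.len() == b.columns.len()
        && a.columns.iter().zip(&b.columns).all(|(x, y)| Arc::ptr_eq(x, y))
}
```

Since a plain clone is already cheap and shares the buffers, wrapping the batch in another `Arc` would only add an extra layer of indirection.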
Dom Dwyer 8f0ae77184
test(bench): ingester query & projection
Benchmark query performance against a variety of row/column counts, with
and without projection.
2023-07-05 13:44:08 +02:00
dependabot[bot] 3827257f94
chore(deps): Bump thiserror from 1.0.40 to 1.0.41 (#8149)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.40 to 1.0.41.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.40...1.0.41)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-05 09:25:14 +00:00
dependabot[bot] b5c9628f0f
chore(deps): Bump async-trait from 0.1.69 to 0.1.70 (#8148)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.69 to 0.1.70.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.69...0.1.70)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-05 09:05:13 +00:00
dependabot[bot] 9a03d9c9fe
chore(deps): Bump paste from 1.0.12 to 1.0.13 (#8139)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.12 to 1.0.13.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.12...1.0.13)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-04 07:57:41 +00:00
Dom Dwyer 0297fe3651
refactor: less nesting in partition pruning logic
Improve readability by pulling the partition pruning logic into its own
function and clean up some minor bits.
2023-07-03 17:25:03 +02:00
Dom Dwyer edf6686130
fix(test): custom partitioning template pruning
Configure the partition pruning test to use a partition template that
partitions on the "region" field. This will allow it to be used for
pruning at query time.
2023-07-03 17:25:03 +02:00
Marco Neumann 36ed914689
test: type coercion in ingester tests 2023-07-03 17:25:02 +02:00
Marco Neumann 171b2a14c7
fix: doc link 2023-07-03 17:25:01 +02:00
Marco Neumann e9b456df1f
fix: do not panic for pruning errors 2023-07-03 17:25:00 +02:00
Marco Neumann 0bcf85d48c
refactor: de-dup code 2023-07-03 17:24:59 +02:00
Carol (Nichols || Goulding) 8ebf390d9c
feat: Try to prune ingester partitions by partition key
This is hacktastic.
2023-07-03 17:24:58 +02:00
Fraser Savage da34eb7b35
feat: Load both table name and partition template in the ingester 2023-07-03 17:24:57 +02:00
Fraser Savage 5f759528d3
test(ingester): Add `BufferTree` test for predicate-filtered queries 2023-07-03 17:24:56 +02:00
Fraser Savage 246c2b0749
refactor(ingester): Accept a predicate as parameter to `query_exec`
This will allow the ingester to apply a predicate when serving a query
and only stream back data that satisfies the predicate.
2023-07-03 17:24:56 +02:00
dependabot[bot] 9f00c9c4ef
chore(deps): Bump pin-project from 1.1.1 to 1.1.2 (#8129)
Bumps [pin-project](https://github.com/taiki-e/pin-project) from 1.1.1 to 1.1.2.
- [Release notes](https://github.com/taiki-e/pin-project/releases)
- [Changelog](https://github.com/taiki-e/pin-project/blob/main/CHANGELOG.md)
- [Commits](https://github.com/taiki-e/pin-project/compare/v1.1.1...v1.1.2)

---
updated-dependencies:
- dependency-name: pin-project
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-03 09:10:28 +00:00
Marco Neumann ce6a2fb613
refactor: remove `QueryChunk::column_values` (#8111)
Similar to #8109.

This was once implemented by the RUB but as it stands right now, no
chunk implements this anymore.

If we ever want to bring this back, we should use the output of
`QueryChunk::data` instead (i.e. use a data-based implementation instead
of a per-chunk one).

Closes #8096.
2023-07-03 09:03:21 +00:00
wiedld d64a908823
Merge branch 'main' into 7899/wal-disk-metrics 2023-06-30 18:59:49 -07:00
dependabot[bot] ede8e32804
chore(deps): Bump pin-project from 1.1.0 to 1.1.1 (#8118)
Bumps [pin-project](https://github.com/taiki-e/pin-project) from 1.1.0 to 1.1.1.
- [Release notes](https://github.com/taiki-e/pin-project/releases)
- [Changelog](https://github.com/taiki-e/pin-project/blob/main/CHANGELOG.md)
- [Commits](https://github.com/taiki-e/pin-project/compare/v1.1.0...v1.1.1)

---
updated-dependencies:
- dependency-name: pin-project
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-30 08:51:56 +00:00
kodiakhq[bot] 16f18fdd53
Merge branch 'main' into cn/use-the-test-constants-luke 2023-06-29 15:04:53 +00:00
Marco Neumann b982ee180e
refactor: remove `QueryChunk::column_names` (#8109)
This interface was once specially implemented by the RUB. The only
actual implementation of it is within the querier that just forwards it
to a simple schema scan. Lift this semantic to `iox_query_influxrpc`
instead so all the chunks can use it.

If we ever want to optimize this again, we should use `QueryChunk::data`
instead (i.e. instead of implementing it within the chunk it should use
the data method and do something smart based on that).

First half of #8096.
2023-06-29 13:43:10 +00:00