Commit Graph

639 Commits (60cbf5308713dfa136a759a7ce8bf2b4bd1395d6)

Author SHA1 Message Date
Andrew Lamb 17c0d837b3
chore: Update DataFusion, arrow, object_store pins ()
* chore: Update DataFusion, arrow, object_store pins

* chore: Update for hakari

* chore: Update for new APIs

* fix: update test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-07 17:08:31 +00:00
dependabot[bot] 7b6efae62c
chore(deps): Bump tempfile from 3.5.0 to 3.6.0
Bumps [tempfile](https://github.com/Stebalien/tempfile) from 3.5.0 to 3.6.0.
- [Changelog](https://github.com/Stebalien/tempfile/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Stebalien/tempfile/compare/v3.5.0...v3.6.0)

---
updated-dependencies:
- dependency-name: tempfile
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-07 08:21:40 +00:00
dependabot[bot] ee61e954bf
chore(deps): Bump flatbuffers from 23.1.21 to 23.5.26 ()
Bumps [flatbuffers](https://github.com/google/flatbuffers) from 23.1.21 to 23.5.26.
- [Release notes](https://github.com/google/flatbuffers/releases)
- [Changelog](https://github.com/google/flatbuffers/blob/master/CHANGELOG.md)
- [Commits](https://github.com/google/flatbuffers/compare/v23.1.21...v23.5.26)

---
updated-dependencies:
- dependency-name: flatbuffers
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-05 09:41:08 +00:00
dependabot[bot] d8b06c59c4
chore(deps): Bump once_cell from 1.17.2 to 1.18.0
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.17.2 to 1.18.0.
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.17.2...v1.18.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-05 02:03:15 +00:00
wiedld 2d2c3d5f8b
chore(idpe-17592): DeferredLoad metric counts () 2023-06-02 10:56:39 -07:00
Marco Neumann fa5011197c
refactor: migrate `iox_query` to use DataFusion statistics ()
This is the major part of . Additional clean ups (e.g. to remove
the actual types from `data_types`) will follow.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-02 09:18:59 +00:00
Andrew Lamb a48f681e56
feat(parquet): reduce and limit buffering when writing parquet files ()
* feat: limit buffering when writing parquet files ("combined solution")

* chore: Run cargo hakari tasks

---------

Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-31 13:27:32 +00:00
Dom 8f6308fca3
Merge branch 'main' into dom/frame-docs 2023-05-29 09:59:27 +01:00
Andrew Lamb 1ff76b7bf2 chore: use workspace dependencies for `object_store` 2023-05-26 07:03:42 -04:00
Dom Dwyer ed276bdc73
docs: reflink RecordBatch 2023-05-26 12:07:05 +02:00
Dom Dwyer c4691b04e4
docs: describe what the spans capture
A short description on the FlightFrameEncodeRecorder that helps people
understand exactly what the spans cover - it's likely people will wind
up looking at this code after debugging an issue in a trace, so lets
make sure we give them as much helpful context as possible!
2023-05-26 11:46:45 +02:00
wiedld 7bcde3c544
chore(7618): trace ingester response encoding v2 ()
* test: integration test for tracing of queries to the ingester

* chore: add FlightFrameEncodeRecorder to record spans per each polling result

* refactor(trace): impl TraceCollector for Arc

Allow any Arc-wrapped TraceCollector implementation to be used as a
TraceCollector. This avoids needing to as_any() and downcast later.

* test: assert FlightFrameEncodeRecorder trace spans

This test exercises the FlightDataEncoder wrapped with the trace
decorator (FlightFrameEncodeRecorder) when executing against a data
source that yields data after varying numbers of Stream polls.

This test passing will validate the FlightFrameEncodeRecorder correctly
instruments the amount of time a client spends waiting on the
FlightDataEncoder to acquire or encode a protocol frame, but also
ensures the decorator correctly accounts for varying behaviours allowed
through the Stream abstraction. It does this by simulating a data source
that is not always immediately ready to provide data, such as a buffer
wrapped in a contended async mutex.

* refactor: move tracing decorator into separate mod

* fix: record spans

* refactor(test): update test

The frame encoder is not one-to-one - it emits two frames for the first
data payload, a schema and a payload. This commit updates the test to
account for it!

* refactor: remove unneeded mut ref, and use enum state method which panics when in a (should be unreachable) state

* chore: add more docs to FlightFrameEncodeRecorder and related

---------

Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-26 09:40:16 +00:00
Carol (Nichols || Goulding) 9c0faa66f0
feat: Set a table partition template explicitly or from the namespace
And use the table partition template when partitioning writes to that
table.
2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding) 604bab9508
fix: Make Table create_or_get be only create 2023-05-24 10:34:30 -04:00
dependabot[bot] b7fbfa6fb2
chore(deps): Bump criterion from 0.4.0 to 0.5.0 ()
Bumps [criterion](https://github.com/bheisler/criterion.rs) from 0.4.0 to 0.5.0.
- [Changelog](https://github.com/bheisler/criterion.rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/bheisler/criterion.rs/compare/0.4.0...0.5.0)

---
updated-dependencies:
- dependency-name: criterion
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-24 09:08:37 +00:00
Marco Neumann 6729b5681a
fix(ingester): re-transmit schema over flight if it changes ()
* fix(ingester): re-transmit schema over flight if it changes

Fixes https://github.com/influxdata/idpe/issues/17408 .

So a `[Sendable]RecordBatchStream` contains `RecordBatch`es of the SAME
schema. When the ingester crafts a response for a specific partition,
this is also almost always the case however when there's a persist job
running (I think) it may have multiple snapshots for a partition. These
snapshots may have different schemas (since the ingester only creates
columns if the contain any data). Now the current implementation munches
all these snapshots into a single stream, and hands them over to arrow
flight which has a high-perf encode routine (i.e. it does not re-check
every single schema) so it sends the schema once and then sends the data
for every batch (the data only, schema data is NOT repeated). On the
receiver side (= querier) we decode that data and get confused why on
earth some batches have a different column count compared to the schema.

For the OG ingester I carefully crafted the response to ensure that we
do not run into this problem, but apparently a number of rewrites and
refactors broke that. So here is the fix:

- remove the stream that isn't really as stream (and cannot error)
- for each partition go over the `RecordBatch`es and chunk them
  according to the schema (because this check is likely cheaper than
  re-transmitting the schema for every `RecordBatch`)
- adjust a bunch of testing code to cope with this

* refactor: nicify code

* test: adjust test
2023-05-23 14:27:11 +00:00
Dom Dwyer 928a4d163e
build: remove unused dependencies from crates
This commit fixes loads of crates (47!) had unused dependencies, or
mis-configured dependencies (test deps as normal deps).

I added the "unused_crate_dependencies" to all crates to help prevent
this mess from growing again!

    https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html

This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
2023-05-23 14:55:43 +02:00
Marco Neumann b2ff90de63
test: regression test for ()
Regression test that  will fix.
2023-05-23 12:43:04 +00:00
Carol (Nichols || Goulding) 388f55c741
fix: Format ingester code that rustfmt hasn't been seeing 2023-05-22 18:07:07 -04:00
Carol (Nichols || Goulding) e4e0539f2f
fix: Conditionally re-export rather than declaring through a macro
Because of https://github.com/rust-lang/rustfmt/issues/3253,
declaring modules within macros results in rustfmt not seeing those modules.
2023-05-22 18:07:07 -04:00
Andrew Lamb 6344fe8c3f
chore: Add rationale for `clippy::future_not_send` ()
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-18 16:58:56 +00:00
Dom Dwyer 0a31afd00d
fix: correct time range buckets
Turns out there are 60 seconds in a minute, not 3,600.
2023-05-16 17:47:34 +02:00
Dom Dwyer 3514667a04
docs: fix comments
Fix two lines.
2023-05-16 15:25:47 +02:00
Dom Dwyer 9b211df053
test(ingester): persist & persistence metrics
Adds a test that asserts (manually triggered) persistence generates a
file, uploads it to object storage, inserts metadata into the catalog,
and emits various persistence metrics.
2023-05-16 14:20:30 +02:00
Dom Dwyer 74210b6257
refactor(ingester): emit Parquet file metrics
Register the ParquetFileInstrumentation as a PersistCompletionObserver
in the persist subsystem.
2023-05-16 14:20:30 +02:00
Dom Dwyer 3114c67cf1
feat: persisted Parquet file attribute metrics
Implements a PersistCompletionObserver that records various attributes
of the generated and persisted Parquet file as histogram metrics to
capture the distribution of values:

    * File size
    * Row count
    * Column count
    * Time range of data (max - min timestamp)

These metrics will give us insight into the generated files instead of
relying on intuition when tuning various configuration parameters.
2023-05-16 14:20:29 +02:00
Dom Dwyer 507ccc2eb5
refactor: parquet metadata in persist notification
Changes the CompletedPersist notification data structure to embed the
generated parquet file's metadata for completion observers.
2023-05-16 14:20:29 +02:00
Dom 06a2345708
Merge branch 'main' into dependabot/cargo/uuid-1.3.3 2023-05-16 10:36:46 +01:00
dependabot[bot] 3462e29859
chore(deps): Bump uuid from 1.3.2 to 1.3.3
Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.3.2 to 1.3.3.
- [Release notes](https://github.com/uuid-rs/uuid/releases)
- [Commits](https://github.com/uuid-rs/uuid/compare/1.3.2...1.3.3)

---
updated-dependencies:
- dependency-name: uuid
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-16 02:00:24 +00:00
Carol (Nichols || Goulding) 7268ea5c29
refactor: Extract a test helper function to create a basic table 2023-05-15 14:31:24 -04:00
Carol (Nichols || Goulding) 57bedb1c2d
refactor: Extract a test helper function to create a basic namespace 2023-05-15 14:20:38 -04:00
Dom 6aa634c1b9
Merge branch 'main' into cn/move-peas 2023-05-15 13:29:42 +01:00
Kaya Gökalp 5fe8affb18
refactor: accept NamespaceName with Namespace create ()
Co-authored-by: Dom <dom@itsallbroken.com>
2023-05-15 10:03:55 +00:00
dependabot[bot] fba9836f2a
chore(deps): Bump pin-project from 1.0.12 to 1.1.0
Bumps [pin-project](https://github.com/taiki-e/pin-project) from 1.0.12 to 1.1.0.
- [Release notes](https://github.com/taiki-e/pin-project/releases)
- [Changelog](https://github.com/taiki-e/pin-project/blob/main/CHANGELOG.md)
- [Commits](https://github.com/taiki-e/pin-project/compare/v1.0.12...v1.1.0)

---
updated-dependencies:
- dependency-name: pin-project
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-15 02:02:32 +00:00
Carol (Nichols || Goulding) 1770d0f4d8
fix: Move ingester-querier gRPC communication to its own crate 2023-05-12 13:28:30 -04:00
Carol (Nichols || Goulding) e60f703e95
fix: Rename router2 to router
Including an alias and a test for continuing to support `influxdb_iox
run router2`.
2023-05-09 22:01:39 -04:00
Carol (Nichols || Goulding) 596673d515
refactor: Create a new ColumnsByName type to abstract over TableSchema columns
And allow usage of just the columns when that's all that's needed
without leaking the BTreeMap implementation detail everywhere
2023-05-09 14:54:58 +02:00
Dom 372ec8ef96
Merge branch 'main' into cn/delete-experiments 2023-05-09 10:17:30 +01:00
Carol (Nichols || Goulding) 6506dd25a0
fix: Remove vestiges of topic 2023-05-08 20:24:56 -04:00
Carol (Nichols || Goulding) 0849ce6f2b
fix: Rename ingester2_test_ctx to ingester_test_ctx 2023-05-08 20:23:02 -04:00
Carol (Nichols || Goulding) 56916cf942
fix: Rename ingester2 to ingester 2023-05-08 12:03:05 -04:00
Carol (Nichols || Goulding) 6c2ce01f1e
fix: Remove old ingester and ioxd_ingester 2023-04-07 11:06:37 -04:00
Marco Neumann 5f43f2a719
refactor: remove old query planning code ()
Closes .

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-06 16:05:08 +00:00
dependabot[bot] 66982f988b
chore(deps): Bump object_store from 0.5.5 to 0.5.6 ()
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.5 to 0.5.6.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/commits)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-04 08:43:34 +00:00
Carol (Nichols || Goulding) 9a27736c65
docs: Fix some typos 2023-03-31 12:44:12 -04:00
dependabot[bot] 4eedb7ea77
chore(deps): Bump async-trait from 0.1.66 to 0.1.68 ()
* chore(deps): Bump async-trait from 0.1.66 to 0.1.68

Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.66 to 0.1.68.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.66...0.1.68)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-03-30 10:14:36 +00:00
dependabot[bot] 9cbcdc7672
chore(deps): Bump tokio from 1.26.0 to 1.27.0 ()
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.26.0 to 1.27.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.26.0...tokio-1.27.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-30 09:36:04 +00:00
Marco Neumann 20ec47b00b
feat: virtual chunk order col ()
* feat: introduce `CHUNK_ORDER_COLUMN_NAME`

* feat: impl `ChunkOrder` everywhere

* feat: `ChunkOrder::get`

* feat: emit chunk order column for `RecordBatchesExec`

* feat: `chunk_order_field`

* feat: chunk order col for parquet chunks

* feat: optional chunk order col handling for dedup

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-17 09:39:21 +00:00
Carol (Nichols || Goulding) cc7c44f76a
chore: Upgrade to Rust 1.68 ()
* chore: Upgrade to Rust 1.68

* fix: Remove unnecessary into_iter, thanks Clippy!

* fix: Use the size of the type, not a reference to the type... oops.

Thanks clippy!

* fix: Return block directly instead of creating a variable

Thanks clippy!

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-12 13:22:20 +00:00
dependabot[bot] 3689827793
chore(deps): Bump paste from 1.0.11 to 1.0.12 ()
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.11 to 1.0.12.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.11...1.0.12)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-06 10:40:41 +00:00