Commit Graph

8285 Commits (776c34e03d36ebcc0d39a9493116675904e3a46e)

Author SHA1 Message Date
Andrew Lamb 776c34e03d
chore: Update datafusion (#4927)
* chore: Update datafusion

* fix: Update for API changes

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-23 09:30:43 +00:00
Andrew Lamb 47def89670
docs: Update tracing.md for NG (#4916)
tracing instructions referred to OG -- update for NG

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-23 09:24:38 +00:00
Marco Neumann 463d430d43
refactor: do not fetch parquet MD from catalog in querier (#4926)
Ref #4124
2022-06-23 09:03:19 +00:00
Marco Neumann 4b7d02fad1
feat: do not rely on encoded parquet metadata for RB chunks (#4924)
* fix: use proper sort key in tests

* feat: do not rely on encoded parquet metadata for RB chunks

Ref #4124.

* refactor: allocate less strings

* refactor: use upstream PK calculation

* fix: cache expiration w/o a good reason

* refactor: make namespace cache safer to use

* refactor: make partition cache safer to use
2022-06-23 08:55:52 +00:00
Marco Neumann c899c3a0f4
fix: column handling when reading parquet files (#4921)
* fix: column handling when reading parquet files

This improves/fixes/tests a few aspects when reading parquet files:

- fix usage of `Selection::Some(...)`. This was broken since #4912 but
  apparently no test caught that.
- ensure that the order of `Selection::Some(...)` is preserved
- ensure that schema metadata is attached to output batches
- ignore parquet columns that we don't care about (i.e. do not select)
- allow parquet file to have a different column order than our internal
  bookkeeping, this makes it way simpler to read parquet files w/o
  scanning the metadata first
- extend the test coverage

Ref #4124.

* test: even more tests for parquet reader
2022-06-22 13:51:30 +00:00
Marco Neumann 0534b80886
fix: `ParquetFile::size` must include column set (#4925)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-22 13:06:02 +00:00
Marco Neumann 9591bed696
refactor: make querier internals private (#4922)
Queries internals are not meant to be used by other crates. Only a
handful selected interfaces should be used by IOxD and the query tests.

The compactor only used a very small subset just to read parquet files
back into memory. It shall rather use the official `parquet_file`
interface instead.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-22 13:00:08 +00:00
Marco Neumann 751bdce88a
fix: pass write buffer tests w/o Kafka (#4923)
Fixes interaction of `maybe_skip_kafka_integration!` and `should_panic`
by ensuring that `maybe_skip_kafka_integration!` panics to skip
`should_panic` tests.

Without that it is not possible to just run `cargo test -p write_buffer`.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-22 10:41:40 +00:00
dependabot[bot] f7d83ea581
chore(deps): Bump clap from 3.2.5 to 3.2.6 (#4920)
Bumps [clap](https://github.com/clap-rs/clap) from 3.2.5 to 3.2.6.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.2.5...v3.2.6)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-22 10:28:44 +00:00
dependabot[bot] 50ffca791a
chore(deps): Bump indexmap from 1.9.0 to 1.9.1 (#4919)
Bumps [indexmap](https://github.com/bluss/indexmap) from 1.9.0 to 1.9.1.
- [Release notes](https://github.com/bluss/indexmap/releases)
- [Changelog](https://github.com/bluss/indexmap/blob/master/RELEASES.md)
- [Commits](https://github.com/bluss/indexmap/compare/1.9.0...1.9.1)

---
updated-dependencies:
- dependency-name: indexmap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-22 10:18:06 +00:00
kodiakhq[bot] 95bd8b41e4
Merge pull request #4914 from influxdata/dom/ingester-uses-partition-key
refactor: use propagated partition key in ingester
2022-06-22 09:53:44 +00:00
kodiakhq[bot] aff5e6d69a
Merge branch 'main' into dom/ingester-uses-partition-key 2022-06-22 09:47:49 +00:00
Andrew Lamb 087dbd3eca
fix: fix heappy + update docs (#4917)
* docs: Update heap profiling documentation

* fix: fix heappy builds

* fix: do not run cli tests with heappy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-21 19:53:28 +00:00
Marco Neumann 59accfe862
refactor: assorted fixes and prep work for #4124 (#4912)
* refactor: `TestPartition::update_sort_key` should return an `Arc`

The whole test framework is built around `Arc`s, so let's fix this
consistency issue.

* fix: actually calculate correct column set in test framework

* feat: check expected parquet file schema

While working on the querier I made some mistakes regarding schemas and
such a check would have greatly improved the debugging experience.

* feat: namespace cache expiration

* fix: improve parquet schema check

* fix: remove clone
2022-06-21 16:08:28 +00:00
Dom Dwyer 75a3fd5e1e refactor: use propagated partition key in ingester
Changes the ingester to use the partition key derived in the router, and
transmitted over through the kafka API boundary.

This should have no observable behavioural change, but be more resilient
as we're no longer assuming the partitioning algorithm produces the same
value in both the router (where data is partitioned) and the ingester
(where data is persisted, segregated by partition key).

This is a pre-requisite to allowing the user to specify partitioning
schemes.
2022-06-21 15:57:30 +01:00
Marco Neumann 70337087a8
refactor: do not require parquet metadata for RB cache (#4911)
* test: add `TestParquetFile::schema`

* refactor: do not require parquet metadata for RB cache

Ref #4124.
2022-06-21 12:59:23 +00:00
Marco Neumann db24838221
refactor: remove table name from read buffer (#4910)
The low-level chunk storage shouldn't care about the table name (this is
also true for parquet chunks btw). In fact, the table name is already
only a partial information since it misses the namespace.

If we need a table name, then the high-level chunk/data management is
responsible for that.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-21 11:57:28 +00:00
Marco Neumann 0f63be26c3
refactor: pass path instead of metadata around to load parquet files (#4909) 2022-06-21 10:57:10 +00:00
Marco Neumann c3912e34e9
refactor: store per-file column set in catalog (#4908)
* refactor: store per-file column set in catalog

Together with the table-wide schema and the partition-wide sort key, this should
be everything we need to read a parquet file directly into memory
without peeking any file-level metadata.

The querier will use this to directly load parquet files into the read
buffer.

**WARNING: This requires a catalog wipe!**

Ref #4124.

* refactor: use proper `ColumnSet` type
2022-06-21 10:26:12 +00:00
Dom 4df710a205
Merge pull request #4907 from influxdata/dom/propagate-partition-key
feat: propagate partition key through kafka
2022-06-20 14:07:50 +01:00
Dom Dwyer c1f7154031 feat: propagate partition key through kafka
Changes the kafka message wire format to include the partition key for
serialised DML writes on the wire.

After this commit, the kafka messages will contain the partition key for
each op, but this information will go unused in the ingester - this
enables us to roll out the producer side, before making the value's
presence necessary on the consumer side.

A follow-up PR will change the ingester to utilise this embedded
partition key.

This has the unfortunate side effect of making the partition key part of
the public gRPC write API:

    https://github.com/influxdata/influxdb_iox/issues/4866
2022-06-20 13:42:51 +01:00
Marco Neumann 1962fcc229
chore: reduce dependencies and run `cargo update` (#4906)
* chore: reduce proptest features

* chore: remove `grpc-router`

This crate is currently unused and we don't have immediate plans to use
it. And there's GIT, so it can always be restored.

* chore: `cargo update`
2022-06-20 12:18:28 +00:00
Marco Neumann 730f85a619
refactor(querier): split ingester partitions into chunks (#4893)
* refactor(querier): split ingester partitions into chunks

With the new wire protocol the ingester can now transmit multiple
snapshots per partition with different schemas. This changes the querier
to reflect this and and splits uses the individual snapshots as chunks
for the query engine instead of a single partition.

The schema handling was changed so that instead of a table-wide schema
enforcement, we now use the snapshot-specific projections. This means we
do not need to create all-NULL columns any longer because the batches
within the chunks now always have the correct schema.

* refactor: "disassembler" -> "decoder"
2022-06-20 08:58:58 +00:00
dependabot[bot] cc324613cc
chore(deps): Bump syn from 1.0.96 to 1.0.98 (#4905)
Bumps [syn](https://github.com/dtolnay/syn) from 1.0.96 to 1.0.98.
- [Release notes](https://github.com/dtolnay/syn/releases)
- [Commits](https://github.com/dtolnay/syn/compare/1.0.96...1.0.98)

---
updated-dependencies:
- dependency-name: syn
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-20 08:49:02 +00:00
Andrew Lamb f151b1e89f
fix: categorize `NamespaceNotFound` as ingester not found errors as well (#4899)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-20 08:40:31 +00:00
dependabot[bot] 2ddc78bc41
chore(deps): Bump uuid from 1.0.0 to 1.1.2 (#4904)
Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.0.0 to 1.1.2.
- [Release notes](https://github.com/uuid-rs/uuid/releases)
- [Commits](https://github.com/uuid-rs/uuid/compare/1.0.0...1.1.2)

---
updated-dependencies:
- dependency-name: uuid
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-20 08:34:33 +00:00
dependabot[bot] 8b28721627
chore(deps): Bump tower from 0.4.12 to 0.4.13 (#4903)
Bumps [tower](https://github.com/tower-rs/tower) from 0.4.12 to 0.4.13.
- [Release notes](https://github.com/tower-rs/tower/releases)
- [Commits](https://github.com/tower-rs/tower/compare/tower-0.4.12...tower-0.4.13)

---
updated-dependencies:
- dependency-name: tower
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-20 08:25:29 +00:00
kodiakhq[bot] bf6fec2447
Merge pull request #4898 from influxdata/dom/timestamps
test: assert backfill partitioning in router
2022-06-17 13:51:14 +00:00
kodiakhq[bot] 5786245698
Merge branch 'main' into dom/timestamps 2022-06-17 13:45:30 +00:00
Dom a8c9638e89
Merge branch 'main' into dom/timestamps 2022-06-17 14:45:27 +01:00
Nga Tran 72c8cfa6ed
fix: make ChunkOrder i64 data type to accept min sequence number 0 and match with data type of sequence number (#4888)
* fix: make ChunkOrder u64 data type to accept min sequence number 0

* fix: make ChunkOrder i64 to match with sequence number type

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-17 13:45:17 +00:00
Dom Dwyer 2cc2ad6887 test: assert backfill partitioning in router
Assert writes in the router that cover multiple partitions are correctly
split up.
2022-06-17 14:40:53 +01:00
Marco Neumann 0fbff981ec
chore(deps): Bump sqlx to 0.6.0 and uuid to 1 (#4894)
Closes #4889.
Closes #4890.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-17 10:28:28 +00:00
dependabot[bot] d9ab157797
chore(deps): Bump indexmap from 1.8.2 to 1.9.0 (#4891)
* chore(deps): Bump indexmap from 1.8.2 to 1.9.0

Bumps [indexmap](https://github.com/bluss/indexmap) from 1.8.2 to 1.9.0.
- [Release notes](https://github.com/bluss/indexmap/releases)
- [Changelog](https://github.com/bluss/indexmap/blob/master/RELEASES.md)
- [Commits](https://github.com/bluss/indexmap/compare/1.8.2...1.9.0)

---
updated-dependencies:
- dependency-name: indexmap
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-06-17 07:42:36 +00:00
Nga Tran 3ca74744bf
chore: debug info about sequence number while it gets converted into ChunkOrder (#4884) 2022-06-16 18:40:55 +00:00
Nga Tran d57b0eb1fa
chore: more info for i64-to-u128 panic message (#4881)
* chore: more info for i64-to-u128 panic message

* chore: Apply suggestions from code review

Co-authored-by: Dom <dom@itsallbroken.com>

* chore: fix fmt

Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 15:49:43 +00:00
Marco Neumann c6bffac5d3
refactor: make querier->ingester request metrics per-ingester (#4879)
The metrics and logs introduced in #4806 will be emitted once for all
ingesters instead of per request. The accumulated view makes it pretty
hard to judge the actual request-response timings and the number of
requests.

Instead we now measure the data per request.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 15:09:47 +00:00
Dom 9122850fe5
Merge pull request #4878 from influxdata/dom/fix-dml-aggregation
fix: respect partition key when batching dml ops
2022-06-16 16:00:09 +01:00
Dom Dwyer 43b3f22411 fix: respect partition key when batching dml ops
This commit changes the kafka write aggregator to only merge DML ops
destined for the same partition.

Prior to this commit the aggregator was merging DML ops that had
different partition keys, causing data to be persisted in incorrect
partitions:

    https://github.com/influxdata/influxdb_iox/issues/4787
2022-06-16 14:05:32 +01:00
Marco Neumann 743c1692ea
refactor: stream query results from ingester to querier (#4875)
* refactor: stream partitions from ingester

Ref #4849.

* refactor: do not collect record batched on the ingester side

Ref #4849.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:58:50 +00:00
Andrew Lamb d67336fd69
fix(ingester): ensure all ingester metrics are prefixed with `ingester_` (#4871)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:52:35 +00:00
Andrew Lamb 74f4006580
fix(ingester): make ingester metrics start with `ingester` (#4870)
* fix(ingester): make ingester metrics start with `ingester`

* fix: Update ingester/src/stream_handler/handler.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:46:37 +00:00
Andrew Lamb 8c56909218
fix(ingester): Distinguish between "not found" and other flight errors (#4874)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:39:37 +00:00
Marco Neumann 827d869658
feat: instrument semaphore "cancelled while pending" requests (#4876)
This is useful to see how many requests timed out while waiting for a
semaphore.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:33:39 +00:00
Marco Neumann 4b945493be
test: test gRPC and stream flattening (#4873)
Ref #4849.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 11:44:59 +00:00
dependabot[bot] 73e1b3a47e
chore(deps): Bump clap from 3.2.4 to 3.2.5 (#4872)
Bumps [clap](https://github.com/clap-rs/clap) from 3.2.4 to 3.2.5.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.2.4...v3.2.5)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-16 09:27:04 +00:00
Marco Neumann 66c7d95312
refactor: use new ingester<>querier wire protocol (#4867)
* refactor: use new ingester<>querier wire protocol

Use and document the new and more flexible ingester<>querier wire
protocol.

Note that the ingester does NOT stream the response data yet, but the
internal data structures would allow that. A follow-up change will
adjust the ingester code to stream the data.

Ref #4849.

* fix: typos

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: clarify naming and public interface

* test: add schema assertion to `ingester_response_to_record_batches`

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-06-16 08:02:28 +00:00
Andrew Lamb 6b771375bf
feat: log when partitions are written due to going over size (#4868) 2022-06-15 20:12:43 +00:00
kodiakhq[bot] bf39649d64
Merge pull request #4835 from influxdata/cn/talk-to-ingesters-less
feat: Make a sharder available in the querier
2022-06-15 17:48:40 +00:00
kodiakhq[bot] fa9a094068
Merge branch 'main' into cn/talk-to-ingesters-less 2022-06-15 17:42:40 +00:00