Commit Graph

8274 Commits (aff5e6d69a839e99eb553bad04152acb57998e2c)

Author SHA1 Message Date
kodiakhq[bot] aff5e6d69a
Merge branch 'main' into dom/ingester-uses-partition-key 2022-06-22 09:47:49 +00:00
Andrew Lamb 087dbd3eca
fix: fix heappy + update docs (#4917)
* docs: Update heap profiling documentation

* fix: fix heappy builds

* fix: do not run cli tests with heappy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-21 19:53:28 +00:00
Marco Neumann 59accfe862
refactor: assorted fixes and prep work for #4124 (#4912)
* refactor: `TestPartition::update_sort_key` should return an `Arc`

The whole test framework is built around `Arc`s, so let's fix this
consistency issue.

* fix: actually calculate correct column set in test framework

* feat: check expected parquet file schema

While working on the querier I made some mistakes regarding schemas and
such a check would have greatly improved the debugging experience.

* feat: namespace cache expiration

* fix: improve parquet schema check

* fix: remove clone
2022-06-21 16:08:28 +00:00
Dom Dwyer 75a3fd5e1e refactor: use propagated partition key in ingester
Changes the ingester to use the partition key derived in the router, and
transmitted over through the kafka API boundary.

This should have no observable behavioural change, but be more resilient
as we're no longer assuming the partitioning algorithm produces the same
value in both the router (where data is partitioned) and the ingester
(where data is persisted, segregated by partition key).

This is a pre-requisite to allowing the user to specify partitioning
schemes.
2022-06-21 15:57:30 +01:00
Marco Neumann 70337087a8
refactor: do not require parquet metadata for RB cache (#4911)
* test: add `TestParquetFile::schema`

* refactor: do not require parquet metadata for RB cache

Ref #4124.
2022-06-21 12:59:23 +00:00
Marco Neumann db24838221
refactor: remove table name from read buffer (#4910)
The low-level chunk storage shouldn't care about the table name (this is
also true for parquet chunks btw). In fact, the table name is already
only a partial information since it misses the namespace.

If we need a table name, then the high-level chunk/data management is
responsible for that.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-21 11:57:28 +00:00
Marco Neumann 0f63be26c3
refactor: pass path instead of metadata around to load parquet files (#4909) 2022-06-21 10:57:10 +00:00
Marco Neumann c3912e34e9
refactor: store per-file column set in catalog (#4908)
* refactor: store per-file column set in catalog

Together with the table-wide schema and the partition-wide sort key, this should
be everything we need to read a parquet file directly into memory
without peeking any file-level metadata.

The querier will use this to directly load parquet files into the read
buffer.

**WARNING: This requires a catalog wipe!**

Ref #4124.

* refactor: use proper `ColumnSet` type
2022-06-21 10:26:12 +00:00
Dom 4df710a205
Merge pull request #4907 from influxdata/dom/propagate-partition-key
feat: propagate partition key through kafka
2022-06-20 14:07:50 +01:00
Dom Dwyer c1f7154031 feat: propagate partition key through kafka
Changes the kafka message wire format to include the partition key for
serialised DML writes on the wire.

After this commit, the kafka messages will contain the partition key for
each op, but this information will go unused in the ingester - this
enables us to roll out the producer side, before making the value's
presence necessary on the consumer side.

A follow-up PR will change the ingester to utilise this embedded
partition key.

This has the unfortunate side effect of making the partition key part of
the public gRPC write API:

    https://github.com/influxdata/influxdb_iox/issues/4866
2022-06-20 13:42:51 +01:00
Marco Neumann 1962fcc229
chore: reduce dependencies and run `cargo update` (#4906)
* chore: reduce proptest features

* chore: remove `grpc-router`

This crate is currently unused and we don't have immediate plans to use
it. And there's GIT, so it can always be restored.

* chore: `cargo update`
2022-06-20 12:18:28 +00:00
Marco Neumann 730f85a619
refactor(querier): split ingester partitions into chunks (#4893)
* refactor(querier): split ingester partitions into chunks

With the new wire protocol the ingester can now transmit multiple
snapshots per partition with different schemas. This changes the querier
to reflect this and and splits uses the individual snapshots as chunks
for the query engine instead of a single partition.

The schema handling was changed so that instead of a table-wide schema
enforcement, we now use the snapshot-specific projections. This means we
do not need to create all-NULL columns any longer because the batches
within the chunks now always have the correct schema.

* refactor: "disassembler" -> "decoder"
2022-06-20 08:58:58 +00:00
dependabot[bot] cc324613cc
chore(deps): Bump syn from 1.0.96 to 1.0.98 (#4905)
Bumps [syn](https://github.com/dtolnay/syn) from 1.0.96 to 1.0.98.
- [Release notes](https://github.com/dtolnay/syn/releases)
- [Commits](https://github.com/dtolnay/syn/compare/1.0.96...1.0.98)

---
updated-dependencies:
- dependency-name: syn
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-20 08:49:02 +00:00
Andrew Lamb f151b1e89f
fix: categorize `NamespaceNotFound` as ingester not found errors as well (#4899)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-20 08:40:31 +00:00
dependabot[bot] 2ddc78bc41
chore(deps): Bump uuid from 1.0.0 to 1.1.2 (#4904)
Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.0.0 to 1.1.2.
- [Release notes](https://github.com/uuid-rs/uuid/releases)
- [Commits](https://github.com/uuid-rs/uuid/compare/1.0.0...1.1.2)

---
updated-dependencies:
- dependency-name: uuid
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-20 08:34:33 +00:00
dependabot[bot] 8b28721627
chore(deps): Bump tower from 0.4.12 to 0.4.13 (#4903)
Bumps [tower](https://github.com/tower-rs/tower) from 0.4.12 to 0.4.13.
- [Release notes](https://github.com/tower-rs/tower/releases)
- [Commits](https://github.com/tower-rs/tower/compare/tower-0.4.12...tower-0.4.13)

---
updated-dependencies:
- dependency-name: tower
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-20 08:25:29 +00:00
kodiakhq[bot] bf6fec2447
Merge pull request #4898 from influxdata/dom/timestamps
test: assert backfill partitioning in router
2022-06-17 13:51:14 +00:00
kodiakhq[bot] 5786245698
Merge branch 'main' into dom/timestamps 2022-06-17 13:45:30 +00:00
Dom a8c9638e89
Merge branch 'main' into dom/timestamps 2022-06-17 14:45:27 +01:00
Nga Tran 72c8cfa6ed
fix: make ChunkOrder i64 data type to accept min sequence number 0 and match with data type of sequence number (#4888)
* fix: make ChunkOrder u64 data type to accept min sequence number 0

* fix: make ChunkOrder i64 to match with sequence number type

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-17 13:45:17 +00:00
Dom Dwyer 2cc2ad6887 test: assert backfill partitioning in router
Assert writes in the router that cover multiple partitions are correctly
split up.
2022-06-17 14:40:53 +01:00
Marco Neumann 0fbff981ec
chore(deps): Bump sqlx to 0.6.0 and uuid to 1 (#4894)
Closes #4889.
Closes #4890.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-17 10:28:28 +00:00
dependabot[bot] d9ab157797
chore(deps): Bump indexmap from 1.8.2 to 1.9.0 (#4891)
* chore(deps): Bump indexmap from 1.8.2 to 1.9.0

Bumps [indexmap](https://github.com/bluss/indexmap) from 1.8.2 to 1.9.0.
- [Release notes](https://github.com/bluss/indexmap/releases)
- [Changelog](https://github.com/bluss/indexmap/blob/master/RELEASES.md)
- [Commits](https://github.com/bluss/indexmap/compare/1.8.2...1.9.0)

---
updated-dependencies:
- dependency-name: indexmap
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-06-17 07:42:36 +00:00
Nga Tran 3ca74744bf
chore: debug info about sequence number while it gets converted into ChunkOrder (#4884) 2022-06-16 18:40:55 +00:00
Nga Tran d57b0eb1fa
chore: more info for i64-to-u128 panic message (#4881)
* chore: more info for i64-to-u128 panic message

* chore: Apply suggestions from code review

Co-authored-by: Dom <dom@itsallbroken.com>

* chore: fix fmt

Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 15:49:43 +00:00
Marco Neumann c6bffac5d3
refactor: make querier->ingester request metrics per-ingester (#4879)
The metrics and logs introduced in #4806 will be emitted once for all
ingesters instead of per request. The accumulated view makes it pretty
hard to judge the actual request-response timings and the number of
requests.

Instead we now measure the data per request.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 15:09:47 +00:00
Dom 9122850fe5
Merge pull request #4878 from influxdata/dom/fix-dml-aggregation
fix: respect partition key when batching dml ops
2022-06-16 16:00:09 +01:00
Dom Dwyer 43b3f22411 fix: respect partition key when batching dml ops
This commit changes the kafka write aggregator to only merge DML ops
destined for the same partition.

Prior to this commit the aggregator was merging DML ops that had
different partition keys, causing data to be persisted in incorrect
partitions:

    https://github.com/influxdata/influxdb_iox/issues/4787
2022-06-16 14:05:32 +01:00
Marco Neumann 743c1692ea
refactor: stream query results from ingester to querier (#4875)
* refactor: stream partitions from ingester

Ref #4849.

* refactor: do not collect record batched on the ingester side

Ref #4849.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:58:50 +00:00
Andrew Lamb d67336fd69
fix(ingester): ensure all ingester metrics are prefixed with `ingester_` (#4871)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:52:35 +00:00
Andrew Lamb 74f4006580
fix(ingester): make ingester metrics start with `ingester` (#4870)
* fix(ingester): make ingester metrics start with `ingester`

* fix: Update ingester/src/stream_handler/handler.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:46:37 +00:00
Andrew Lamb 8c56909218
fix(ingester): Distinguish between "not found" and other flight errors (#4874)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:39:37 +00:00
Marco Neumann 827d869658
feat: instrument semaphore "cancelled while pending" requests (#4876)
This is useful to see how many requests timed out while waiting for a
semaphore.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:33:39 +00:00
Marco Neumann 4b945493be
test: test gRPC and stream flattening (#4873)
Ref #4849.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 11:44:59 +00:00
dependabot[bot] 73e1b3a47e
chore(deps): Bump clap from 3.2.4 to 3.2.5 (#4872)
Bumps [clap](https://github.com/clap-rs/clap) from 3.2.4 to 3.2.5.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.2.4...v3.2.5)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-16 09:27:04 +00:00
Marco Neumann 66c7d95312
refactor: use new ingester<>querier wire protocol (#4867)
* refactor: use new ingester<>querier wire protocol

Use and document the new and more flexible ingester<>querier wire
protocol.

Note that the ingester does NOT stream the response data yet, but the
internal data structures would allow that. A follow-up change will
adjust the ingester code to stream the data.

Ref #4849.

* fix: typos

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: clarify naming and public interface

* test: add schema assertion to `ingester_response_to_record_batches`

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-06-16 08:02:28 +00:00
Andrew Lamb 6b771375bf
feat: log when partitions are written due to going over size (#4868) 2022-06-15 20:12:43 +00:00
kodiakhq[bot] bf39649d64
Merge pull request #4835 from influxdata/cn/talk-to-ingesters-less
feat: Make a sharder available in the querier
2022-06-15 17:48:40 +00:00
kodiakhq[bot] fa9a094068
Merge branch 'main' into cn/talk-to-ingesters-less 2022-06-15 17:42:40 +00:00
Dom 9a774abc70
Merge pull request #4864 from influxdata/dom/partition-key-dml
refactor: add partition key to DmlWrite
2022-06-15 17:44:56 +01:00
Carol (Nichols || Goulding) 8331cb1afe
fix: Add retry to querying of catalog for sequencers in querier startup 2022-06-15 12:09:42 -04:00
Carol (Nichols || Goulding) 03f6f59a9b
fix: Change the sharder to return error instead of panicking for no shards 2022-06-15 11:23:31 -04:00
Dom Dwyer 4df2964566 refactor: store PartitionKey in DmlWrite
Carry the PartitionKey in the DmlWrite, allowing the batch to be
associated with a specific partition key.
2022-06-15 15:48:54 +01:00
Dom Dwyer 0da8ec87d5 refactor: always generate a partition key
Changes the partitioner to always generate a partition key, even if the
column being used to partition doesn't exist. This doesn't functionally
change the batch partitioning output, but ensures we always have a
non-empty string for the partition key.
2022-06-15 15:38:02 +01:00
Dom Dwyer 61182f506b refactor: emit PartitionKey from partitioner
Changes the partitioning code to emit a PartitionKey, instead of a bare
String.
2022-06-15 15:38:02 +01:00
Marco Neumann 7c60edd38c
refactor: prepare new ingester<>querier protocol on the querier side (#4863)
* refactor: prepare new ingester<>querier protocol on the querier side

This changes the querier internals to work with the new protocol. The
wire protocol stays the same (for now). There's a (somewhat hackish)
adapter in place on the querier side that converts the old to the new
protocol on-the-fly. This is an intermediate step before we actually
change the wire protocol (and in a step after that also take advantage
of the new possibilites on the ingester side).

Ref #4849.

* docs: explain adapter
2022-06-15 14:32:24 +00:00
Carol (Nichols || Goulding) e9cdaffe74
fix: Create querier sharder from catalog sequencer info
Panic if there are no sharders in the catalog.
2022-06-15 10:18:54 -04:00
Carol (Nichols || Goulding) 553590fb23
fix: Move deps to dev-deps 2022-06-15 10:01:45 -04:00
Carol (Nichols || Goulding) 874ef89daa
feat: Make specifying the write buffer, and thus getting a sharder, optional in querier 2022-06-15 10:01:45 -04:00
Carol (Nichols || Goulding) 127467b5c4
feat: Create a sharder in the querier 2022-06-15 10:01:45 -04:00