Commit Graph

222 Commits (59accfe862e12d96e8834f3a9dfad60189c6e4ab)

Author SHA1 Message Date
Marco Neumann 59accfe862
refactor: assorted fixes and prep work for #4124 (#4912)
* refactor: `TestPartition::update_sort_key` should return an `Arc`

The whole test framework is built around `Arc`s, so let's fix this
consistency issue.

* fix: actually calculate correct column set in test framework

* feat: check expected parquet file schema

While working on the querier I made some mistakes regarding schemas and
such a check would have greatly improved the debugging experience.

* feat: namespace cache expiration

* fix: improve parquet schema check

* fix: remove clone
2022-06-21 16:08:28 +00:00
Marco Neumann 70337087a8
refactor: do not require parquet metadata for RB cache (#4911)
* test: add `TestParquetFile::schema`

* refactor: do not require parquet metadata for RB cache

Ref #4124.
2022-06-21 12:59:23 +00:00
Marco Neumann db24838221
refactor: remove table name from read buffer (#4910)
The low-level chunk storage shouldn't care about the table name (this is
also true for parquet chunks btw). In fact, the table name is already
only a partial information since it misses the namespace.

If we need a table name, then the high-level chunk/data management is
responsible for that.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-21 11:57:28 +00:00
Marco Neumann 0f63be26c3
refactor: pass path instead of metadata around to load parquet files (#4909) 2022-06-21 10:57:10 +00:00
Marco Neumann c3912e34e9
refactor: store per-file column set in catalog (#4908)
* refactor: store per-file column set in catalog

Together with the table-wide schema and the partition-wide sort key, this should
be everything we need to read a parquet file directly into memory
without peeking any file-level metadata.

The querier will use this to directly load parquet files into the read
buffer.

**WARNING: This requires a catalog wipe!**

Ref #4124.

* refactor: use proper `ColumnSet` type
2022-06-21 10:26:12 +00:00
Marco Neumann 730f85a619
refactor(querier): split ingester partitions into chunks (#4893)
* refactor(querier): split ingester partitions into chunks

With the new wire protocol the ingester can now transmit multiple
snapshots per partition with different schemas. This changes the querier
to reflect this and and splits uses the individual snapshots as chunks
for the query engine instead of a single partition.

The schema handling was changed so that instead of a table-wide schema
enforcement, we now use the snapshot-specific projections. This means we
do not need to create all-NULL columns any longer because the batches
within the chunks now always have the correct schema.

* refactor: "disassembler" -> "decoder"
2022-06-20 08:58:58 +00:00
Nga Tran 72c8cfa6ed
fix: make ChunkOrder i64 data type to accept min sequence number 0 and match with data type of sequence number (#4888)
* fix: make ChunkOrder u64 data type to accept min sequence number 0

* fix: make ChunkOrder i64 to match with sequence number type

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-17 13:45:17 +00:00
Marco Neumann 0fbff981ec
chore(deps): Bump sqlx to 0.6.0 and uuid to 1 (#4894)
Closes #4889.
Closes #4890.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-17 10:28:28 +00:00
Marco Neumann c6bffac5d3
refactor: make querier->ingester request metrics per-ingester (#4879)
The metrics and logs introduced in #4806 will be emitted once for all
ingesters instead of per request. The accumulated view makes it pretty
hard to judge the actual request-response timings and the number of
requests.

Instead we now measure the data per request.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 15:09:47 +00:00
Marco Neumann 66c7d95312
refactor: use new ingester<>querier wire protocol (#4867)
* refactor: use new ingester<>querier wire protocol

Use and document the new and more flexible ingester<>querier wire
protocol.

Note that the ingester does NOT stream the response data yet, but the
internal data structures would allow that. A follow-up change will
adjust the ingester code to stream the data.

Ref #4849.

* fix: typos

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: clarify naming and public interface

* test: add schema assertion to `ingester_response_to_record_batches`

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-06-16 08:02:28 +00:00
kodiakhq[bot] fa9a094068
Merge branch 'main' into cn/talk-to-ingesters-less 2022-06-15 17:42:40 +00:00
Carol (Nichols || Goulding) 8331cb1afe
fix: Add retry to querying of catalog for sequencers in querier startup 2022-06-15 12:09:42 -04:00
Carol (Nichols || Goulding) 03f6f59a9b
fix: Change the sharder to return error instead of panicking for no shards 2022-06-15 11:23:31 -04:00
Marco Neumann 7c60edd38c
refactor: prepare new ingester<>querier protocol on the querier side (#4863)
* refactor: prepare new ingester<>querier protocol on the querier side

This changes the querier internals to work with the new protocol. The
wire protocol stays the same (for now). There's a (somewhat hackish)
adapter in place on the querier side that converts the old to the new
protocol on-the-fly. This is an intermediate step before we actually
change the wire protocol (and in a step after that also take advantage
of the new possibilites on the ingester side).

Ref #4849.

* docs: explain adapter
2022-06-15 14:32:24 +00:00
Carol (Nichols || Goulding) e9cdaffe74
fix: Create querier sharder from catalog sequencer info
Panic if there are no sharders in the catalog.
2022-06-15 10:18:54 -04:00
Carol (Nichols || Goulding) 874ef89daa
feat: Make specifying the write buffer, and thus getting a sharder, optional in querier 2022-06-15 10:01:45 -04:00
Marco Neumann 3bd24b67ba
feat: extend flight client to accept multiple (changing) schemas (#4853)
* feat: extend flight client to accept multiple (changing) schemas

See #4849.

Originally I intended not to use Flight at all for the new
ingester<>querier protocol. However since flight also deals with
dictionary batches and multiple batches and the gRPC protocol that I
would write would look very similar, I will use Flight with a bit more
flexible message types.

The rough idea for the protocol is the following stream:

- for each partition:
  1. "none" message with partition metadata
  2. for each chunk (can have different schemas under certain
     circumstances):
     1. "schema" message (resets dictionary state)
     2. (optional) dictionary batch messages
     3. one or more "record batch" message

The nice thing about it is that the same arrow client works also for the
existing client<>querier protocol since there we just send:

1. "schema" message (no app metadata)
2. (optional) dictionary batch messages
3. zero, one or more "record batch" message (no app metadata)

* refactor: separate high- and low-level flight client

It is very unlikely that a user will use the high-level batch-producing
functionality and the low-level stuff within the same session. So let's
split this into to clients (high-level uses the low-level one
internally) to avoid confusion.

Also add documentation on our protocol handling.

* refactor: enumerate all variants in match statement to better catch errors in the future
2022-06-15 11:38:08 +00:00
Carol (Nichols || Goulding) e875a92cf8
feat: Log time spent requesting ingester partitions (#4806)
* feat: Log time spent requesting ingester partitions

Fixes #4558.

* feat: Record a metric for the duration queriers wait on ingesters

* fix: Use DurationHistogram instead of U64 Histogram

* test: Add a test for the ingester ms metric

* feat: Add back the logging to provide both logging and metrics for ingester duration

* refactor: Use sample_count method on metrics

* feat: Record ingester duration separately for success or failure

* fix: Create a separate test for the ingester metrics

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 17:58:19 +00:00
Andrew Lamb e91d00b10c
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `16.0.0 (#4851)
* chore: TEMP Update DataFusion to pre-release

* chore: update arrow et al to 16.0.0

* chore: Run cargo hakari tasks

* fix: update reader read_dictionary API

* chore: Update to real Datafusion release

* fix: Update parquet API

* fix: update test

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-06-14 16:31:40 +00:00
Dom Dwyer b41ea1d718 refactor: PartitionKey type
This commit changes the code base to use a new reference-counted
PartitionKey type wrapper, instead of passing a bare String around.

This allows the compiler to type check & verify usage of the partition
key, instead of passing a bare string around. By reference counting the
underlying string, we reduce memory usage for some use cases.
2022-06-14 14:47:56 +01:00
Marco Neumann 2b84e5c087
feat: measure "probably reloaded" cache loads (#4813)
To roughly gauge how much data we re-load into cached (i.e. data that
was already loaded but was later evicted due to LRU pressure or TTL
eviction) this change introduces a new metric that estimates if a cache
entry that is requested from the loader was already seen before (using a
probabilistic filter).
2022-06-13 13:51:45 +00:00
Marco Neumann 66623fe0cd
feat: expose query semaphore metrics (#4836)
The groundwork for that was already done, just needed a bit of wiring.
This might help us to judge timeouts.
2022-06-13 09:36:50 +00:00
Andrew Lamb ddf61c5e98
refactor: Consolidate `Selection` creation, add tests (#4832)
* refactor: Consolidate Selection --> DataFusion projection

* fix: remove now unused function
2022-06-10 18:30:43 +00:00
kodiakhq[bot] dd8d44e24f
Merge branch 'main' into cn/duration 2022-06-10 14:23:09 +00:00
Nga Tran 13c57d524a
feat: Change data type of catalog partition's sort_key from a string to an array of string (#4801)
* feat: Change data type of catalog Postgres partition's sort_key from a string to an array of string

* test: add column with comma

* fix: use new protonuf field to avoid incompactible

* fix: ensure sort_key is an empty array rather than NULL

* refactor: address review comments

* refactor: address more comments

* chore: clearer comments

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* fix: Rename migration so it will be applied after

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2022-06-10 13:31:31 +00:00
Carol (Nichols || Goulding) 1c7cbaf5ae
refactor: Use DurationHistogram in more places 2022-06-09 14:20:51 -04:00
Marco Neumann 4e5842dec7
feat: expose hit-miss metrics for querier caches (#4811)
* feat: `MetricsCache`

* feat: expose hit-miss metrics for querier caches

* refactor: `MetricsCache` -> `CacheWithMetrics`

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-09 13:07:40 +00:00
Andrew Lamb 2ec7764fdd
refactor: rename builder like predicate methods to be `with_` (#4808)
* refactor: rename builder like predicate methods to be `with_`

* fix: merge conflict

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-09 11:26:03 +00:00
Andrew Lamb 5e4fcfaa4d
refactor: reduce mut usage in Predicate (#4807)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-09 10:46:01 +00:00
Andrew Lamb afc1c12062
refactor: consolidate `PredicateBuilder` into `Predicate` (#4799)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-08 12:21:24 +00:00
Marco Neumann 317e9486df
refactor: make `Cache` a trait (#4802)
* refactor: make `Cache` a trait

To insert more high-level metrics (e.g. cache misses/hits) it would be
helpful if we could easily instrument the layer right above the cache
driver (that combines the backend and the loader). To do that without
polluting the types too much, let's introduce a trait that describes the
driver interface and that we could later wrap with intrumentation.

This also pulls out the test into a generic setup, similar to how this
is done for the cache storage backends.

This does NOT include any functionality changes.

* fix: typo

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-06-08 11:04:53 +00:00
Marco Neumann 4509e3db57 feat: wire up RB metrics for querier chunks 2022-06-07 15:31:49 +02:00
Andrew Lamb 8e96a2721d
chore: Update datafusion (again) (#4788)
* chore: Update datafusion

* chore: Update imports

* refactor: update API usage

* refactor: clean up some uses of binary_expr

* fix: remove unused export

* fix: update explain output

* chore: update more explain tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-07 08:17:56 +00:00
dependabot[bot] 04c685b3b7
chore(deps): Bump tokio-util from 0.7.2 to 0.7.3 (#4784)
Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.2 to 0.7.3.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.2...tokio-util-0.7.3)

---
updated-dependencies:
- dependency-name: tokio-util
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-06 14:46:27 +00:00
dependabot[bot] e03bf94420
chore(deps): Bump tokio from 1.18.2 to 1.19.1 (#4783)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.18.2 to 1.19.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.18.2...tokio-1.19.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-06 14:15:12 +00:00
Carol (Nichols || Goulding) 5af0cc6acf
fix: Handle read buffer column not existing for column_names in QueryChunk impl 2022-06-03 12:45:16 -04:00
Carol (Nichols || Goulding) aa510ae4e6
fix: Remove test uses of parquet chunks and document as unused
The querier is now using read buffer chunks only, but we're leaving the
parquet chunk code around for the moment.
2022-06-03 09:16:04 -04:00
Carol (Nichols || Goulding) a4f51d99f6
feat: Use the read buffer chunk cache in the querier 2022-06-03 09:16:04 -04:00
Andrew Lamb 3592aa52d8
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0` (#4743)
* chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0`

* chore: Update APIs

* chore: Run cargo hakari tasks

* feat: normalize parquet file metadata

* chore: update size tests

* chore: add docs on metadata stripping

* chore: TEMP UPDATE TO DF BRANCH

* chore: Update for new API

* fix: Update to latest DF

* fix: cargo hakari

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
2022-06-03 10:32:26 +00:00
dependabot[bot] 9a21292db8
chore(deps): Bump async-trait from 0.1.53 to 0.1.56 (#4774)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.53 to 0.1.56.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.53...0.1.56)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-03 09:10:40 +00:00
Carol (Nichols || Goulding) 9d9c5d3692
fix: Take backoff config as an argument to be consistent with the other caches 2022-06-02 09:50:48 -04:00
Carol (Nichols || Goulding) 76b40ac6a1
refactor: Make the type alias into a struct 2022-06-02 09:26:11 -04:00
Carol (Nichols || Goulding) 715c65dfef
docs: Clarify a comment about what is an Arc 2022-06-02 09:22:44 -04:00
Carol (Nichols || Goulding) 879dd7cec4
test: LRU behavior of the read buffer chunk cache 2022-06-02 09:22:44 -04:00
Carol (Nichols || Goulding) 9328ba8c45
feat: Use new extra loading info to load read buffer chunks into cache 2022-06-02 09:22:44 -04:00
Carol (Nichols || Goulding) 054c25de50
refactor: Add more methods to DecodedParquetFile
I'm tired of trying to remember which info is on which metadata.
2022-06-02 09:22:44 -04:00
Marco Neumann 9e30a3eb29
refactor: rework querier concurrency limiting (#4760)
* refactor: rework querier concurrency limiting

With #4752 we introduced a concurrency limit into the querier. It works
by drawing permits from a central semaphore whenever we create a
`QuerierNamespace`. This however only limits concurrency during query
planning and not query execution, because the objects contained within
the plan (chunks and some metadata) neither reference the permit nor the
`QuerierNamespace`.

Now one approach to fix that would be to wire up the permit all the down
into all the query-related data structures. This however is very fiddly
and potentially will get lost at some point, because as soon as we
transform these data structures -- e.g. into streams -- the permit might
get lost again. This will be potentially query-dependent and very hard
to debug.

So instead we reverse the approach and track the permits at the upper
layer of the stack: the gRPC service entry points. There we also need to
be careful -- e.g. when we return streams to tonic -- but it's way
easier to review that then the deeply nested object hierarchy that is
involved with queries. Also the separation of concerns is a bit clearer,
because why would a "chunk" care about the "query concurrency" as a
whole.

* refactor: improve gRPC permit keeping and prepare tests
2022-06-02 09:49:58 +00:00
Carol (Nichols || Goulding) 37347f2389
feat: Add an Extra type to Cacher Loader to specify extra information for loading entries 2022-06-01 08:58:19 -04:00
Andrew Lamb 2886149afc
chore: naming / comment cleanups from namespace semaphore (#4753) 2022-06-01 12:46:38 +00:00
Marco Neumann ebeccf037c
feat: limit querier concurrency by limiting number of active namespaces (#4752)
This is a rather quick fix for prod. On the mid-term we probably wanna
rethink our deployment strategy, e.g. by using "one query per pod" and
by deploying queryd w/ IOx into the same pod.
2022-06-01 11:59:35 +00:00