This is a rather quick fix for prod. In the medium term we probably want
to rethink our deployment strategy, e.g. by using "one query per pod" and
by deploying queryd with IOx in the same pod.
Changes the code paths that interact with Parquet files in the object
store to reference the ParquetStorage directly (DRY refactor).
This change takes us from a dependency graph of:
        ┌─────────────────┐
        │                 │
        ▼                 │
 Parquet Consumer         │
        │         ┌──────────────┐
        ├────────▶│ParquetStorage│
        ▼         └──────────────┘
 ┌──────────────┐
 │ ObjectStore  │
 └──────────────┘
        │
   ┌────┴────┐
   ▼         ▼
  File       s3
 System     (etc)
to:
 Parquet Consumer
        │
        ▼
 ┌──────────────┐
 │ParquetStorage│
 └──────────────┘
        │
        ▼
 ┌──────────────┐
 │ ObjectStore  │
 └──────────────┘
        │
   ┌────┴────┐
   ▼         ▼
  File       s3
 System     (etc)
With the ParquetStorage being solely responsible for managing
interactions with the object store when dealing with Parquet files.
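
A minimal sketch of the layering described above, to make the dependency
direction concrete. The trait, the in-memory store, the path layout and
the method names are all hypothetical stand-ins chosen for illustration;
they are not the actual IOx types or signatures.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the object store abstraction (file system, s3, etc).
pub trait ObjectStore {
    fn put(&mut self, path: &str, bytes: Vec<u8>);
    fn get(&self, path: &str) -> Option<Vec<u8>>;
}

/// In-memory object store, used here only for illustration.
#[derive(Default)]
pub struct InMemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl ObjectStore for InMemoryStore {
    fn put(&mut self, path: &str, bytes: Vec<u8>) {
        self.objects.insert(path.to_string(), bytes);
    }

    fn get(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.get(path).cloned()
    }
}

/// The only type that knows how Parquet files map onto object store paths.
pub struct ParquetStorage<S: ObjectStore> {
    store: S,
}

impl<S: ObjectStore> ParquetStorage<S> {
    pub fn new(store: S) -> Self {
        Self { store }
    }

    /// Assumed path layout, purely illustrative.
    fn path_for(file_id: u64) -> String {
        format!("{}.parquet", file_id)
    }

    pub fn upload(&mut self, file_id: u64, bytes: Vec<u8>) {
        self.store.put(&Self::path_for(file_id), bytes);
    }

    pub fn read(&self, file_id: u64) -> Option<Vec<u8>> {
        self.store.get(&Self::path_for(file_id))
    }
}

/// A consumer talks to `ParquetStorage` only and never sees the object store.
pub fn consumer_round_trip<S: ObjectStore>(storage: &mut ParquetStorage<S>) -> Option<Vec<u8>> {
    storage.upload(42, b"PAR1".to_vec());
    storage.read(42)
}
```

Because the consumer never builds object store paths itself, the path
layout can change in one place.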
* refactor: require `Resource`s to be convertible to `u64`
* refactor: require `Resource`s to have a unit name
* refactor: make LRU cache IDs static
* feat: add LRU cache metrics
* docs: improve type names in LRU doctest
* docs: explain `MeasuredT`
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* docs: explain `test_metrics`
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* ci: fix cargo deny
* chore: downgrade `socket2`, version 0.4.5 was yanked
* chore: rename `query` to `iox_query`
`query` is already taken on crates.io (and yanked), and I am getting
tired of working around that.
* feat: `SortKey::size`
* feat: `FunctionEstimator`
* feat: querier RAM pool
Let's put all the caches into a single RAM pool, so we can at least
somewhat control RAM usage. Note that this does NOT limit the peak
memory during query execution, but it should at least stop unlimited
cache growth. A follow-up PR will add metrics.
* refactor: improve some size calculations
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
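
A rough sketch of the shared-pool idea from the commit above: several
caches account against one limit, and the least recently used entries are
evicted once the limit is exceeded. Everything here (`Resource`,
`RamBytes`, `Pool` and its methods) is a hypothetical simplification for
illustration, not the actual querier code. It only mirrors the stated
requirements that resources convert to `u64`, carry a unit name, and that
cache IDs are static.

```rust
use std::collections::VecDeque;

/// Hypothetical resource type: convertible to `u64` and carrying a unit name.
pub trait Resource: Copy + Into<u64> {
    fn unit() -> &'static str;
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct RamBytes(pub u64);

impl From<RamBytes> for u64 {
    fn from(b: RamBytes) -> u64 {
        b.0
    }
}

impl Resource for RamBytes {
    fn unit() -> &'static str {
        "bytes"
    }
}

/// One pool shared by all caches; entries are tagged with a static cache ID.
pub struct Pool<R: Resource> {
    limit: u64,
    used: u64,
    // Front of the queue is the least recently used entry.
    entries: VecDeque<(&'static str, String, R)>,
}

impl<R: Resource> Pool<R> {
    pub fn new(limit: R) -> Self {
        let limit: u64 = limit.into();
        Self { limit, used: 0, entries: VecDeque::new() }
    }

    /// Add or refresh an entry, then evict LRU entries until under the limit.
    pub fn insert(&mut self, cache_id: &'static str, key: &str, size: R) {
        self.remove(cache_id, key);
        let added: u64 = size.into();
        self.used += added;
        self.entries.push_back((cache_id, key.to_string(), size));
        while self.used > self.limit {
            match self.entries.pop_front() {
                Some((_, _, evicted)) => {
                    let freed: u64 = evicted.into();
                    self.used -= freed;
                }
                None => break,
            }
        }
    }

    fn remove(&mut self, cache_id: &'static str, key: &str) {
        let pos = self
            .entries
            .iter()
            .position(|(c, k, _)| *c == cache_id && k.as_str() == key);
        if let Some((_, _, size)) = pos.and_then(|p| self.entries.remove(p)) {
            let freed: u64 = size.into();
            self.used -= freed;
        }
    }

    pub fn contains(&self, cache_id: &'static str, key: &str) -> bool {
        self.entries
            .iter()
            .any(|(c, k, _)| *c == cache_id && k.as_str() == key)
    }

    /// Human-readable usage, using the resource's unit name.
    pub fn usage(&self) -> String {
        format!("{} of {} {}", self.used, self.limit, R::unit())
    }
}
```

As in the commit message, a pool like this bounds cache growth only; it
says nothing about peak memory during query execution.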
These were found by iterating over all of the dependencies of each
Cargo.toml, then grepping that crate for the dependency's name. If it
didn't show up, I attempted to remove it.
I left a few dependencies that this process flagged:
* generated_types
- `pbjson`,`serde`. Apparently used by the generated code.
* grpc-router-test-gen
- `prost`. Apparently used by the generated code.
* influxdb_iox
- `heappy`. Doesn't appear to be used, but it is behind enough feature
flags that I don't care to reason about it, and it's already optional.
- `tikv_jemalloc_sys`. Appears to be setting a feature flag of an
indirect dependency.
* iox_gitops_adapter
- `k8s_openapi`. Appears to be setting a feature flag of an indirect
dependency.
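
The search described above could be approximated like this. It is a
simplified sketch under assumptions: the dependency names and source
texts are passed in directly rather than read from Cargo.toml and the
crate's files, and `possibly_unused` is a made-up helper name. Matching
on the underscore form of the name is also an assumption, based on how
Rust code refers to crates whose package name contains `-`.

```rust
/// Return the dependencies whose crate name never appears in any source text.
///
/// This only produces candidates: a dependency can be needed without being
/// named in the code (generated code, feature flags of indirect dependencies).
pub fn possibly_unused(deps: &[&str], sources: &[&str]) -> Vec<String> {
    deps.iter()
        .filter(|dep| {
            // Package names may contain `-`, but Rust code refers to them with `_`.
            let ident = dep.replace('-', "_");
            !sources.iter().any(|src| src.contains(ident.as_str()))
        })
        .map(|dep| (*dep).to_string())
        .collect()
}
```

Each candidate still needs a manual check, which is why the dependencies
listed above were left in place.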
* chore: Tool for automating arrow version update
* chore: Update datafusion and arrow/parquet/arrow-flight
* fix: update for changes in Arrow API
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: impl `Debug` for `TestCatalog`
* feat: add sequencer ID and correct partition key to `IngesterPartition`
- simplifies debugging (parquet chunks and ingester chunks now use the
same partition key naming)
- the sequencer ID is required to correctly reason about tombstones (to
be implemented in a later PR)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Add basic Querier <--> Ingester "Service Configuration"
* docs: update comments in test
* refactor: cleanup tests a little
* refactor: make trait more consistent
* docs: improve comments in IngesterPartition